A pretty useless Raspberry Pi application

What if you are not good at remembering faces? Well, buy a Raspberry Pi, a camera and an LED matrix display. Install OpenCV and the face_recognition library, and set it up with the important faces to be recognized.

Point the camera at people, and when there is a hit the LED display will show the name.
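
A minimal sketch of such a recognition loop, assuming the face_recognition and OpenCV packages are installed; the image file and name are placeholders, and the LED part is left as a print statement, since driving the LED matrix depends on your specific display:

import cv2
import face_recognition

# load the important faces to recognize (file name and name are placeholders)
known_image = face_recognition.load_image_file("important_person.jpg")
known_encodings = [face_recognition.face_encodings(known_image)[0]]
known_names = ["Important Person"]

video = cv2.VideoCapture(0)  # the Raspberry Pi camera
while True:
    ok, frame = video.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # face_recognition expects RGB
    for encoding in face_recognition.face_encodings(rgb):
        hits = face_recognition.compare_faces(known_encodings, encoding)
        if True in hits:
            name = known_names[hits.index(True)]
            print(name)  # placeholder: write the name to the LED matrix here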


SatRday talks recordings

satRdayLogo

A couple of weeks ago, on the first of September, we had satRday in Amsterdam (The Netherlands), a fantastic event hosted by GoDataDriven. The great talks, including my 10-minute lightning talk on text2vec, are now online.

My talk

The satRday channel

Cheers, Longhow


Deploy machine learning models with GKE and Dataiku

dataikukubernetes01

Introduction

In a previous post I described how easy it is to create and deploy machine learning models (exposing them as REST APIs) with Dataiku. In particular, it was an XGBoost model predicting home values. Now suppose my model for predicting home values becomes so successful that I need to serve millions of requests per hour; then it would be very handy if my back end scaled easily.

In this brief post I outline the few steps you need to take to deploy machine learning models created with Dataiku on a scalable Kubernetes cluster on Google Kubernetes Engine (GKE).

Create a Kubernetes cluster

There is a nice GKE quickstart that demonstrates how to create a Kubernetes cluster on the Google Cloud Platform (GCP). The cluster can be created through the GUI in the Google Cloud console. Alternatively, if you are using the Google Cloud SDK, it basically boils down to two commands: one to create the cluster and one to get its credentials:

# create a cluster with default settings (3 nodes of type n1-standard-1)
gcloud container clusters create myfirst-cluster
# fetch credentials so kubectl can talk to the new cluster
gcloud container clusters get-credentials myfirst-cluster

When creating a cluster there are many options you can set. I left all options at their default values, which means that only a small cluster of three nodes of machine type n1-standard-1 is created. We can now see the cluster in the Google Cloud console.

dataikukubernetes03

Set up the Dataiku API Deployer

Now that you have a Kubernetes cluster, you can easily deploy predictive models with Dataiku. First, you need to create a predictive model; as described in my previous blog post, you can do this with the Dataiku software. The Dataiku API Deployer is then the component that takes care of the management and the actual deployment of your models onto the Kubernetes cluster.

The machine where the Dataiku API Deployer is installed must be able to push Docker images to your Google Cloud environment and must be able to interact with the Kubernetes cluster (through the kubectl command).

dataikukubernetes02

Deploy your stuff…

My XGBoost model created in Dataiku is now pushed to the Dataiku API Deployer. From the GUI of the API Deployer you can select the XGBoost model and deploy it on your Kubernetes cluster.

The API Deployer is a management environment: it shows which models (and model versions) are already deployed, checks whether the models are up and running, and manages your infrastructure (Kubernetes clusters or regular machines).

dataikukubernetes04

Select the model you wish to deploy, click deploy, and choose a cluster. It takes a minute or so to package the model into a Docker image and push it to GKE; a progress window keeps you informed.

dataikukubernetes05

When the process is finished, you will see the new service in Google Kubernetes Engine on GCP.

dataikukubernetes06

The model is up and running, waiting to be called. You can call it with curl, for example:

curl -X POST \
  http://35.204.180.188:12000/public/api/v1/xgboost/houseeprice/predict \
  --data '{ "features" : {
    "HouseType": "Tussenwoning",
    "kamers": 6,
    "Oppervlakte": 134,
    "VON": 0,
    "PC": "16"
  }}'
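
Equivalently, here is a minimal Python sketch that calls the same endpoint with the requests library (the IP address and model path are taken from the curl example above and will differ for your own deployment):

import requests

# the endpoint exposed by the Dataiku API node on the GKE cluster
URL = "http://35.204.180.188:12000/public/api/v1/xgboost/houseeprice/predict"

payload = {
    "features": {
        "HouseType": "Tussenwoning",
        "kamers": 6,
        "Oppervlakte": 134,
        "VON": 0,
        "PC": "16",
    }
}

response = requests.post(URL, json=payload)
print(response.json())  # the predicted home value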

Conclusion

That's all there is to it! You now have a scalable model-serving engine, ready to be resized when the millions of requests start coming in. Besides predictive models, you can also deploy/expose any R or Python function via the Dataiku API Deployer. Don't forget to shut down the cluster when you are done, to avoid incurring charges to your Google Cloud Platform account:

gcloud container clusters delete myfirst-cluster

Cheers, Longhow.

Google AutoML rocks!

Wow, Google AutoML Vision rocks!

A few months ago I performed a simple test with Keras, creating a Peugeot-BMW image classifier on my laptop.

See Is that a BMW or a Peugeot?

A friendly encouragement from Erwin Huizenga to try Google AutoML Vision resulted, 10 minutes later, in a very good BMW-Peugeot classifier without a single line of code. Only a few simple steps were needed:

  • Upload your car images and label them.

wp1

  • Click train (the first hour of training is free!)
  • After 10 minutes the model was trained, with very good precision and recall.

wp2

And the nice thing: the model is immediately up and running for anyone, or any client application, that needs an image prediction.
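
For example, a client could call the deployed model over REST. Below is a minimal Python sketch, assuming the AutoML Vision :predict endpoint; the project id, model id, access token and image file name are all placeholders:

import base64
import requests

# placeholders: fill in your own GCP project, AutoML model id and token
PROJECT_ID = "my-project"
MODEL_ID = "ICN1234567890"
ACCESS_TOKEN = "ya29...."  # e.g. from: gcloud auth print-access-token

URL = (f"https://automl.googleapis.com/v1/projects/{PROJECT_ID}"
       f"/locations/us-central1/models/{MODEL_ID}:predict")

# read and base64-encode the car image to classify
with open("car.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

body = {"payload": {"image": {"imageBytes": image_b64}}}
headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

response = requests.post(URL, json=body, headers=headers)
print(response.json())  # predicted labels with confidence scores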

wp3.png

Just give it a try… Google AutoML Vision!

Little image experiment on my son

Please don't report me to the authorities for conducting a frivolous little experiment on my son.

There is this nice Python package, face_recognition; it can recognize and manipulate faces in pictures, and it can calculate distances between faces. What is the distance between my son and me (when I was 9 years old), and how does that compare with other children?

The picture below shows two youth soccer teams, my son, and me (35 years ago). I have Chinese roots, so to make it hard for the algorithm I included a Chinese soccer team. I am happy with the results of the experiment: with a distance of 0.601, my son had almost the smallest distance to me of all 28 faces…

Try out the face_recognition Python package for yourself.
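
A minimal sketch of the distance calculation with face_recognition (the photo file names are placeholders):

import face_recognition

# load the two photos to compare (file names are placeholders)
me_young = face_recognition.load_image_file("me_9_years_old.jpg")
my_son = face_recognition.load_image_file("my_son.jpg")

# 128-dimensional encoding of the first face found in each photo
me_encoding = face_recognition.face_encodings(me_young)[0]
son_encoding = face_recognition.face_encodings(my_son)[0]

# Euclidean distance between the encodings; smaller means more similar
distance = face_recognition.face_distance([me_encoding], son_encoding)[0]
print(f"face distance: {distance:.3f}")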

faces_matches

Cucumber time, food on a 2D plate / plane

komkommer

Introduction

It is 35 degrees Celsius outside, and we are in the middle of the 'slow news season', in many countries also called cucumber time: a period typified by less informative and more frivolous news in the media.

Did you know that 100 g of cucumber contains 0.28 mg of iron and 1.67 g of sugar? You can find all the nutrient values of a cucumber in the USDA food databases.

Food Data

There is much more data: for many thousands of products you can retrieve nutrient values through an API (you need to register for a free key). So besides the cucumber, I extracted data for different types of food, for example (a retrieval sketch in Python follows the list):

  • Beef products
  • Dairy & Egg products
  • Vegetables
  • Fruits
  • etc.
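
As a sketch of such a retrieval in Python: the current USDA FoodData Central search endpoint can be queried as below (this post originally used the older NDB API; the API key is a placeholder):

import requests

API_KEY = "YOUR_FREE_USDA_KEY"  # placeholder: register for a free key
URL = "https://api.nal.usda.gov/fdc/v1/foods/search"

# search the USDA FoodData Central database for cucumber products
response = requests.get(URL, params={"api_key": API_KEY, "query": "cucumber"})
foods = response.json()["foods"]

# print the nutrient values of the first hit
for nutrient in foods[0]["foodNutrients"]:
    print(nutrient["nutrientName"], nutrient["value"], nutrient["unitName"])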

And as a comparison, I retrieved the nutrient values for some fast food products from McDonald's and Pizza Hut, just to see if pizza can be classified as a vegetable from a data point of view 🙂 So the data looks like:

Selection_078

I sampled 1,500 products, and for each product we have 34 nutrient values.

Results

The 34-dimensional data is now compressed/projected onto a two-dimensional plane using UMAP (Uniform Manifold Approximation and Projection). There are Python and R packages for this.
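
A minimal sketch with the Python umap-learn package, assuming the nutrient values are in a numeric matrix X with one row per product:

import numpy as np
import umap

# stand-in for the real data: 1500 products x 34 nutrient values
X = np.random.rand(1500, 34)

# project the 34-dimensional data onto a 2D plane
embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(X)
print(embedding.shape)  # (1500, 2), ready for plotting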

Selection_079

An interactive map can be found here, and the R code to retrieve and plot the data here. Cheers, Longhow.


Dutch data science poetry

Selection_047

Sorry, do you all have a moment? A day in the life of a data bore-entist.

This is one of those days again, when nothing works at all. I've been waiting an hour for my query; it crashes!!! and then it won't run again.

I train a beautiful decision tree, but my model doesn't make it. Then let me try my neural network; hmm, its predictive power isn't strong either.

So I make a round past the business; they don't understand me, something is wrong! Outside it's really raining cats and dogs; time to share my work on Git.

Recruiters are screaming on your voicemail; at home you're just peeling the potatoes. Oh well, time to go to bed; tomorrow I'll carry on with my sexy job!

LL.