Google AutoML rocks!

Waaauw Google AutoML Vision rocks!

A few months ago I performed a simple test with Keras to create a Peugeot – BMW image classifier on my laptop.

See Is that a BMW or a Peugeot?

A friendly encouragement from Erwin Huizenga to try Google AutoML Vision resulted in a very good BMW-Peugeot classifier 10 minutes later without a single line of code. Just a few simple steps were needed.

  • Upload your car images and label them.


  • Click train (the first hour of training is free!)
  • After 10 minutes the model was trained, with very good precision and recall.


And the nice thing: the model is up and running for anyone or any client application that needs an image prediction.


Just give it a try……Google AutoML Vision!


Little image experiment on my son

Please don’t report me to the authorities for conducting a little frivolous experiment on my son. 

There is a nice Python package, ‘face_recognition’, that can recognize and manipulate faces in pictures and calculate distances between faces. What is the distance between my son and me (when I was 9 years old), and how does that compare with other children?

The picture below shows two soccer youth teams, my son and me (35 years ago). I have Chinese roots, so to make it hard for the algorithm I included a Chinese soccer team. I am happy with the results of the experiment. With a distance of 0.601, my son almost had the smallest distance to me of all the 28 faces…..

Try out the face recognition python package for yourself.  
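A minimal sketch of how such a comparison could be set up. The calls `load_image_file` and `face_encodings` come from the face_recognition package; the helper names and the ranking function are made up for illustration and are not the code behind the picture above. The package's own `face_distance` is a Euclidean distance between 128-dimensional face encodings, which is what the plain Python helper below computes.

```python
import math


def face_distance(encoding_a, encoding_b):
    """Euclidean distance between two face encodings (smaller = more alike)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(encoding_a, encoding_b)))


def rank_by_distance(reference, candidates):
    """Sort named encodings by their distance to the reference encoding.

    candidates: dict mapping a name to an encoding.
    Returns a list of (name, distance) pairs, closest first.
    """
    distances = {name: face_distance(reference, enc) for name, enc in candidates.items()}
    return sorted(distances.items(), key=lambda item: item[1])


def encode_faces(image_path):
    """Return one encoding per face found in the image (needs face_recognition)."""
    import face_recognition  # imported lazily so the helpers above work without it

    image = face_recognition.load_image_file(image_path)
    return face_recognition.face_encodings(image)
```

With a photo of yourself as the reference and a team photo as the source of candidates, `rank_by_distance` shows where each face ends up in the ordering.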


Cucumber time, food on a 2D plate / plane



It is 35 degrees Celsius outside and we are in the middle of the ‘slow news season’, in many countries also called cucumber time: a period typified by the appearance of less informative and frivolous news in the media.

Did you know that 100 g of cucumber contains 0.28 mg of iron and 1.67 g of sugar? You can find all the nutrient values of a cucumber in the USDA food databases.

Food Data

There is more data: for many thousands of products you can retrieve nutrient values through an API (you need to register for a free key). So besides the cucumber I extracted data for different types of food, for example:

  • Beef products
  • Dairy & Egg products
  • Vegetables
  • Fruits
  • etc.

And as a comparison, I retrieved the nutrient values for some fast food products from McDonald’s and Pizza Hut, just to see if pizza can be classified as a vegetable from a data point of view 🙂 So the data looks like:


I have sampled 1500 products and per product we have 34 nutrient values.


The 34-dimensional data is now compressed / projected onto a two-dimensional plane using UMAP (Uniform Manifold Approximation and Projection). There are Python and R packages for this.
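For the Python route, a projection like this could look roughly as follows. This is a sketch, not the R code used for the map: it assumes the umap-learn package, and the standardization step, the function names and the parameter values are my own assumptions. Scaling each nutrient first keeps a nutrient measured in large numbers from dominating the distances.

```python
import statistics


def standardize(rows):
    """Scale every column to zero mean and unit variance.

    rows: list of equally long lists, one list of nutrient values per product.
    Constant columns are left at zero instead of dividing by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


def project_to_2d(rows, n_neighbors=15, random_state=42):
    """Project the standardized rows onto a plane with UMAP (needs umap-learn)."""
    import numpy as np
    import umap  # imported lazily so standardize() works without it

    reducer = umap.UMAP(
        n_components=2, n_neighbors=n_neighbors, random_state=random_state
    )
    return reducer.fit_transform(np.asarray(standardize(rows)))
```

Calling `project_to_2d` on a table of 1500 rows with 34 values each would return 1500 pairs of coordinates that can be plotted directly.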


An interactive map can be found here, and the R code to retrieve and plot the data here. Cheers, Longhow.


Dutch data science poetry


Sorry, do you have a moment? A day in the life of a data bore-entist.

This is one of those days again, when nothing at all is allowed. I have been waiting an hour for my query, then it crashes!!! and once again it does not work.

I train a nice decision tree, but my model does not make it. Let me try my neural network then. Hmm, predictive power, not strong either.

Then a round past the business, they do not understand me, something is wrong! It is raining cats and dogs outside, time to share my work on Git.

Recruiters are screaming on your voicemail, while at home you are just peeling the potatoes. Oh well, time to go to bed, tomorrow I will carry on with my sexy job!


Echt HEMA….

There are so many interesting techniques in the data science field that it is hard to keep up with them all. I was recently triggered by UMAP (Uniform Manifold Approximation and Projection) for dimension reduction.

Reading through the paper brings back nostalgic memories of my Analysis I, II and III courses at the VU, which covered, among other things, Riemannian manifolds!

A frivolous, playful finger exercise with HEMA data to play around with UMAP. The steps are simple:

  1. Scrape some HEMA product images from the site.
  2. Run these images through a pre-trained deep learning network with the top layer removed.
  3. Each image is now converted into a high-dimensional feature vector.
  4. Apply UMAP to reduce the dimension to three.
  5. Put it in an interactive Shiny app….. see here


Amsterdam in an R leaflet nutshell


The municipality of Amsterdam (The Netherlands) provides open panorama images. See here and here. A camera car has driven around the city, and now you can download these images.

Per neighborhood of Amsterdam I randomly sampled 20 images, put them in an animated GIF using R magick, and then put the GIFs on an interactive leaflet map.

Before you book your tickets to Amsterdam, have a quick look here on the leaflet first 🙂


Is that a BMW or a Peugeot?



My son is 8 years old and he has shown a lot of interest in cars, which is strange because I have zero interest in cars. But he drives me crazy when we are out for a car ride: “dad, is that a Peugeot?“, “dad, that is an Audi” and “that is a BMW, right?“, “That is another cool BMW, why don’t we have a BMW?“. He is pretty accurate, close to 100%! I was curious how accurate a very simple model could get: just a re-use of a pre-trained image model on my laptop, without any GPUs.

Image Data

There is a nice Python package, google-images-download, to help you download images for given keywords.

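from google_images_download import google_images_download

# Create the downloader object
response = google_images_download.googleimagesdownload()

# Search for both brands, append "car" to each keyword, save PNG files under TMP
arguments = {
    "keywords": "BMW,PEUGEOT",
    "print_urls": False,
    "suffix_keywords": "car",
    "output_directory": "TMP",
    "format": "png",
}

# Start the download
response.download(arguments)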

The above code will get you images of BMWs and Peugeots. The problem, though, is that not all images are actually cars: you’ll see scooters, navigation systems and garages. Moreover, some downloaded files do not open at all.


So first, we run the downloaded files through a pre-trained resnet50 or vgg16 image classifier and keep only the images that keras can open and that were classified as car or wagon. Then the images are organized in the following folder structure:

├── training
│   ├── bmw (150 images)
│   └── peugeot (150 images)
└── validation
    ├── bmw (50 images)
    └── peugeot (50 images)
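The filtering and sorting steps above could be sketched in Python like this. Everything here is illustrative rather than the code I used: the function names are made up, the keyword rule is a crude reading of “classified as car or wagon” (it would also accept labels such as car_wheel), and the input is assumed to be the (class id, label, score) tuples that Keras’ `decode_predictions` returns for one image.

```python
import random
import shutil
from pathlib import Path


def looks_like_car(decoded, keywords=("car", "wagon")):
    """True if any predicted label has one of the keywords as a word.

    decoded: (class_id, label, score) tuples for a single image.
    """
    return any(
        word in keywords
        for _, label, _ in decoded
        for word in label.split("_")
    )


def split_images(source_dir, target_dir, n_train=150, n_valid=50, seed=1):
    """Copy images from source_dir/<brand>/ into training and validation folders.

    Returns a dict mapping (part, brand) to the number of files copied.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for brand_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
        images = sorted(p for p in brand_dir.iterdir() if p.is_file())
        rng.shuffle(images)
        parts = {
            "training": images[:n_train],
            "validation": images[n_train:n_train + n_valid],
        }
        for part, files in parts.items():
            destination = target_dir / part / brand_dir.name
            destination.mkdir(parents=True, exist_ok=True)
            for file in files:
                shutil.copy2(file, destination / file.name)
            counts[(part, brand_dir.name)] = len(files)
    return counts
```

Because the two slices do not overlap, no image ends up in both the training and the validation folder.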

Predictive model

I am using the simplest approach, both in terms of modeling and computational effort. It is described in section 5.3 of the fantastic book “Deep Learning with R” by François Chollet and J. J. Allaire.

  • Take a pretrained network, say VGG16, remove the top so that you only have a convolutional base.
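  • Now run your images through this base so that each image becomes a tensor.
  • Treat these tensors as input for a completely separate neural network classifier, for example a simple one with a single fully connected hidden layer of 256 neurons, as in the code snippet below.

library(keras)

# Small classifier trained on the features extracted by the convolutional base
model <- keras_model_sequential() %>%
  layer_dense(
    units = 256,
    activation = "relu",
    input_shape = 4 * 4 * 512  # flattened output of the convolutional base
  ) %>%
  layer_dropout(rate = 0.5) %>%  # dropout to reduce overfitting
  layer_dense(units = 1, activation = "sigmoid")  # probability of one of the two classes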
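Where does 4 * 4 * 512 come from? A small worked calculation, assuming the images are resized to 150 x 150 pixels (my assumption here, the snippet itself does not say so). The VGG16 convolutional base halves the height and width five times with 2 x 2 max pooling, rounding down, and ends with 512 channels. The helper names are made up for this example.

```python
def vgg16_feature_shape(height, width, n_pools=5, channels=512):
    """Output shape of the VGG16 convolutional base for a given input size."""
    for _ in range(n_pools):
        # each 2 x 2 max pooling halves both dimensions, rounding down
        height, width = height // 2, width // 2
    return height, width, channels


def dense_params(n_inputs, n_units):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units


h, w, c = vgg16_feature_shape(150, 150)  # 150 -> 75 -> 37 -> 18 -> 9 -> 4
flat = h * w * c                         # length of the flattened tensor
hidden = dense_params(flat, 256)         # hidden layer with 256 neurons
output = dense_params(256, 1)            # single sigmoid output
print(flat, hidden, output, hidden + output)
# prints: 8192 2097408 257 2097665
```

So the small classifier on top has roughly 2.1 million trainable parameters, almost all of them in the hidden layer; the dropout layer adds none.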

The nice thing is that once you have put your images in the proper folder structure you can just ‘shamelessly’ copy/paste the code from the accompanying markdown of the book and start training a BMW-Peugeot model.



After 15 epochs or so the accuracy on the validation images flattens off at around 80%, which is not super good and not even close to what my son can achieve 🙂 But it is not too bad either for just 30 minutes of work in R, mostly copy-pasting code….. Cheers, Longhow.