A “poor man’s video analyzer”…


Introduction

Not so long ago there was a nice Dataiku meetup with Pierre Gutierrez talking about transfer learning. RStudio recently released the keras package, an R interface to Keras for deep learning and transfer learning. Both events inspired me to do some experiments at my work here at RTL and to explore how usable this is for us. I would like to share the slides of the presentation that I gave internally at RTL; you can find them on SlideShare.

As a side effect, another experiment that I would like to share is the “poor man’s video analyzer“. Several vendors now offer APIs to analyze videos, see for example the one that Microsoft offers. With just a few lines of R code I came up with a Shiny app that is a very cheap imitation 🙂

Setup of the R Shiny app

To run the Shiny app a few things are needed. Make sure that ffmpeg is installed; it is used to extract images from a video. TensorFlow and Keras need to be installed as well. The extracted images from the video are passed through a pre-trained VGG16 network, so that each image is tagged.
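To give an idea of how little code this takes, here is a minimal sketch (not the exact app code) of the two steps: extracting frames with ffmpeg and tagging each frame with a pre-trained VGG16 via keras. The file names and the one-frame-per-second setting are my own assumptions.

```r
# Minimal sketch: extract frames with ffmpeg, then tag each frame with a
# pre-trained VGG16 network via the keras R package. File names and the
# one-frame-per-second setting are illustrative assumptions.
library(keras)

dir.create("frames", showWarnings = FALSE)
system("ffmpeg -i video.mp4 -vf fps=1 frames/img_%04d.jpg")

model <- application_vgg16(weights = "imagenet")

tag_image <- function(path) {
  img <- image_load(path, target_size = c(224, 224))
  x   <- image_to_array(img)
  x   <- array_reshape(x, c(1, dim(x)))     # add the batch dimension
  x   <- imagenet_preprocess_input(x)
  imagenet_decode_predictions(predict(model, x), top = 3)[[1]]
}

tags <- lapply(list.files("frames", full.names = TRUE), tag_image)
```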

After this tagging a data table will appear with the images and their tags. That’s it! I am sure there are better visualizations than a data table to show a lot of images; if you have a better idea, just adjust my Shiny app on GitHub…. 🙂

Using the app: some screenshots

The interface is simple: specify the number of frames per second that you want to analyse, and then upload a video. Many formats are supported (by ffmpeg), such as *.mp4, *.mpeg and *.mov.

Click on ‘video images’ to start the analysis process. This can take a few minutes; when it is finished you will see a data table with the extracted images and their VGG16 tags.

Click on ‘info on extracted classes’ to see an overview of the classes: a bar chart of the tags that were found, and the output of ffmpeg with some info on the video.

If you have code to turn the data table output into a fancier visualization, just go to my GitHub. For those who want to play around, there is a live video analyzer Shiny app here.

A Shiny app version using miniUI would be a better fit for small mobile screens.

Cheers, Longhow


New chapters for 50 shades of grey….


Introduction

Some time ago I had the honor of attending an interesting talk by Tijmen Blankevoort on neural networks and deep learning. Convolutional and recurrent neural networks were topics that had already caught my interest, and this talk inspired me to dive deeper into them and do some more experiments.

In the same session, organized by Martin de Lusenet for Ziggo (a Dutch cable company), I also had the honor of giving a talk; my presentation contained a text mining experiment that I did earlier on the Dutch TV soap GTST (“Goede Tijden Slechte Tijden”). A nice idea from Tijmen was: why not use deep learning to generate new episode plots for GTST?

So I did just that, see my LinkedIn post on GTST. However, those episodes are in Dutch and I guess only interesting for people here in the Netherlands. So, to make things more international and spicier, I generated some new texts based on deep learning and the erotic romance novel 50 shades of grey 🙂

More than plain vanilla networks

In R or SAS you have long been able to train plain vanilla neural networks: the so-called fully connected networks, where all input nodes are connected to all nodes in the following hidden layer, and all nodes in a hidden layer are connected to all nodes in the following hidden layer or output layer.

[Figure: a fully connected neural network]
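For illustration, such a plain vanilla fully connected network takes only a few lines in R with keras; the input dimension and layer sizes below are arbitrary assumptions.

```r
# A minimal sketch of a fully connected ("plain vanilla") network in R with
# keras; the input dimension and layer sizes are arbitrary assumptions.
library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10)) %>%
  layer_dense(units = 8,  activation = "relu") %>%   # second hidden layer
  layer_dense(units = 1,  activation = "sigmoid")    # output layer
```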

In more recent years, deep learning frameworks have become very popular, for example Caffe, Torch, CNTK, TensorFlow and MXNET. The added value of these frameworks compared to, say, SAS is:

  • They support more network types than plain vanilla networks: for example, convolutional networks, where not all input nodes are connected to the next layer, and recurrent networks, where loops are present. A nice introduction to these networks can be found here and here.
  • They support computations on GPUs, which can speed things up dramatically.
  • They are open-source and free. No need for long sales and implementation cycles 🙂 Just download it and use it!
[Figure: a recurrent neural network]

My 50 Shades of Grey experiment

For my experiment I used the text of the erotic romance novel 50 shades of grey. A pdf can be found here; I used xpdfbin to extract all the words into a plain text file. I trained a Long Short-Term Memory network (LSTM, a special type of recurrent network) with MXNET. The reason to use MXNET is that it has a nice R interface, so that I can just stay in my comfortable RStudio environment.

Moreover, the R example script of MXNET is ready to run; I just changed the input data and used more rounds of training and more hidden layers. The script and the data can be found on GitHub.
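To give an impression of what such a training call looks like, below is a sketch along the lines of the MXNET char-rnn example from that time. All parameter values are illustrative assumptions, and the exact mx.lstm signature may differ between mxnet package versions.

```r
# Sketch based on the old MXNET char-rnn example for R; parameter values are
# assumptions, and the mx.lstm signature may differ between package versions.
library(mxnet)

model <- mx.lstm(X.train, X.val,            # batches of encoded characters
                 ctx            = mx.cpu(),
                 num.round      = 20,       # more training rounds
                 update.period  = 1,
                 num.lstm.layer = 3,        # more hidden layers
                 seq.len        = 100,
                 num.hidden     = 256,
                 num.embed      = 256,
                 num.label      = 91,       # one label per unique character
                 batch.size     = 32,
                 input.size     = 91,
                 initializer    = mx.init.uniform(0.1),
                 learning.rate  = 0.1)
```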

The LSTM model is fit at the character level. The complete romance novel contains 817,204 characters, and each character is mapped to a number (there are 91 unique characters). The first few numbers are shown in the following figure.

[Figure: the first few characters mapped to numbers]
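As a sketch of this mapping step in base R, assuming the plain text of the novel has been read into a single string called txt:

```r
# Character-to-number mapping, assuming the novel's plain text is in 'txt'
chars <- strsplit(txt, "")[[1]]            # split the novel into characters
vocab <- sort(unique(chars))               # 91 unique characters in this case
char_to_num <- setNames(seq_along(vocab), vocab)
encoded <- unname(char_to_num[chars])      # the whole novel as numbers
head(encoded)
```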

Once the model has been trained, it can generate new text, character by character!

arsess whatever
yuu’re still expeliar a sally. Reftion while break in a limot.”
“Yes, ald what’s at my artmer and brow maned, but I’m so then for a
dinches suppretion. If you think vining. “Anastasia, and depregineon posing rave.
He’d deharing minuld, him drits.

“Miss Steele
“Fasting at liptfel, Miss I’ve dacind her leaches reme,” he knimes.
“I want to blight on to the wriptions of my great. I find sU she asks the stroke, to read with what’s old both – in our fills into his ear, surge • whirl happy, this is subconisue. Mrs. I can say about the battractive see. I slues
is her ever returns. “Anab.

It’s too even ullnes. “By heaven. Grey
about his voice. “Rest of the meriction.”
He scrompts to the possible. I shuke my too sucking four finishessaures. I need to fush quint the only more eat at me.
“Oh my. Kate. He’s follower socks?
“Lice in Quietly. In so morcieut wait to obsed teach beside my tired steately liked trying that.”
Kate for new of its street of confcinged. I haven’t Can regree.
“Where.” I fluscs up hwindwer-and I have

I’ll staring for conisure, pain!”
I know he’s just doesn’t walk to my backeting on Kate has hotelby of confidered Christaal side, supproately. Elliot, but it’s the ESca, that feel posing, it make my just drinking my eyes bigror on my head. S I’ll tratter topality butterch,” I mud
a nevignes, bleamn. “It’s not by there soup. He’s washing, and I arms and have. I wave to make my eyes. It’s forgately? Dash I’d desire to come your drink my heathman legt
you hay D1 Eyep, Christian Gry, husder with a truite sippking, I coold behind, it didn’t want to mive not to my stop?”
“Yes.”

“Sire, stcaring it was do and he licks his viice ever.”
I murmurs, most stare thut’s the then staraline for neced outsive. She
so know what differ at,” he murmurs?
“I shake my headanold.” Jeez.
“Are you?” Eviulder keep “Oh,_ I frosing gylaced in – angred. I am most drink to start and try aparts through. I really thrial you, dly woff you stund, there, I care an right dains to rainer.” He likes his eye finally finally my eyes to over opper heaven, places my trars his Necked her jups.
“Do you think your or Christian find at me, is so with that stand at my mouth sait the laxes any litee, this is a memory rude. It
flush,” He says usteer?” “Are so that front up.
I preparraps. I don’t scomine Kneat for from Christian.
“Christian,’! he leads the acnook. I can’t see. I breathing Kate’ve bill more over keen by. He releases?”
“I’m kisses take other in to peekies my tipgents my

Conclusion

The generated text does not make any sense, nor will it win any literature prize soon 🙂 Keep in mind that the model is based on ‘only’ 817,204 characters (which is considered a small number), and I did not bother to fine-tune the model at all. But still, it is funny and remarkable to see that when you use it to generate text, character by character, it can produce a lot of correct English words and even some correct basic grammar patterns!

Cheers, Longhow

 

Some simple facial analytics on actors (and my manager)

Some time ago I was at a party and, inevitably, a question that came up was: “Longhow, what kind of work are you doing?” I answered: I am a data scientist, I have the sexiest job there is, do you want me to show you how to use deep learning for facial analytics…… Oops, it became very quiet. Warning: don’t google the words sexy, deep and facial at work!

But anyway, I was triggered by the Face++ website and thought: can we do some simple facial analytics? The Face++ site offers an API where you can upload images (or use links to images). For each image that you upload, the Face++ engine can detect whether there are one or more faces. If a face is detected, you can also let the engine return 83 facial landmarks. Example landmarks are nose_tip, left_eye_left_corner and contour_chin, see the picture below. So an image is reduced to 83 landmarks.

[Figure: example facial landmarks returned by Face++]
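As a rough sketch of such an API call from R with httr: the endpoint and parameter names below follow the Face++ v3 detect API as far as I recall, so treat them as assumptions and check the current documentation.

```r
# Rough sketch of a Face++ detect call from R; the endpoint and parameter
# names are assumptions based on the v3 API, check the Face++ documentation.
library(httr)

resp <- POST(
  "https://api-us.faceplusplus.com/facepp/v3/detect",
  body = list(
    api_key         = "YOUR_KEY",
    api_secret      = "YOUR_SECRET",
    image_url       = "http://example.com/actor.jpg",  # hypothetical link
    return_landmark = 1                                # ask for landmarks
  )
)
face      <- content(resp)$faces[[1]]
landmarks <- face$landmark    # named list of landmark x/y positions
```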

What are some nice faces to analyse? Well, I just googled “top 200 actors” and got a nice list of actor names, and with the permission of my manager Rein Mertens I added his picture as well. These are the steps I took next:

Data preparation

1. I downloaded import.io, a very handy tool to turn web pages into data. In the tool, enter the link to the list of 200 actors. It will then extract the links to the 200 pictures (together with more data) and conveniently export everything to a csv file.

[Figure: import.io turns a web page into data]

2. I wrote a small script in R to import the csv file from step 1, so that for each of the 200 actors I have the link to his image. Now I can call the Face++ API for every link and let Face++ return the 83 facial landmarks. The result is a data set with 200 rows, 166 columns (the x and y positions of the landmarks) and an ID column containing the name of the actor, built roughly as in the sketch below.
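A minimal sketch of this step; get_landmarks() is a hypothetical helper wrapping the Face++ call shown earlier, and actors is assumed to be the data frame imported from the csv of step 1.

```r
# Build the analytical base table: one row of 166 numbers per actor.
# get_landmarks() is a hypothetical wrapper around the Face++ call above;
# 'actors' is assumed to be the data frame imported from the step 1 csv.
landmarks_to_row <- function(lm) {
  unlist(lapply(lm, function(p) c(x = p$x, y = p$y)))  # 83 * 2 = 166 values
}

rows <- lapply(actors$image_url, function(u) landmarks_to_row(get_landmarks(u)))
abt  <- data.frame(ID = actors$name, do.call(rbind, rows))
```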

3. I added the 83 landmarks of my manager to the data set, so my set now looks like this:

[Figure: analytical base table of actor faces]

There were some images in which Face++ could not detect a face; I removed those images.

[Figure: faces not recognized by the Face++ engine]

Now that I have this so-called facial analytical base table of actors, I can apply some nice analytics.

Deep learning: autoencoders

What are some strange faces? To answer that from an objective point of view, I used deep learning autoencoders. The idea of an autoencoder is to learn a compressed representation of a set of data, typically for the purpose of dimensionality reduction (Wikipedia). The trick is to use a neural network where the output target is the same as the input variables. The actor faces are observations in a 166-dimensional space; using proc neural in SAS I reduced the faces to points in a two-dimensional space. I used 5 hidden layers, where the middle layer consists of two neurons, as shown in the SAS code snippet below. More information on autoencoders in SAS can be found here.

[Figure: SAS proc neural code snippet]
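The post itself uses SAS proc neural; purely as an illustrative R analogue, a keras autoencoder with the same two-neuron bottleneck could look like the sketch below (the sizes of the other hidden layers are my own assumptions).

```r
# Illustrative R analogue of the SAS autoencoder (not the code from the post):
# 5 hidden layers with a 2-neuron middle layer; other sizes are assumptions.
library(keras)

input   <- layer_input(shape = 166)                  # 83 landmarks * (x, y)
encoded <- input %>%
  layer_dense(units = 50, activation = "tanh") %>%
  layer_dense(units = 25, activation = "tanh") %>%
  layer_dense(units = 2,  activation = "tanh")       # the 2-D middle layer
decoded <- encoded %>%
  layer_dense(units = 25,  activation = "tanh") %>%
  layer_dense(units = 50,  activation = "tanh") %>%
  layer_dense(units = 166, activation = "linear")    # reconstruct the input

autoenc <- keras_model(input, decoded)
autoenc %>% compile(optimizer = "adam", loss = "mse")
# autoenc %>% fit(X, X, epochs = 100)    # X: the 166-column landmark matrix

encoder <- keras_model(input, encoded)
# coords <- predict(encoder, X)          # the 2-D points for the scatter plot
```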

The faces can now be plotted in a scatter plot (as shown below), with the two dimensions corresponding to the middle layer.

[Figure: scatter plot of actor faces]

A more interactive version of this plot can be found here; it is a simple D3.js script where you can hover over the dots and click to see the underlying picture.

Hierarchical cluster analysis

Can we group together faces that are similar, i.e. have similar facial landmarks? Yes, we can apply a clustering technique to do that. There are many different techniques; I used an agglomerative hierarchical clustering technique, because its algorithm can be nicely visualized in a so-called dendrogram. The algorithm starts with each observation (face) as its own cluster; in each following iteration, pairs of clusters are merged until all observations form one cluster. In SAS you can use proc cluster to perform hierarchical clustering, see the code and dendrogram below.

[Figure: SAS proc cluster code snippet]

[Figure: dendrogram of actor faces]
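The post uses SAS; for reference, the same agglomerative clustering is only a few lines in base R (a sketch, assuming X is the 166-column landmark matrix with the actor names as row names).

```r
# Sketch of the same clustering in base R (the post itself uses SAS);
# X is assumed to be the 166-column landmark matrix with actor row names.
d  <- dist(X)                         # pairwise distances between faces
hc <- hclust(d, method = "ward.D2")   # agglomerative hierarchical clustering
plot(hc, labels = rownames(X), cex = 0.6)   # draw the dendrogram
```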

Using D3.js I have also created a more interactive dendrogram.

— L. —
