A “poor man’s video analyzer”…

[Image: video01]

Introduction

Not so long ago there was a nice Dataiku meetup where Pierre Gutierrez talked about transfer learning. RStudio recently released the keras package, an R interface to Keras for deep learning and transfer learning. Both events inspired me to do some experiments at my work here at RTL and to explore the usability of transfer learning for us. I would like to share the slides of the presentation that I gave internally at RTL; you can find them on SlideShare.

As a side effect, another experiment that I would like to share is the “poor man’s video analyzer“. There are several vendors now that offer APIs to analyze videos, see for example the one that Microsoft offers. With just a few lines of R code I came up with a Shiny app that is a very cheap imitation :-)

Set up of the R Shiny app

To run the Shiny app a few things are needed. Make sure that ffmpeg is installed; it is used to extract images from a video. TensorFlow and keras need to be installed as well. The extracted images from the video are passed through a pre-trained VGG16 network so that each image is tagged.
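These two steps can be sketched in a few lines of R. This is a minimal sketch, not the full app: it assumes ffmpeg is on the PATH, and the video and directory names are illustrative.

```r
library(keras)

# 1. extract one frame per second from the video with ffmpeg
dir.create("frames", showWarnings = FALSE)
system("ffmpeg -i myvideo.mp4 -vf fps=1 frames/img_%04d.jpg")

# 2. tag each extracted frame with a pre-trained VGG16 network
model = application_vgg16(weights = "imagenet")

tag_image = function(path) {
  img = image_load(path, target_size = c(224, 224))
  x   = array_reshape(image_to_array(img), c(1, 224, 224, 3))
  x   = imagenet_preprocess_input(x)
  imagenet_decode_predictions(predict(model, x), top = 3)[[1]]
}

tags = lapply(list.files("frames", full.names = TRUE), tag_image)
```

In the app the frames-per-second value of the `fps` filter is taken from the user interface.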

After this tagging a data table will appear with the images and their tags. That’s it! I am sure there are better visualizations than a data table to show a lot of images. If you have a better idea, just adjust my Shiny app on GitHub…. :-)

Using the app, some screen shots

There is a simple interface: specify the number of frames per second that you want to analyse, and then upload a video. Many formats are supported (by ffmpeg), like *.mp4, *.mpeg and *.mov.

Click on ‘video images’ to start the analysis process. This can take a few minutes; when it is finished you will see a data table with the extracted images and their tags from VGG16.

Click on ‘info on extracted classes’ to see an overview of the classes. You will see a bar chart of the tags that were found, and the output of ffmpeg, which shows some info on the video.

If you have code to improve the data table output with a fancier visualization, just go to my GitHub. For those who want to play around, have a look at the live video analyzer Shiny app here.

A Shiny app version using miniUI will be a better fit for small mobile screens.

Cheers, Longhow


The Eurovision 2016 song contest in an R Shiny app

Introduction

In just a few weeks the Eurovision 2016 song contest will be held again. There are 43 participants, two semi-finals on the 10th and 12th of May, and a final on the 14th of May. It’s going to be a long watch in front of the television…. :-) Who is going to win? Well, you could ask experts, look up the number of tweets on the different participants, count YouTube likes or go to bookmakers’ sites. At the time of writing Russia was the favorite among the bookmakers, according to this overview of bookmakers.

Spotify data

As an alternative, I used Spotify data. There is a Spotify API which allows you to get information on playlists, artists, tracks, etc. It is not difficult to extract interesting information from the API:

  • Sign up for a (Premium or Free) Spotify account
  • Register a new application on the ‘My Applications‘ site
  • You will then get a client ID and a client Secret

In R you can use the httr library to make API calls. First, with the client ID and secret you need to retrieve a token; then with the token you can call one of the Spotify API endpoints, for example to get information on a specific artist. See the R code snippet below.


library(httr)

clientID = '12345678910'
secret = 'ABCDEFGHIJKLMNOPQR'

response = POST(
  'https://accounts.spotify.com/api/token',
  accept_json(),
  authenticate(clientID, secret),
  body = list(grant_type = 'client_credentials'),
  encode = 'form',
  verbose()
)

mytoken = content(response)$access_token

## Frank Sinatra spotify artist ID
artistID = '1Mxqyy3pSjf8kZZL4QVxS0'

HeaderValue = paste0('Bearer ', mytoken)

URI = paste0('https://api.spotify.com/v1/artists/', artistID)
response2 = GET(url = URI, add_headers(Authorization = HeaderValue))
Artist = content(response2)

The content of the second response object is a nested list with information on the artist: for example, URL links to images, the number of followers, the popularity of the artist, etc.
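For example, a few fields can be picked from the nested list; the field names below follow the documented Spotify artist endpoint.

```r
Artist$name              # artist name, here "Frank Sinatra"
Artist$popularity        # popularity score between 0 and 100
Artist$followers$total   # number of followers
Artist$genres            # list of genre strings
```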

Track popularity

An interesting API endpoint is the track API, especially the information on the track popularity. What is the track popularity? Taken from the Spotify website:

The popularity of the track. The value will be between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are.

I wrote a small R script to retrieve, every hour, the track popularity of each of the 43 tracks that participate in this year’s Eurovision song contest. The picture below lists the 10 most popular tracks of the song contest participants.
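The core of that script looks roughly as follows. This is a sketch: it assumes the token from the snippet above and a vector `trackIDs` holding the 43 Spotify track IDs.

```r
library(httr)

# popularity of one track via the Spotify track endpoint
getPopularity = function(trackID, token) {
  URI  = paste0('https://api.spotify.com/v1/tracks/', trackID)
  resp = GET(url = URI, add_headers(Authorization = paste0('Bearer ', token)))
  content(resp)$popularity
}

scores = data.frame(
  time       = Sys.time(),
  trackID    = trackIDs,
  popularity = sapply(trackIDs, getPopularity, token = mytoken)
)

# append to a csv file; the script itself is scheduled to run every hour
write.table(scores, 'trackpopularity.csv', append = TRUE,
            sep = ',', row.names = FALSE, col.names = FALSE)
```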

[Image: worpressplaatjescore]

At the time of writing the most popular track was “If I Were Sorry” by Frans (Sweden), which is placed at number three by the bookmakers. The least popular track was “The real Thing” by Highway (Montenegro), corresponding to the last place at the bookmakers.

There is not a lot of movement in the track popularity; it is very stable over time. Maybe when we get nearer to the song contest final in May we’ll see some more movement. I have also kept track of the number of followers that an artist has. There is much more movement there, see the figure below.

[Image: aantalfollwers]

Every day around 5 pm – 6 pm Frans gets around 10 to 12 new followers on Spotify! Artists may of course also lose some followers, for example Douwe Bob in the above picture.

Audio features and related artists

Audio features of tracks, like loudness, danceability, tempo, etc., can also be retrieved from the API. A simple scatter plot of the 43 songs reveals loud and undanceable songs, for example the one by Francesca Michielin (Italy); she is one of the six lucky artists that already have a place in the final!

[Image: audiofeatures]
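Such a scatter plot can be reproduced roughly as follows. A sketch, assuming a data frame `songs` with the 43 track IDs and the token from earlier; the endpoint is the documented Spotify audio-features API.

```r
library(httr)

# audio features (loudness, danceability, tempo, ...) of one track
getFeatures = function(trackID, token) {
  URI  = paste0('https://api.spotify.com/v1/audio-features/', trackID)
  resp = GET(url = URI, add_headers(Authorization = paste0('Bearer ', token)))
  content(resp)
}

feats = lapply(songs$trackID, getFeatures, token = mytoken)
songs$loudness     = sapply(feats, function(f) f$loudness)
songs$danceability = sapply(feats, function(f) f$danceability)

plot(songs$danceability, songs$loudness,
     xlab = 'danceability', ylab = 'loudness (dB)')
```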

Every artist on Spotify also has a set of related artists. This set can be retrieved from the API and can be viewed nicely in a network graph.

[Image: artistnetwerk]

The green nodes are the 43 song contest participants. Many of them are ‘isolated’ but some of them are related to each other or connected through a common related artist.
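A graph like the one above can be built with the igraph package. A sketch, assuming a vector `artistIDs` with the 43 participant artist IDs and the token from earlier:

```r
library(httr)
library(igraph)

# names of the related artists of one artist
getRelated = function(artistID, token) {
  URI  = paste0('https://api.spotify.com/v1/artists/', artistID, '/related-artists')
  resp = GET(url = URI, add_headers(Authorization = paste0('Bearer ', token)))
  sapply(content(resp)$artists, function(a) a$name)
}

# edge list: participant -> related artist
edges = do.call(rbind, lapply(artistIDs, function(id) {
  cbind(from = id, to = getRelated(id, mytoken))
}))

g = graph_from_edgelist(edges, directed = FALSE)
plot(g, vertex.size = 3, vertex.label.cex = 0.6)
```

Participants that share a related artist end up connected through a common node in the graph.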

Conclusion

I have created a small Eurovision 2016 Shiny app that summarizes the above information, so you can see and listen for yourself. We will find out how strongly the Spotify track popularity is correlated with the final ranking of the Eurovision song contest on May the 14th!

Cheers, Longhow.

Delays on the Dutch railway system

I almost never travel by train; the last time was years ago. However, recently I had to take the train from Amsterdam and it was delayed for 5 minutes. No big deal, but I was just curious how often these delays occur on the Dutch railway system. I couldn’t quickly find a historical data set with information on delays, so I decided to gather my own data.

The Dutch Railways provide an API (the NS API) that returns actual departure and delay data for a certain train station. I have written a small R script that calls this API for each of the 400 train stations in The Netherlands. This script is then scheduled to run every 10 minutes. The API returns data in XML format; the basic entity is “a departing train”. For each departing train we know its departure time, the destination, the departing train station, the type of train, the delay (if there is any), etc. So what to do with all these departing trains? Throw it all into MongoDB. Why?

  • Not for any particular reason :-).
  • It’s easy to install and setup on my little Ubuntu server.
  • There is a nice R interface to MongoDB.
  • The response structure (see picture below) from the API is not that difficult to flatten to a table, but NoSQL sounds more sexy than MySQL nowadays :-)

[Image: mongoentry]
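The collector script is essentially the following sketch. The endpoint and the VertrekkendeTrein element follow the NS API documentation; the credentials and the `stations` vector (the ~400 station codes) are placeholders.

```r
library(httr)
library(XML)
library(mongolite)

mongo_conn = mongo(collection = 'departures', db = 'ns')

getDepartures = function(station) {
  # actual departure times for one station, basic-auth with the NS API key
  resp = GET(
    paste0('http://webservices.ns.nl/ns-api-avt?station=', station),
    authenticate('myusername', 'myapikey')
  )
  doc = xmlParse(content(resp, as = 'text'))
  # flatten each departing train to one row
  out = xmlToDataFrame(nodes = getNodeSet(doc, '//VertrekkendeTrein'))
  out$station = station
  out
}

# scheduled (e.g. via cron) to run every 10 minutes
for (s in stations) {
  mongo_conn$insert(getDepartures(s))
}
```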

I started to collect train departure data on the 4th of January; per day there are around 48,000 train departures in The Netherlands. I can see how many of them are delayed, per day, per station or per hour. Of course, since the collection started only a few days ago, it’s hard to use these data for long-term delay rates of the Dutch railway system. But it is a start.
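With the data in MongoDB, a delay rate per station is a small query away. A sketch, where `station` is a field added by my collector and `VertrekVertraging` is the delay field from the NS API response:

```r
library(mongolite)

mongo_conn = mongo(collection = 'departures', db = 'ns')
dep = mongo_conn$find('{}', fields = '{"station":1, "VertrekVertraging":1}')

# a train counts as delayed when the delay field is filled in
dep$delayed = !is.na(dep$VertrekVertraging) & dep$VertrekVertraging != ''

# fraction of delayed departures per station, most delayed first
delayrates = aggregate(delayed ~ station, data = dep, FUN = mean)
head(delayrates[order(-delayrates$delayed), ])
```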

To present this delay information in an interactive way to others, I have created an R Shiny app that queries the MongoDB database. The picture below from my Shiny app shows the delay rates per train station on the 4th of January 2016, an icy day, especially in the north of the Netherlands.

[Image: kaartje]

Cheers,

Longhow

Analyzing “Twitter faces” in R with Microsoft Project Oxford

Introduction

In my previous blog post I used the Microsoft Translator API in my BonAppetit Shiny app to recommend restaurants to tourists. I’m getting a little bit addicted to the Microsoft APIs; they can be fun to use :-). In this blog post I will briefly describe some of the Project Oxford APIs of Microsoft.

The APIs can be called from within R, and if you combine them with other APIs, for example Twitter, then interesting “Twitter face” analyses can be done. See my “TweetFace” Shiny app to analyse faces that can be found on Twitter.

Project Oxford

The APIs of Project Oxford can be categorized into:

  • Computer Vision,
  • Face,
  • Video,
  • Speech and
  • Language.

The free tier subscription provides 5,000 API calls per month (with a rate limit of 20 calls per minute). I focused my experiments on the computer vision and face APIs; a lot of functionality is available to analyze images, for example categorization of images, adult content detection, OCR, face recognition, gender analysis, age estimation and emotion detection.

Calling the API’s from R

The httr package provides very convenient functions to call the Microsoft APIs. You need to sign up first and obtain a key. Let’s do a simple test on Angelina Jolie by using the face detect API.

[Image: angelina]

Angelina Jolie, picture link

library(httr)

faceURL = "https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair"
img.url = 'http://www.buro247.com/images/Angelina-Jolie-2.jpg'

faceKEY = '123456789101112131415'

mybody = list(url = img.url)

faceResponse = POST(
  url = faceURL, 
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceResponse
Response [https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair]
Date: 2015-12-16 10:13
Status: 200
Content-Type: application/json; charset=utf-8
Size: 1.27 kB

If the call was successful a “Status: 200” is returned and the response object is filled with interesting information. The API returns the information as JSON, which is parsed by R into nested lists.


AngelinaFace = content(faceResponse)[[1]]
names(AngelinaFace)
[1] "faceId"  "faceRectangle" "faceLandmarks" "faceAttributes"

AngelinaFace$faceAttributes
$gender
[1] "female"

$age
[1] 32.6

$facialHair
$facialHair$moustache
[1] 0

$facialHair$beard
[1] 0

$facialHair$sideburns
[1] 0

Well, the API recognized the gender and that there is no facial hair :-), but her age is underestimated: Angelina is 40, not 32.6! Let’s look at emotions; the emotion API has its own key and URL.


URL.emoface = 'https://api.projectoxford.ai/emotion/v1.0/recognize'

emotionKEY = 'ABCDEF123456789101112131415'

mybody = list(url = img.url)

faceEMO = POST(
  url = URL.emoface,
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = emotionKEY)),
  body = mybody,
  encode = 'json'
)
faceEMO
AngelinaEmotions = content(faceEMO)[[1]]
AngelinaEmotions$scores
$anger
[1] 4.573111e-05

$contempt
[1] 0.001244121

$disgust
[1] 0.0001096572

$fear
[1] 1.256477e-06

$happiness
[1] 0.0004313129

$neutral
[1] 0.9977798

$sadness
[1] 0.0003823086

$surprise
[1] 5.75276e-06

A fairly neutral face. Let’s test some other Angelina faces.

[Image: angelina2]

Find similar faces

A nice piece of functionality of the API is finding similar faces. First a list of faces needs to be created; then, with a ‘query face’, you can search for similar-looking faces in that list. Let’s look at the most sexy actresses.


## Scrape the image URLs of the actresses
library(rvest)

linksactresses = 'http://www.imdb.com/list/ls050128191/'

out = read_html(linksactresses)
images = html_nodes(out, '.zero-z-index')
imglinks = html_nodes(out, xpath = "//img[@class='zero-z-index']/@src") %>% html_text()

## additional information, the name of the actress
imgalts = html_nodes(out, xpath = "//img[@class='zero-z-index']/@alt") %>% html_text()

Create an empty face list by calling the face list API. You should specify a facelistID, which is placed as a request parameter behind the face list URL. My facelistID is “listofsexyactresses”, as shown in the code below.

### create an id and name for the face list
URL.face = "https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses"

mybody = list(name = 'top 100 of sexy actresses')

faceLIST = PUT(
  url = URL.face,
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceLIST
Response [https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses]
Date: 2015-12-17 15:10
Status: 200
Content-Type: application/json; charset=utf-8
Size: 108 B

Now fill the list with images. The API allows you to provide user data with each image; this can be handy to insert names or other info. For one image this works as follows:

i=1
userdata = imgalts[i]
linkie = imglinks[i]
face.uri = paste(
  'https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses/persistedFaces?userData=',
  userdata,
  sep = ''  # no separator, userdata is appended directly to the query string
)
face.uri = URLencode(face.uri)
mybody = list(url = linkie )

faceLISTadd = POST(
  url = face.uri,
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceLISTadd
print(content(faceLISTadd))
Response [https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses/persistedFaces?userData=Image%20of%20Naomi%20Watts]
Date: 2015-12-17 15:58
Status: 200
Content-Type: application/json; charset=utf-8
Size: 58 B

$persistedFaceId
[1] '32fa4d1c-da68-45fd-9818-19a10beea1c2'

## status 200 is OK
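Completing the face list is then a matter of repeating this call for every scraped image; `imglinks` and `imgalts` come from the rvest step above, and the Sys.sleep keeps the loop under the free-tier rate limit.

```r
for (i in seq_along(imglinks)) {
  face.uri = URLencode(paste0(
    'https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses/persistedFaces?userData=',
    imgalts[i]
  ))
  POST(
    url = face.uri,
    content_type('application/json'),
    add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
    body = list(url = imglinks[i]),
    encode = 'json'
  )
  Sys.sleep(3)  # free tier allows 20 calls per minute
}
```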

Just loop over the 100 faces to complete the face list. With the list of images we can now perform a query with a new ‘query face’. Two steps are needed: first call the face detect API to obtain a face ID. I am going to use an image of Angelina, but a different one than the image on IMDB.


faceDetectURL = 'https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair'
img.url = 'http://a.dilcdn.com/bl/wp-content/uploads/sites/8/2009/06/angelinaangry002.jpg'

mybody = list(url = img.url)

faceRESO = POST(
  url = faceDetectURL,
  content_type('application/json'), add_headers(.headers =  c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceRESO
fID = content(faceRESO)[[1]]$faceId

With the face ID, query the face list with the “find similar” API. There is a confidence of almost 60%.


sim.URI = 'https://api.projectoxford.ai/face/v1.0/findsimilars'

mybody = list(faceID = fID, faceListID = 'listofsexyactresses' )

faceSIM = POST(
  url = sim.URI,
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceSIM
yy = content(faceSIM)
yy
[[1]]
[[1]]$persistedFaceId
[1] "6b4ff942-b216-4817-9739-3653a467a594"

[[1]]$confidence
[1] 0.5980769

The picture below shows some other matches…..

[Image: matches]

Conclusion

The APIs of Microsoft’s Project Oxford provide nice functionality for computer vision and face analysis. It’s fun to use them; see my ‘TweetFace’ Shiny app to analyse images on Twitter.

Cheers,

Longhow