The Eurovision 2016 song contest in an R Shiny app

Introduction

In just a few weeks the Eurovision 2016 song contest will be held again. There are 43 participants, two semi-finals on the 10th and 12th of May, and a final on the 14th of May. It’s going to be a long watch in front of the television… 🙂 Who is going to win? Well, you could ask experts, look up the number of tweets on the different participants, count YouTube likes or go to bookmakers’ sites. At the time of writing, Russia was the favorite among the bookmakers according to this overview of bookmakers.

Spotify data

As an alternative, I used Spotify data. There is a Spotify API which allows you to get information on playlists, artists, tracks, etc. It is not difficult to extract interesting information from the API:

  • Sign up for a (Premium or Free) Spotify account
  • Register a new application on the ‘My Applications’ site
  • You will then get a client ID and a client secret

In R you can use the httr library to make API calls. First, with the client ID and secret, you retrieve a token; then with the token you can call one of the Spotify API endpoints, for example to get information on a specific artist. See the R code snippet below.


library(httr)

clientID = '12345678910'
secret = 'ABCDEFGHIJKLMNOPQR'

## exchange the client credentials for an access token
response = POST(
  'https://accounts.spotify.com/api/token',
  accept_json(),
  authenticate(clientID, secret),
  body = list(grant_type = 'client_credentials'),
  encode = 'form',
  verbose()
)

mytoken = content(response)$access_token

## Frank Sinatra Spotify artist ID
artistID = '1Mxqyy3pSjf8kZZL4QVxS0'

HeaderValue = paste0('Bearer ', mytoken)

## call the artist endpoint with the token in the Authorization header
URI = paste0('https://api.spotify.com/v1/artists/', artistID)
response2 = GET(url = URI, add_headers(Authorization = HeaderValue))
Artist = content(response2)

The content of the second response object is a nested list with information on the artist: for example, URL links to images, the number of followers, the popularity of the artist, etc.
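
Pulling fields out of such a nested list is straightforward. Below is a small sketch; the example `Artist` list is hand-made for illustration, but the field names (`name`, `popularity`, `followers$total`) correspond to the artist object returned by the API:

```r
## flatten the fields we care about from a parsed artist object
## (a nested list as returned by content())
artist_summary = function(artist) {
  data.frame(
    name       = artist$name,
    popularity = artist$popularity,
    followers  = artist$followers$total,
    stringsAsFactors = FALSE
  )
}

## illustrative structure only -- in practice 'Artist' comes from content(response2)
Artist = list(
  name = "Frank Sinatra",
  popularity = 77,
  followers = list(total = 1234567)
)
artist_summary(Artist)
```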

Track popularity

An interesting API endpoint is the track API, in particular the information on track popularity. What is the track popularity? Taken from the Spotify website:

The popularity of the track. The value will be between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are.

I wrote a small R script that retrieves, every hour, the track popularity of each of the 43 tracks that participate in this year’s Eurovision song contest. The picture below lists the top 10 most popular tracks of the song contest participants.
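
The hourly script is essentially one GET call to the track endpoint per track. A sketch of the idea, with placeholder track IDs and assuming `mytoken` was retrieved as shown earlier:

```r
library(httr)

track_url = function(trackID) {
  paste0('https://api.spotify.com/v1/tracks/', trackID)
}

## get the popularity (0-100) of one track
get_popularity = function(trackID, token) {
  resp = GET(track_url(trackID),
             add_headers(Authorization = paste0('Bearer ', token)))
  content(resp)$popularity
}

## placeholder IDs; the real script loops over the 43 Eurovision tracks
trackIDs = c('trackID1', 'trackID2')

## append one timestamped row per track; a cron job runs this every hour
## res = data.frame(time = Sys.time(), track = trackIDs,
##                  popularity = sapply(trackIDs, get_popularity, token = mytoken))
## write.table(res, 'popularity.csv', append = TRUE, sep = ',',
##             col.names = FALSE, row.names = FALSE)
```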

worpressplaatjescore

At the time of writing the most popular track was “If I Were Sorry” by Frans (Sweden), which is placed at number three by the bookmakers. The least popular track was “The Real Thing” by Highway (Montenegro), corresponding to its last place with the bookmakers.

There is not a lot of movement in the track popularity; it is very stable over time. Maybe when we get nearer to the song contest final in May we’ll see some more movement. I have also kept track of the number of followers each artist has. There is much more movement here, see the figure below.

aantalfollwers

Every day around 5 pm – 6 pm Frans gains around 10 to 12 new followers on Spotify! Artists may of course also lose followers, for example Douwe Bob in the above picture.

Audio features and related artists

Audio features of tracks, like loudness, danceability, tempo, etc., can also be retrieved from the API. A simple scatter plot of the 43 songs reveals the loud and undanceable songs, for example the track of Francesca Michielin (Italy). She is one of the six lucky artists that already have a place in the final!

audiofeatures
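
The features behind this scatter plot come from Spotify’s audio-features endpoint. A sketch of how one could retrieve them, with placeholder track IDs and the token from before:

```r
library(httr)

features_url = function(trackID) {
  paste0('https://api.spotify.com/v1/audio-features/', trackID)
}

## retrieve the audio features (loudness, danceability, tempo, ...) of one track
get_features = function(trackID, token) {
  content(GET(features_url(trackID),
              add_headers(Authorization = paste0('Bearer ', token))))
}

## features = lapply(trackIDs, get_features, token = mytoken)
## plot(sapply(features, `[[`, 'loudness'),
##      sapply(features, `[[`, 'danceability'),
##      xlab = 'loudness', ylab = 'danceability')
```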

Every artist on Spotify also has a set of related artists; this set can be retrieved from the API and can be viewed nicely in a network graph.

artistnetwerk

The green nodes are the 43 song contest participants. Many of them are ‘isolated’ but some of them are related to each other or connected through a common related artist.
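
Such a graph can be constructed from the related-artists endpoint with, for example, the igraph package. A sketch, where `participantIDs` is an assumed vector with the artist IDs of the 43 participants:

```r
library(httr)
library(igraph)

related_url = function(artistID) {
  paste0('https://api.spotify.com/v1/artists/', artistID, '/related-artists')
}

## edges from one participant to each of its related artists
get_edges = function(artistID, token) {
  rel = content(GET(related_url(artistID),
                    add_headers(Authorization = paste0('Bearer ', token))))$artists
  data.frame(from = artistID,
             to   = sapply(rel, `[[`, 'id'),
             stringsAsFactors = FALSE)
}

## edges = do.call(rbind, lapply(participantIDs, get_edges, token = mytoken))
## g = graph_from_data_frame(edges, directed = FALSE)
## plot(g)
```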

Conclusion

I have created a small Eurovision 2016 Shiny app that summarizes the above information, so you can see and listen for yourself. On May the 14th we will find out how strongly the Spotify track popularity correlates with the final ranking of the Eurovision song contest!

Cheers, Longhow.

Delays on the Dutch railway system

I almost never travel by train; the last time was years ago. However, recently I had to take the train from Amsterdam and it was delayed by 5 minutes. No big deal, but it made me curious how often these delays occur on the Dutch railway system. I couldn’t quickly find a historical data set with information on delays, so I decided to gather my own data.

The Dutch Railways provide an API (De NS API) that returns actual departure and delay data for a given train station. I have written a small R script that calls this API for each of the 400 train stations in The Netherlands. This script is scheduled to run every 10 minutes. The API returns data in XML format; the basic entity is “a departing train”. For each departing train we know its departure time, the destination, the departing train station, the type of train, the delay (if there is any), etc. So what to do with all these departing trains? Throw them all into MongoDB. Why?

  • Not for any particular reason :-).
  • It’s easy to install and setup on my little Ubuntu server.
  • There is a nice R interface to MongoDB.
  • The response structure (see picture below) from the API is not that difficult to flatten to a table, but NoSQL sounds sexier than MySQL nowadays 🙂

mongoentry
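
A sketch of the flattening step: parse the XML of one response and keep a few fields per departing train. The element names follow the NS API response structure but should be checked against the API documentation; the fetch and the MongoDB insert (with the mongolite package) are shown as comments since they need credentials and a running database:

```r
library(xml2)

## flatten one NS API response (XML text) into a data.frame,
## one row per departing train
flatten_departures = function(xmltext, station) {
  trains = xml_find_all(read_xml(xmltext), './/VertrekkendeTrein')
  data.frame(
    station = station,
    time    = xml_text(xml_find_first(trains, './/VertrekTijd')),
    dest    = xml_text(xml_find_first(trains, './/EindBestemming')),
    delay   = xml_text(xml_find_first(trains, './/VertrekVertragingTekst')),
    stringsAsFactors = FALSE
  )
}

## fetch one station and store it (credentials from the NS API sign-up)
## library(httr); library(mongolite)
## resp = GET('http://webservices.ns.nl/ns-api-avt',
##            query = list(station = 'UT'),
##            authenticate(username, password))
## m = mongo(collection = 'departures', db = 'ns')
## m$insert(flatten_departures(content(resp, as = 'text'), 'UT'))
```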

I started to collect train departure data on the 4th of January; per day there are around 48,000 train departures in The Netherlands. I can see how many of them are delayed, per day, per station or per hour. Of course, since the collection started only a few days ago, it is hard to use these data for long-term delay rates of the Dutch railway system. But it is a start.
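
Counting delayed departures per day can be done directly in MongoDB with an aggregation pipeline. A sketch with mongolite, where the `date` and `delay` field names are assumptions about how the documents are stored:

```r
## departures and delayed departures per day; a departure counts as delayed
## when its (assumed) 'delay' field is non-empty
pipeline = '[{"$group": {"_id": "$date", "n": {"$sum": 1},
              "delayed": {"$sum": {"$cond": [{"$eq": ["$delay", ""]}, 0, 1]}}}}]'

## library(mongolite)
## m = mongo(collection = 'departures', db = 'ns')
## perday = m$aggregate(pipeline)
## perday$rate = perday$delayed / perday$n
```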

To present this delay information in an interactive way to others, I have created an R Shiny app that queries the MongoDB database. The picture below from my Shiny app shows the delay rates per train station on the 4th of January 2016, an icy day, especially in the north of The Netherlands.

kaartje

Cheers,

Longhow

Analyzing “Twitter faces” in R with Microsoft Project Oxford

Introduction

In my previous blog post I used the Microsoft Translator API in my BonAppetit Shiny app to recommend restaurants to tourists. I’m getting a little bit addicted to the Microsoft APIs; they can be fun to use :-). In this blog post I will briefly describe some of the Project Oxford APIs of Microsoft.

The APIs can be called from within R, and if you combine them with other APIs, for example Twitter, then interesting “Twitter face” analyses can be done. See my “TweetFace” Shiny app to analyse faces that can be found on Twitter.

Project Oxford

The APIs of Project Oxford can be categorized into:

  • Computer Vision,
  • Face,
  • Video,
  • Speech and
  • Language.

The free tier subscription provides 5,000 API calls per month (with a rate limit of 20 calls per minute). I focused my experiments on the Computer Vision and Face APIs; a lot of functionality is available to analyze images. For example: categorization of images, adult content detection, OCR, face recognition, gender analysis, age estimation and emotion detection.

Calling the API’s from R

The httr package provides very convenient functions to call the Microsoft APIs. You need to sign up first and obtain a key. Let’s do a simple test on Angelina Jolie using the face detect API.

angelina

Angelina Jolie, picture link

library(httr)

faceURL = "https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair"
img.url = 'http://www.buro247.com/images/Angelina-Jolie-2.jpg'

faceKEY = '123456789101112131415'

mybody = list(url = img.url)

faceResponse = POST(
  url = faceURL, 
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceResponse
Response [https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair]
Date: 2015-12-16 10:13
Status: 200
Content-Type: application/json; charset=utf-8
Size: 1.27 kB

If the call was successful, a “Status: 200” is returned and the response object is filled with interesting information. The API returns the information as JSON, which is parsed by R into nested lists.


AngelinaFace = content(faceResponse)[[1]]
names(AngelinaFace)
[1] "faceId"  "faceRectangle" "faceLandmarks" "faceAttributes"

AngelinaFace$faceAttributes
$gender
[1] "female"

$age
[1] 32.6

$facialHair
$facialHair$moustache
[1] 0

$facialHair$beard
[1] 0

$facialHair$sideburns
[1] 0

Well, the API recognized the gender and that there is no facial hair :-), but her age is underestimated: Angelina is 40, not 32.6! Let’s look at emotions; the emotion API has its own key and URL.


URL.emoface = 'https://api.projectoxford.ai/emotion/v1.0/recognize'

emotionKey = 'ABCDEF123456789101112131415'

mybody = list(url = img.url)

faceEMO = POST(
  url = URL.emoface,
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = emotionKey)),
  body = mybody,
  encode = 'json'
)
faceEMO
AngelinaEmotions = content(faceEMO)[[1]]
AngelinaEmotions$scores
$anger
[1] 4.573111e-05

$contempt
[1] 0.001244121

$disgust
[1] 0.0001096572

$fear
[1] 1.256477e-06

$happiness
[1] 0.0004313129

$neutral
[1] 0.9977798

$sadness
[1] 0.0003823086

$surprise
[1] 5.75276e-06

A fairly neutral face. Let’s test some other Angelina faces.

angelina2

Find similar faces

A nice piece of functionality of the API is finding similar faces. First a list of faces needs to be created; then, given a ‘query face’, you can search for similar-looking faces in that list. Let’s look at the most sexy actresses.


## Scrape the image URLs of the actresses
library(rvest)

linksactresses = 'http://www.imdb.com/list/ls050128191/'

out = read_html(linksactresses)
images = html_nodes(out, '.zero-z-index')
imglinks = html_nodes(out, xpath = "//img[@class='zero-z-index']/@src") %>% html_text()

## additional information, the name of the actress
imgalts = html_nodes(out, xpath = "//img[@class='zero-z-index']/@alt") %>% html_text()

Create an empty list by calling the face list API. You should specify a face list ID, which is placed as a request parameter behind the face list URL. My face list ID is “listofsexyactresses”, as shown in the code below.

### create an id and name for the face list
URL.face = "https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses"

mybody = list(name = 'top 100 of sexy actresses')

faceLIST = PUT(
  url = URL.face,
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceLIST
Response [https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses]
Date: 2015-12-17 15:10
Status: 200
Content-Type: application/json; charset=utf-8
Size: 108 B

Now fill the list with images. The API allows you to provide user data with each image; this can be handy to store names or other info. For one image this works as follows:

i = 1
userdata = imgalts[i]
linkie = imglinks[i]
face.uri = paste0(
  'https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses/persistedFaces?userData=',
  userdata
)
face.uri = URLencode(face.uri)
mybody = list(url = linkie )

faceLISTadd = POST(
  url = face.uri,
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceLISTadd
print(content(faceLISTadd))
Response [https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses/persistedFaces?userData=Image%20of%20Naomi%20Watts]
Date: 2015-12-17 15:58
Status: 200
Content-Type: application/json; charset=utf-8
Size: 58 B

$persistedFaceId
[1] '32fa4d1c-da68-45fd-9818-19a10beea1c2'

## status 200 is OK

Just loop over the 100 faces to complete the face list. With the list of images in place, we can now perform a query with a new ‘query face’. Two steps are needed: first call the face detect API to obtain a face ID. I am going to use an image of Angelina, but a different one than the image on IMDB.
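
A sketch of that loop, reusing `imglinks`, `imgalts` and `faceKEY` from the snippets above; the `Sys.sleep` keeps the script under the free tier’s 20-calls-per-minute rate limit:

```r
## build the persistedFaces URL for one image (userData carries the name)
face_add_url = function(listID, userdata) {
  URLencode(paste0('https://api.projectoxford.ai/face/v1.0/facelists/',
                   listID, '/persistedFaces?userData=', userdata))
}

## library(httr)
## for (i in seq_along(imglinks)) {
##   POST(face_add_url('listofsexyactresses', imgalts[i]),
##        content_type('application/json'),
##        add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
##        body = list(url = imglinks[i]), encode = 'json')
##   Sys.sleep(3)  # respect the 20-calls-per-minute rate limit
## }
```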


faceDetectURL = 'https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair'
img.url = 'http://a.dilcdn.com/bl/wp-content/uploads/sites/8/2009/06/angelinaangry002.jpg'

mybody = list(url = img.url)

faceRESO = POST(
  url = faceDetectURL,
  content_type('application/json'), add_headers(.headers =  c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceRESO
fID = content(faceRESO)[[1]]$faceId

With the face ID, query the face list with the “find similar” API. There is a confidence of almost 60%.


sim.URI = 'https://api.projectoxford.ai/face/v1.0/findsimilars'

mybody = list(faceId = fID, faceListId = 'listofsexyactresses')

faceSIM = POST(
  url = sim.URI,
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceSIM
yy = content(faceSIM)
yy
[[1]]
[[1]]$persistedFaceId
[1] "6b4ff942-b216-4817-9739-3653a467a594"

[[1]]$confidence
[1] 0.5980769

The picture below shows some other matches…

matches

Conclusion

The APIs of Microsoft’s Project Oxford provide nice functionality for computer vision and face analysis. It’s fun to use them; see my ‘TweetFace’ Shiny app to analyse images on Twitter.

Cheers,

Longhow