Analyzing “Twitter faces” in R with Microsoft Project Oxford


In my previous blog post I used the Microsoft Translator API in my BonAppetit Shiny app to recommend restaurants to tourists. I'm getting a little bit addicted to the Microsoft APIs; they can be fun to use :-). In this blog post I will briefly describe some of the Project Oxford APIs of Microsoft.

The APIs can be called from within R, and if you combine them with other APIs, for example Twitter's, interesting "Twitter face" analyses can be done. See my "TweetFace" Shiny app to analyse faces that can be found on Twitter.

Project Oxford

The APIs of Project Oxford can be categorized into:

  • Computer Vision,
  • Face,
  • Video,
  • Speech and
  • Language.

The free tier subscription provides 5,000 API calls per month (with a rate limit of 20 calls per minute). I focused my experiments on the Computer Vision and Face APIs; a lot of functionality is available to analyze images. For example: categorization of images, adult content detection, OCR, face recognition, gender analysis, age estimation and emotion detection.
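With the 20-calls-per-minute rate limit in mind, it can help to throttle your own calls when scoring many images in a loop. A minimal sketch (the `throttled` helper is my own, not part of any Microsoft package):

```r
# Wrap a function so consecutive calls are spaced out enough to stay
# under a calls-per-minute limit (20/min means one call per 3 seconds).
throttled = function(fun, calls_per_minute = 20) {
  min_gap = 60 / calls_per_minute
  last_call = Sys.time() - min_gap   # allow the first call immediately
  function(...) {
    wait = min_gap - as.numeric(difftime(Sys.time(), last_call, units = "secs"))
    if (wait > 0) Sys.sleep(wait)
    last_call <<- Sys.time()
    fun(...)
  }
}

# Example: wrap any function, e.g. a stand-in for an API call
slowCall = throttled(function(x) x)
slowCall(42)
```

Any of the POST calls below could be wrapped this way before looping over many images.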

Calling the API’s from R

The httr package provides very convenient functions to call the Microsoft APIs. You need to sign up first and obtain a key. Let's run a simple test on Angelina Jolie using the face detect API.


Angelina Jolie


faceURL = "https://api.projectoxford.ai/face/v1.0/detect?returnFaceAttributes=age,gender,smile,facialHair"
img.url = ''   # link to the Angelina Jolie picture

faceKEY = '123456789101112131415'

mybody = list(url = img.url)

faceResponse = POST(
  url = faceURL,
  content_type('application/json'),
  add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceResponse

Response [https://api.projectoxford.ai/face/v1.0/detect?returnFaceAttributes=age,gender,smile,facialHair]
  Date: 2015-12-16 10:13
  Status: 200
  Content-Type: application/json; charset=utf-8
  Size: 1.27 kB

If the call was successful, a "Status: 200" is returned and the response object is filled with interesting information. The API returns the information as JSON, which is parsed by R into nested lists.

AngelinaFace = content(faceResponse)[[1]]
names(AngelinaFace)
[1] "faceId"         "faceRectangle"  "faceLandmarks"  "faceAttributes"

AngelinaFace$faceAttributes$gender
[1] "female"

AngelinaFace$faceAttributes$age
[1] 32.6

AngelinaFace$faceAttributes$facialHair$moustache
[1] 0

AngelinaFace$faceAttributes$facialHair$beard
[1] 0

AngelinaFace$faceAttributes$facialHair$sideburns
[1] 0
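To work with these nested lists programmatically, the attributes of a detected face can be flattened into a one-row data frame. A sketch (`faceToDF` is my own helper name; the field names follow the response structure above):

```r
# Flatten the nested faceAttributes list of one detected face
# into a one-row data frame.
faceToDF = function(face) {
  fa = face$faceAttributes
  data.frame(
    gender    = fa$gender,
    age       = fa$age,
    moustache = fa$facialHair$moustache,
    beard     = fa$facialHair$beard,
    stringsAsFactors = FALSE
  )
}

# Example with a hand-made list mimicking the API response
dummy = list(faceAttributes = list(
  gender = "female", age = 32.6,
  facialHair = list(moustache = 0, beard = 0)))
faceToDF(dummy)
```

Applied to many faces, the rows can then be stacked with `rbind` for further analysis.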

Well, the API recognized the gender and that there is no facial hair :-), but her age is underestimated: Angelina is 40, not 32.6! Let's look at emotions; the Emotion API has its own key and URL.

URL.emoface = "https://api.projectoxford.ai/emotion/v1.0/recognize"

emotionKEY = 'ABCDEF123456789101112131415'

mybody = list(url = img.url)

faceEMO = POST(
  url = URL.emoface,
  content_type('application/json'),
  add_headers(.headers = c('Ocp-Apim-Subscription-Key' = emotionKEY)),
  body = mybody,
  encode = 'json'
)
AngelinaEmotions = content(faceEMO)[[1]]
AngelinaEmotions$scores

$anger
[1] 4.573111e-05

$contempt
[1] 0.001244121

$disgust
[1] 0.0001096572

$fear
[1] 1.256477e-06

$happiness
[1] 0.0004313129

$neutral
[1] 0.9977798

$sadness
[1] 0.0003823086

$surprise
[1] 5.75276e-06

A fairly neutral face. Let's test some other Angelina faces.
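Testing several faces amounts to repeating the POST call above per image and picking the dominant emotion from each response. A sketch (`topEmotion` and `img.urls` are my own names, not part of the API):

```r
# Pick the strongest emotion from the scores list returned per face
topEmotion = function(scores) {
  names(scores)[which.max(unlist(scores))]
}

# Loop over a vector of image links and collect the dominant emotion;
# the POST call is the same as shown above for the single image.
# results = sapply(img.urls, function(u) {
#   resp = POST(url = URL.emoface, content_type('application/json'),
#               add_headers(.headers = c('Ocp-Apim-Subscription-Key' = emotionKEY)),
#               body = list(url = u), encode = 'json')
#   topEmotion(content(resp)[[1]]$scores)
# })

topEmotion(list(anger = 0.01, happiness = 0.95, neutral = 0.04))
```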


Find similar faces

A nice piece of functionality of the API is finding similar faces. First a list of faces needs to be created; then, with a 'query face', you can search for similar-looking faces in that list. Let's look at the most sexy actresses.

library(rvest)

## Scrape the image URLs of the actresses
linksactresses = ''   # link to the IMDB list

out = read_html(linksactresses)
images = html_nodes(out, '.zero-z-index')
imglinks = html_nodes(out, xpath = "//img[@class='zero-z-index']/@src") %>% html_text()

## additional information, the name of the actress
imgalts = html_nodes(out, xpath = "//img[@class='zero-z-index']/@alt") %>% html_text()

Create an empty list by calling the face list API. You should specify a facelistID, which is placed as a request parameter behind the face list URL. My facelistID is "listofsexyactresses", as shown in the code below.

### create an id and name for the face list
URL.face = "https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses"

mybody = list(name = 'top 100 of sexy actresses')

faceLIST = PUT(
  url = URL.face,
  content_type('application/json'),
  add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceLIST

Response [https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses]
  Date: 2015-12-17 15:10
  Status: 200
  Content-Type: application/json; charset=utf-8
  Size: 108 B

Now fill the list with images. The API allows you to provide user data with each image; this can be handy to insert names or other info. For one image this works as follows:

userdata = imgalts[i]
linkie = imglinks[i]
## the persistedFaces path segment follows the v1.0 face list API
face.uri = paste(
  URL.face, '/persistedFaces?userData=', userdata,
  sep = ""
)
face.uri = URLencode(face.uri)
mybody = list(url = linkie)

faceLISTadd = POST(
  url = face.uri,
  content_type('application/json'),
  add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceLISTadd

Response [...]
  Date: 2015-12-17 15:58
  Status: 200
  Content-Type: application/json; charset=utf-8
  Size: 58 B

content(faceLISTadd)$persistedFaceId
[1] "32fa4d1c-da68-45fd-9818-19a10beea1c2"

## status 200 is OK

Just loop over the 100 faces to complete the face list. With the list of images we can now perform a query with a new 'query face'. Two steps are needed: first, call the face detect API to obtain a face ID. I am going to use an image of Angelina, but a different one than the image on IMDB.
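The single-image call above can be wrapped in a loop over all scraped images. A sketch (`addFaceURI` is my own helper; it assumes the `URL.face`, `imglinks`, `imgalts` and `faceKEY` objects from the code above, and sleeps 3 seconds per call to respect the rate limit):

```r
# Build the persistedFaces URI for one image's user data
addFaceURI = function(baseURL, userdata) {
  URLencode(paste(baseURL, '/persistedFaces?userData=', userdata, sep = ""))
}

# Loop over all scraped images and add each one to the face list;
# this repeats the single-image POST shown above for every actress.
# for (i in seq_along(imglinks)) {
#   faceLISTadd = POST(
#     url = addFaceURI(URL.face, imgalts[i]),
#     content_type('application/json'),
#     add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
#     body = list(url = imglinks[i]),
#     encode = 'json'
#   )
#   Sys.sleep(3)   # stay under the 20-calls-per-minute rate limit
# }

addFaceURI("https://example/facelists/mylist", "Jolie")
```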

faceDetectURL = 'https://api.projectoxford.ai/face/v1.0/detect?returnFaceAttributes=age,gender,smile,facialHair'
img.url = ''   # link to the query image of Angelina

mybody = list(url = img.url)

faceRESO = POST(
  url = faceDetectURL,
  content_type('application/json'),
  add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
fID = content(faceRESO)[[1]]$faceId

With the face ID, query the face list using the "find similar" API. For this image there is a confidence of almost 60%.

sim.URI = 'https://api.projectoxford.ai/face/v1.0/findsimilars'

mybody = list(faceId = fID, faceListId = 'listofsexyactresses')

faceSIM = POST(
  url = sim.URI,
  content_type('application/json'),
  add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
yy = content(faceSIM)
yy[[1]]$persistedFaceId
[1] "6b4ff942-b216-4817-9739-3653a467a594"

yy[[1]]$confidence
[1] 0.5980769

The picture below shows some other matches…



The APIs of Microsoft's Project Oxford provide nice functionality for computer vision and face analysis. It's fun to use them; see my 'TweetFace' Shiny app to analyse images on Twitter.




Combining Hadoop, Spark, R, SparkR and Shiny…. and it works :-)

A long time ago, in 1991, I had my first programming course (Modula-2) at the Vrije Universiteit in Amsterdam. I spent months behind a terminal with a green monochrome display, doing the programming exercises in VI. Do you remember Shift ZZ, and :q!… 🙂 After my university period I did not use VI often… Recently, I got hooked again!


A system very similar to the one on which I did my first text editing in VI

I was excited to hear the announcement of SparkR, an R interface to Spark, and was eager to do some experiments with the software. Unfortunately, none of the Hadoop sandboxes have Spark 1.4 and SparkR pre-installed to play with, so I needed to undertake some steps myself. Luckily, all steps are beautifully described in great detail on different sites.

Spin up a VPS

At Argeweb I rented an Ubuntu VPS: 4 cores, 8 GB. That is a very small environment for Hadoop, and of course a 1-node environment does not show the full potential of Hadoop/Spark. However, I am not trying to do performance or stress tests on very large data sets, just some functional tests. Moreover, I don't want to spend more money :-), though the VPS can nicely be used to install nginx and host my little website.

Install R, RStudio and Shiny

A very nice blog post by Dean Attali, which I am not going to repeat here, describes how easy it is to set up R, RStudio and Shiny. I followed steps 6, 7 and 8 of his blog post, and the result is a running Shiny server on my VPS environment. In addition to my local RStudio application on my laptop, I can now also use R on my iPhone through RStudio Server on my VPS. That can be quite handy in a crowded bar when I need to run some R commands…

Using R on my iPhone. You need good eyes!

Install Hadoop 2.7 and Spark

To run Hadoop, you need to install Java first, configure SSH, fetch the Hadoop tar.gz file, install it, set environment variables in the ~/.bashrc file, modify the Hadoop configuration files, format the Hadoop file system and start it. All steps are described in full detail here. Then, in addition, download the latest version of Spark; the pre-built package for Hadoop 2.6 or later worked fine for me. Just extract the tgz file, set the SPARK_HOME variable and you are done!

In each of the above steps, different configuration files needed to be edited. Luckily, I can still remember my basic VI skills…

Creating a very simple Shiny App

The SparkR package is already available when Spark is installed; its location is inside the Spark directory. So, when attaching the SparkR library to your R session, specify its location using the lib.loc argument of the library function. Alternatively, add the location of the SparkR library to the search paths for packages in R using the .libPaths function. See some example code below.

library(SparkR, lib.loc = "/usr/local/spark/R/lib")

## initialize the SparkR environment
sc = sparkR.init(sparkHome = '/usr/local/spark')
sqlContext = sparkRSQL.init(sc)

## convert the local R 'faithful' data frame to a Spark data frame
df = createDataFrame(sqlContext, faithful)

## apply a filter waiting > 50 and show the first few records
df1 = filter(df, df$waiting > 50)
head(df1)

## aggregate and collect the results in a local R data set
df2 = summarize(groupBy(df1, df1$waiting), count = n(df1$waiting))
df3.R = collect(df2)

Now create a new Shiny app, copy the R code above into the server.R file, and instead of the hard-coded value 50, make it an input using a slider. That's basically it: my first Shiny app calling SparkR…
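The wiring can be sketched as a single-file Shiny app. This is only a sketch: the slider bounds, input name and output id are my own choices, and it assumes the Spark paths and SparkR calls from the code above (so it needs a running Spark installation to actually execute):

```r
library(shiny)
library(SparkR, lib.loc = "/usr/local/spark/R/lib")

# initialize SparkR once, outside the server function
sc = sparkR.init(sparkHome = '/usr/local/spark')
sqlContext = sparkRSQL.init(sc)
df = createDataFrame(sqlContext, faithful)

ui = fluidPage(
  sliderInput("waiting", "Minimum waiting time", min = 40, max = 100, value = 50),
  tableOutput("counts")
)

server = function(input, output) {
  output$counts = renderTable({
    # the hard-coded 50 from before is now the slider value
    df1 = filter(df, df$waiting > input$waiting)
    df2 = summarize(groupBy(df1, df1$waiting), count = n(df1$waiting))
    collect(df2)
  })
}

shinyApp(ui = ui, server = server)
```

Each time the slider moves, the filter and aggregation are re-run on Spark and the collected result is rendered as a table.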

Cheers, Longhow