Analyzing “Twitter faces” in R with Microsoft Project Oxford

Introduction

In my previous blog post I used the Microsoft Translator API in my BonAppetit Shiny app to recommend restaurants to tourists. I’m getting a little bit addicted to the Microsoft API’s, they can be fun to use :-). In this blog post I will briefly describe some of the Project Oxford API’s of Microsoft.

The API’s can be called from within R, and if you combine them with other API’s, for example Twitter, then interesting “Twitter face” analyses can be done.  See my “TweetFace” shiny app to analyse faces that can be found on Twitter.

Project Oxford

The API’s of Project Oxford can be categorized into:

  • Computer Vision,
  • Face,
  • Video,
  • Speech and
  • Language.

The free tier subscription provides 5000 API calls per month (with a rate limit of 20 calls per minute). I focused my experiments on the computer vision and face API’s, a lot of functionality is available to analyze images. For example, categorization of images, adult content detection, OCR, face recognition, gender analysis, age estimation and emotion detection.

Calling the API’s from R

The httr package provides very convenient functions to call the Microsoft API’s. You need to sign-up first and obtain a key. Let’s do a simple test on Angelina Jolie by using the face detect API.

angelina

Angelina Jolie, picture link

library(httr)

faceURL = "https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair"
img.url = 'http://www.buro247.com/images/Angelina-Jolie-2.jpg'

faceKEY = '123456789101112131415'

mybody = list(url = img.url)

faceResponse = POST(
  url = faceURL, 
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceResponse
Response [https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair]
Date: 2015-12-16 10:13
Status: 200
Content-Type: application/json; charset=utf-8
Size: 1.27 kB

If the call was successful a “Status: 200” is returned and the response object is filled with interesting information. The API returns the information as JSON which is parsed by R into nested lists.


AngelinaFace = content(faceResponse)[[1]]
names(AngelinaFace)
[1] "faceId"  "faceRectangle" "faceLandmarks" "faceAttributes"

AngelinaFace$faceAttributes
$gender
[1] "female"

$age
[1] 32.6

$facialHair
$facialHair$moustache
[1] 0

$facialHair$beard
[1] 0

$facialHair$sideburns
[1] 0

Well, the API recognized the gender and that there is no facial hair :-), but her age is under estimated, Angelina is 40 not 32.6! Let’s look at emotions, the emotion API has its own key and url.


URL.emoface = 'https://api.projectoxford.ai/emotion/v1.0/recognize'

emotionKey = 'ABCDEF123456789101112131415'

mybody = list(url = img.url)

faceEMO = POST(
  url = URL.emoface,
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = emotionKEY)),
  body = mybody,
  encode = 'json'
)
faceEMO
AngelinaEmotions = content(faceEMO)[[1]]
AngelinaEmotions$scores
$anger
[1] 4.573111e-05

$contempt
[1] 0.001244121

$disgust
[1] 0.0001096572

$fear
[1] 1.256477e-06

$happiness
[1] 0.0004313129

$neutral
[1] 0.9977798

$sadness
[1] 0.0003823086

$surprise
[1] 5.75276e-06

A fairly neutral face. Let’s test some other Angelina faces

angelina2

Find similar faces

A nice piece of functionality of the API is finding similar faces. First a list of faces needs to be created, then with a ‘query face’ you can search for similar-looking faces in the list of faces. Let’s look at the most sexy actresses.


## Scrape the image URLs of the actresses
library(rvest)

linksactresses = 'http://www.imdb.com/list/ls050128191/'

out = read_html(linksactresses)
images = html_nodes(out, '.zero-z-index')
imglinks = html_nodes(out, xpath = "//img[@class='zero-z-index']/@src") %>% html_text()

## additional information, the name of the actress
imgalts = html_nodes(out, xpath = "//img[@class='zero-z-index']/@alt") %>% html_text()

Create an empty list, by calling the facelist API, you should spcify a facelistID, which is placed as request parameter behind the facelist URL. So my facelistID is “listofsexyactresses” as shown in the code below.

### create an id and name for the face list
URL.face = "https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses"

mybody = list(name = 'top 100 of sexy actresses')

faceLIST = PUT(
  url = URL.face,
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceLIST
Response [https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses]
Date: 2015-12-17 15:10
Status: 200
Content-Type: application/json; charset=utf-8
Size: 108 B

Now fill the list with images, the API allows you to provide user data with each image, this can be handy to insert names or other info. So for one image this works as follows

i=1
userdata = imgalts[i]
linkie = imglinks[i]
face.uri = paste(
  'https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses/persistedFaces?userData=',
  userdata,
  sep = ";"
)
face.uri = URLencode(face.uri)
mybody = list(url = linkie )

faceLISTadd = POST(
  url = face.uri,
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceLISTadd
print(content(faceLISTadd))
Response [https://api.projectoxford.ai/face/v1.0/facelists/listofsexyactresses/persistedFaces?userData=Image%20of%20Naomi%20Watts]
Date: 2015-12-17 15:58
Status: 200
Content-Type: application/json; charset=utf-8
Size: 58 B

$persistedFaceId
[1] '32fa4d1c-da68-45fd-9818-19a10beea1c2'

## status 200 is OK

Just loop over the 100 faces to complete the face list. With the list of images we can now perform a query with a new ‘query face’. Two steps are needed, first call the face detect API to obtain a face ID. I am going to use the image of Angelina, but a different one than the image on IMDB.


faceDetectURL = 'https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair'
img.url = 'http://a.dilcdn.com/bl/wp-content/uploads/sites/8/2009/06/angelinaangry002.jpg'

mybody = list(url = img.url)

faceRESO = POST(
  url = faceDetectURL,
  content_type('application/json'), add_headers(.headers =  c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceRESO
fID = content(faceRESO)[[1]]$faceId

With the face ID, query the face list with the “find similar” API. There is a confidence of almost 60%.


sim.URI = 'https://api.projectoxford.ai/face/v1.0/findsimilars'

mybody = list(faceID = fID, faceListID = 'listofsexyactresses' )

faceSIM = POST(
  url = sim.URI,
  content_type('application/json'), add_headers(.headers = c('Ocp-Apim-Subscription-Key' = faceKEY)),
  body = mybody,
  encode = 'json'
)
faceSIM
yy = content(faceSIM)
yy
[[1]]
[[1]]$persistedFaceId
[1] "6b4ff942-b216-4817-9739-3653a467a594"

[[1]]$confidence
[1] 0.5980769

The picture below shows some other matches…..

matches

Conclusion

The API’s of Microsoft’s Project Oxford provide nice functionality for computer vision, face analysis. It’s fun to use them, see my ‘TweetFace’ Shiny app to analyse images on Twitter.

Cheers,

Longhow

 

Bon Appetit: A restaurant recommender for tourists visiting the Netherlands

Introduction

More and more tourists are visiting The Netherlands, this will become very clear if you walk through the center of Amsterdam on a sunny day. All those tourists need to eat somewhere, in some restaurant. You can see their sad faces as they have no clue where to go. Well, with the aid of a little data science I have made it easy for them :-). A small R Shiny app for tourists to inform them to which restaurant they should go in The Netherlands. In this blog post I will describe the different steps that I have taken.

iamsterdam

Tourists in Amsterdam wondering where to eat……

Iens reviews

In an earlier blog post I wrote about scraping restaurant review data from www.iens.nl and how to use that to generate restaurant recommendations. The technique was based on the restaurant ratings given by the reviewers. To generate personal recommendations you need to rate some restaurants first. But as a tourist visiting The Netherlands for the first time this might be difficult.

So I have made it a little bit easier, enter your idea of food in my Bon Appetit Shiny app, it will translate the text to Dutch if needed, then calculate the similarity of your translated text and all reviews from Iens, and then give you the top ten restaurants whose reviews matches best.

The Microsoft translator API

Almost all of the reviews on the Iens restaurant website are in Dutch, I assume that most tourists from outside The Netherlands do not speak Dutch. That is not a large problem, I can translate non Dutch text to Dutch by using a translator. Google and Microsoft offer translation API’s. I have chosen for the Microsoft API because they offer a free tier. The first 2 million characters are free per month. Sign-up and get started here. And because the API supports the Klingon language….. 🙂

The R franc package can recognize the language of the input text:


lang = franc(InputText)
ISO2 = speakers$iso6391[speakers$language==lang]
from = ISO2

The ISO 2 letter language code is needed in the call to the Microsoft translator API. I am making use of the httr package to set up the call. With your clientID and client secret a token must be retrieved. Then with this token the actual translation is done.


#Set up call to retrieve token

clientIDEncoded = URLencode("your microsoft client ID")

client_SecretEncoded = URLencode("your client secret")
Uri = "https://datamarket.accesscontrol.windows.net/v2/OAuth2-13"

MyBody = paste(
   "grant_type = client_credentials&client_id=",
   clientIDEncoded,
   "client_secret=",
   client_SecretEncoded,
   "&scope=http://api.microsofttranslator.com",
   sep="";
)

r = POST(url=Uri, body = MyBody, content_type("application/x-www-form-urlencoded"))
response = content(r)

Now that you have the token, make a call to translate the text


HeaderValue = paste("Bearer ", response$access_token, sep="")

TextEncoded = URLencode(InputText)

to = "nl"

uri2 = paste(
   "http://api.microsofttranslator.com/v2/Http.svc/Translate?text=",
   TextEncoded,
   "&from=",
   from,
   "&to=",
   to,
   sep=""
)

resp2 = GET(url = uri2, add_headers(Authorization = HeaderValue))
Translated = content(resp2)

#### dig out the text from the xml object
TranslatedText  = as(Translated , "character") %>% read_html(pp) %>% html_text()

Some example translations,

Louis van Gaal is notorious for his Dutch to English (or any other language for that matter) translations. Let’s see how the Microsoft API performs on some of his sentences.

  • Dutch: “Dat is hele andere koek”, van Gaal: That is different cook”, Microsoft: That is a whole different kettle of fish”.
  • Dutch: “de dood of de gladiolen”, van Gaal: “the dead or the gladiolus”, Microsoft: “the dead or the gladiolus”. 
  • Dutch: “Het is een kwestie van tijd”, van Gaal: “It’s a question of time”, Microsoft: “It’s a matter of time”.

The Cosine similarity

The distance or similarity between two documents (texts) can be measured by means of the cosine similarity. When you have a collection of reviews (texts), then this collection can be represented by a term document matrix. A row of this matrix is one review, its a vector of word counts. Another review or text is also a vector of word counts, given two vectors A and B the cosine similarity  is given by:

cosine

Now the input text that is translated to Dutch is also a vector of word counts and so can calculate the cosine similarity between each restaurant review and the input text. The restaurants corresponding to the most similar reviews are returned as recommended restaurants, bon appetit 🙂

Putting all together in a Shiny app

The above steps are implemented in my bon appetit Shiny app. Try out your thoughts and idea of food and get restaurant recommendations! Here is an example:

Input text: Large pizza with chicken and cheese that is tasty.

shiny

Input text translated to Dutch

shiny2

The top ten restaurants corresponding to the translated input text

 

And for the German tourist: “Ich suche eines schnelles leckeres Hahnchen”, this gets translated to Dutch “ik ben op zoek naar een snelle heerlijke kip” and the ten restaurant recommendations you get are given in the following figure.

shiny3

cheers,

— Longhow —