recommendation system

Introduction

More and more tourists are visiting The Netherlands, this will become very clear if you walk through the center of Amsterdam on a sunny day. All those tourists need to eat somewhere, in some restaurant. You can see their sad faces as they have no clue where to go. Well, with the aid of a little data science I have made it easy for them :-). A small R Shiny app for tourists to inform them to which restaurant they should go in The Netherlands. In this blog post I will describe the different steps that I have taken.

Tourists in Amsterdam wondering where to eat……

Iens reviews

In an earlier blog post I wrote about scraping restaurant review data from www.iens.nl and how to use that to generate restaurant recommendations. The technique was based on the restaurant ratings given by the reviewers. To generate personal recommendations you need to rate some restaurants first. But as a tourist visiting The Netherlands for the first time this might be difficult.

So I have made it a little bit easier, enter your idea of food in my “Bon Appetit“ Shiny app, it will translate the text to Dutch if needed, then calculate the similarity of your translated text and all reviews from Iens, and then give you the top ten restaurants whose reviews matches best.

The Microsoft translator API

Almost all of the reviews on the Iens restaurant website are in Dutch, I assume that most tourists from outside The Netherlands do not speak Dutch. That is not a large problem, I can translate non Dutch text to Dutch by using a translator. Google and Microsoft offer translation API’s. I have chosen for the Microsoft API because they offer a free tier. The first 2 million characters are free per month. Sign-up and get started here. And because the API supports the Klingon language….. 🙂

The R franc package can recognize the language of the input text:


lang = franc(InputText)
ISO2 = speakers$iso6391[speakers$language==lang]
from = ISO2

The ISO 2 letter language code is needed in the call to the Microsoft translator API. I am making use of the httr package to set up the call. With your clientID and client secret a token must be retrieved. Then with this token the actual translation is done.


#Set up call to retrieve token

clientIDEncoded = URLencode("your microsoft client ID")

client_SecretEncoded = URLencode("your client secret")
Uri = "https://datamarket.accesscontrol.windows.net/v2/OAuth2-13"

MyBody = paste(
   "grant_type = client_credentials&amp;amp;client_id=",
   clientIDEncoded,
   "client_secret=",
   client_SecretEncoded,
   "&scope=http://api.microsofttranslator.com",
   sep="";
)

r = POST(url=Uri, body = MyBody, content_type("application/x-www-form-urlencoded"))
response = content(r)

Now that you have the token, make a call to translate the text


HeaderValue = paste(&amp;quot;Bearer &amp;quot;, response$access_token, sep=&amp;quot;&amp;quot;)

TextEncoded = URLencode(InputText)

to = "nl"

uri2 = paste(
   "http://api.microsofttranslator.com/v2/Http.svc/Translate?text=",
   TextEncoded,
   "&from=",
   from,
   "&to=",
   to,
   sep=""
)

resp2 = GET(url = uri2, add_headers(Authorization = HeaderValue))
Translated = content(resp2)

#### dig out the text from the xml object
TranslatedText  = as(Translated , "character") %>% read_html(pp) %>% html_text()

Some example translations,

Louis van Gaal is notorious for his Dutch to English (or any other language for that matter) translations. Let’s see how the Microsoft API performs on some of his sentences.

Dutch: “Dat is hele andere koek”, van Gaal: “That is different cook”, Microsoft: “That is a whole different kettle of fish”.
Dutch: “de dood of de gladiolen”, van Gaal: “the dead or the gladiolus”, Microsoft: “the dead or the gladiolus”.
Dutch: “Het is een kwestie van tijd”, van Gaal: “It’s a question of time”, Microsoft: “It’s a matter of time”.

The Cosine similarity

The distance or similarity between two documents (texts) can be measured by means of the cosine similarity. When you have a collection of reviews (texts), then this collection can be represented by a term document matrix. A row of this matrix is one review, its a vector of word counts. Another review or text is also a vector of word counts, given two vectors A and B the cosine similarity is given by:

cosine

Now the input text that is translated to Dutch is also a vector of word counts and so can calculate the cosine similarity between each restaurant review and the input text. The restaurants corresponding to the most similar reviews are returned as recommended restaurants, bon appetit 🙂

Putting all together in a Shiny app

The above steps are implemented in my bon appetit Shiny app. Try out your thoughts and idea of food and get restaurant recommendations! Here is an example:

Input text: Large pizza with chicken and cheese that is tasty.

Input text translated to Dutch

The top ten restaurants corresponding to the translated input text

And for the German tourist: “Ich suche eines schnelles leckeres Hahnchen”, this gets translated to Dutch “ik ben op zoek naar een snelle heerlijke kip” and the ten restaurant recommendations you get are given in the following figure.

shiny3

cheers,

— Longhow —

Its personal

In my previous blog post I performed a path analysis in SAS on restaurant reviewers. It turned out that after a visit to a Chinese restaurant, reviewers on Iens tend to go to an “International” restaurant. But which one should I visit? A recommendation engine can answer that question. Everyone who has visited an e-commerce website for example Amazon, has experienced the results of recommendation engine. Based on your click/purchase history new products are recommended. I have a Netflix subscription, based on my viewing behavior I get recommendations for new movies, see my recommendations below.

Click to enlarge. Obviously these recommendations are based on the viewing behavior of my son and daugther, who spend too much time behind Netflix…. 🙂

Collaborative Filtering

How does it work? Lets fist look at the data that is needed, in the world of recommendation engines people often speak about users, items and the user-item rating matrix. In my scraped restaurant review data, this corresponds to reviewers, restaurants and their scores / ratings. See the figure below.

The question now is, how can we fill in the blanks? For example, in the data above Sarah likes Fussia and Jimmie’s Kitchen but she has not rated the other Restaurants. Can we (the computer) do this for her? Yes, we can fill in the blanks with a predicted rating and recommend the restaurant with the highest rating to Sarah as the restaurant to visit next. A term you often hear in this context is collaborative filtering. A class of techniques based on the believe that a person gets the most relevant recommendations from people with ‘similar’ tastes. I am not going to write about the techniques here, a nice overview paper is: Collaborative Filtering Recommender Systems By Michael D. Ekstrand, John T. Riedl and Joseph A. Konstan. It can be found here.

Iens restaurant reviewers

The review data that I have scraped from the iens website is of course much larger than the matrix shown above. There are 8,900 items (restaurants), and there are 100,889 users (reviewers). So we would have a user item matrix with 8,900 X 100,889 (= 897,912,100) ratings. That would mean that every reviewer has rated every restaurant, obviously that is not the case. In fact, the user-item matrix is often very sparse, the iens data consists of 211,143 ratings that is only 0.02% of the matrix when it is completely filled.

In SAS I can use the recommend procedure to create recommendation engines, the procedure supports different techniques

Average, SlopeOne,
KNN, Association Rules
SVD, Ensemble, Cluster

The rating data that is needed to run the procedure should be given in a different form than the user-item matrix. A SAS data set with three columns, user, item and rating is needed. A snippet of the data is shown below.

If I want the system to generate “personal” restaurant recommendations for me, I should also provide some personal ratings. Well, I liked Golden chopsticks (an 8 out of 10), a few months ago I was at Fussia, that was OK (a 7 out of 10), and for SAS I was at a client in Eindhoven, so I also ate at “Van der Valk Eindhoven” I did not really liked that (a 4 out of 10). So I have a created a small data set with my ratings and added that to the Iens ratings.

After that I used the recommend procedure to try different techniques and choose the one with the smallest error on a hold-out set. The workflow is given in the following screenshot.

To zoom in on the recommend procedure, it starts with the specification of the rating data set, and the specification of the user, item and rating columns. Then a method and its corresponding options need to be set. The following figure shows an example call

My personal recommendations

After the procedure has finished, a recommendation engine is available, in the above code example an engine with two methods (SVD and ARM) is available and recommendations can be generated for each user. The code below shows how to do this.

And the top five restaurants I should visit are (with their predicted rating)……

‘T Stiefkwartierke (9.61)
Brazz (9.19)
Bandoeng (9.05)
De Burgermeester (9.00)
Argentinos (9.00)

So the first restaurant ‘T Stiefkwartierke is in Breda, the south of The Netherlands. I am going to visit that when I am in the neighborhood….

Longhow Lam's Blog

Data Scientist, Machine learning, R, SAS, Python – Amsterdam (NL)

Bon Appetit: A restaurant recommender for tourists visiting the Netherlands