Restaurant analytics…. Now it becomes personal!

Its personal

In my previous blog post I performed a path analysis in SAS on restaurant reviewers. It turned out that after a visit to a Chinese restaurant, reviewers on Iens tend to go to an “International” restaurant. But which one should I visit? A recommendation engine can answer that question. Everyone who has visited an e-commerce website for example Amazon, has experienced the results of recommendation engine. Based on your click/purchase history new products are recommended. I have a Netflix subscription, based on my viewing behavior I get recommendations for new movies, see my recommendations below.

recommender1

Click to enlarge. Obviously these recommendations are based on the viewing behavior of my son and daugther, who spend too much time behind Netflix…. 🙂

Collaborative Filtering

How does it work? Lets fist look at the data that is needed, in the world of recommendation engines people often speak about users, items and the user-item rating matrix. In my scraped restaurant review data, this corresponds to reviewers, restaurants and their scores / ratings. See the figure below.

recommender2

The question now is, how can we fill in the blanks? For example, in the data above Sarah likes Fussia and Jimmie’s Kitchen but she has not rated the other Restaurants. Can we (the computer) do this for her? Yes, we can fill in the blanks with a predicted rating and recommend the restaurant with the highest rating to Sarah as the restaurant to visit next. A term you often hear in this context is collaborative filtering. A class of techniques based on the believe that a person gets the most relevant recommendations from people with ‘similar’ tastes. I am not going to write about the techniques here, a nice overview paper is: Collaborative Filtering Recommender Systems By Michael D. Ekstrand, John T. Riedl and Joseph A. Konstan. It can be found here.

Iens restaurant reviewers

The review data that I have scraped from the iens website is of course much larger than the matrix shown above. There are 8,900 items (restaurants), and there are 100,889 users (reviewers). So we would have a user item matrix with 8,900 X 100,889 (= 897,912,100) ratings. That would mean that every reviewer has rated every restaurant, obviously that is not the case. In fact, the user-item matrix is often very sparse, the iens data consists of 211,143 ratings that is only 0.02% of the matrix when it is completely filled.

In SAS I can use the recommend procedure to create recommendation engines, the procedure supports different techniques

  • Average, SlopeOne,
  • KNN, Association Rules
  • SVD, Ensemble, Cluster

The rating data that is needed to run the procedure should be given in a different form than the user-item matrix. A SAS data set with three columns, user, item and rating is needed. A snippet of the data is shown below.

recommender3

If I want the system to generate “personal” restaurant recommendations for me, I should also provide some personal ratings. Well, I liked Golden chopsticks (an 8 out of 10), a few months ago I was at Fussia, that was OK (a 7 out of 10), and for SAS I was at a client in Eindhoven, so I also ate at “Van der Valk Eindhoven” I did not really liked that (a 4 out of 10). So I have a created a small data set with my ratings and added that to the Iens ratings.

recommender4

After that I used the recommend procedure to try different techniques and choose the one with the smallest error on a hold-out set. The workflow is given in the following screenshot.

recommender6

To zoom in on the recommend procedure, it starts with the specification of the rating data set, and the specification of the user, item and rating columns. Then a method and its corresponding options need to be set. The following figure shows an example call

recommender5My personal recommendations 

After the procedure has finished, a recommendation engine is available, in the above code example an engine with two methods (SVD and ARM) is available and recommendations can be generated for each user. The code below shows how to do this.

recommender7And the top five restaurants I should visit are (with their predicted rating)……

  1. ‘T Stiefkwartierke (9.61)
  2. Brazz (9.19)
  3. Bandoeng (9.05)
  4. De Burgermeester (9.00)
  5. Argentinos (9.00)

So the first restaurant ‘T Stiefkwartierke is in Breda, the south of The Netherlands. I am going to visit that when I am in the neighborhood….

Restaurant analytics: Text mining, Path analysis, Sankey, Sunbursts and Chord plots

Last month I visited my favorite Chinese restaurant with some friends in the center of Amsterdam, Golden chopsticks, and had some nice food. I was wondering if other people shared the same opinion. So I visited iens.nl, a Dutch restaurant review site and looked up Golden chopsticks restaurant. They have fairly good reviews, so that was good. For our next restaurant diner my friends wanted to try something else, we had Chinese, so do we want to go to a Chinese restaurant again, or something else?

Let’s use some analytics to help me decide. With R and the package rVest it is not difficult to retrieve data from the Iens restaurant site. Some knowledge on CSS, Xpath and regular expressions is needed but then you can scrape away…. Inside a for loop over i,  I have code fragments like the ones given below.


start   = "http://www.iens.nl/restaurant/?q=&perPagina=20&pagina="
httploc = paste(start,i,sep="")

restaurants     = html(httploc)
recensieinfo    = html_text(html_nodes(restaurants,xpath='//div[@class="listerItem_score"]'))
restaurantLabel = html_nodes(restaurants,".listerItem")
restName        = html_text(html_nodes(reslabel2,".fontSerif"))
restAddress     = html_nodes(restarantLabel ,"div address")

Nreviews        = str_extract(str_extract(recensieinfo,pattern="[0-9]* recensies"), pattern = "[0-9]*")
averageScore    = str_extract(recensieinfo,pattern="[0-9].[0-9]")

# more statements

I have retrieved information of around 11.000 restaurants, for these restaurants there are around 215.000 reviews (taking only the reviews that also have numeric scores for food, service and scenery, and ignoring older reviews). The data I scraped are in two tables, one table at restaurant level and one on review level.

restdata  restdata2

First, some interesting facts on the scraped data.

Chinese restaurant names

It is true that certain names of Chinese restaurants occur more often. There are around 1600 Chinese restaurants, the most frequent names: Kota Radja (39), Peking (36), Lotus (33), De Chinese Muur (32), De Lange Muur (25), China Garden (25), Hong kong (22), Ni Hao (16). My father once had a Chinese restaurant called Hong Kong!  See the dashboard below. These kind of more occurring names is quit specific to Chinese restaurants, if we look at other kitchen types we find more unique restaurant names. You’ll find the names per kitchen in a little Shiny app here.

Kitchen/restaurant type and number of reviews

Of the 11.000 restaurants, there are around 8900 restaurants that have reviews, with restaurants El Olivo, Rhodos, Floor17 having the most reviews. There are around 215.000 reviews written by 103.800 reviewers, with MartinK having the most reviews.

The five types of restaurants that occur most often in all the reviews are “International”, “Hollands”, French, Italian and Chinese, as shown in the tree map below. The color of the tree map represents the average number of reviews per restaurant type. So although Chinese restaurants form a large fraction (8.8%) of all restaurants, there are only on average 8.3 reviews per Chinese restaurant. On the other hand, Italian restaurants form 8.4% of all restaurants and have more than 18 reviews per Italian restaurant. Conclusion: People eat at Chinese restaurants, but they just don’t write about it…..

kitchentypes

Review dates

The following figure shows a screen shot of a SAS Visual Analytics dashboard that I made from the Iens data, it shows a couple of things.

  • Uniqueness of restaurant names per kitchen type
  • The number of reviews per day, we can see a Valentin peak at 14-Feb-2015,
  • The avergare score per day, it looks like the scores in the month July are a little bit lower.
  • Saturdays and Fridays are more crowded for restaurants, everybody knows that, and the Iens review data confirms this.

dashboard

Review topics

If you look at the Iens site, then you’ll see that almost all of the reviews are written in Dutch, which you would expect. But if you run a language detection first on the reviews, its funny to see that here are some reviews in other languages. English is the second language (300 or so reviews) and a few Italian reviews. It turns out that the language detection (R textcat package) on “Italian” reviews is fooled by some particular Italian words like Buonissimo, cappuccino carprese and piza.

What are the topics that we can extract from the 215.000 reviews, In an earlier blog post I wrote about text mining, I have used SAS Text Miner to generate topics from the Iens reviews. So what are people writing about? The picture below shows the five topics that occur a lot in all the reviews. A topic in SAS Text Miner is characterized by key terms (in Dutch) which are given below.

topics

So it’s always the same: complaining about the long wait for the food…. but on the other hand a lot of reviews on how great and fantastic the evening was…

Next restaurant type to visit

To come back to my first question: what type of restaurant should I visit next? I can perform a path analysis to answer that question. Scraping the restaurant site not only resulted in the texts of the reviews but also the reviewer, the date and the restaurant(type). So in the data below for example Pauls path consists of a first visited to Sisters, then to  Fussia and then to Milo.

pathdata

With SAS software I can either use Visual Analytics to generate a Sankey diagram that will show the most occurring paths. Or alternatively, I can perform a path analysis on this data in Enterprise Miner and then visualize the most occurring paths in a sunburst or chord diagram. See the pictures below.

pathanalysissankey

sunburstchordplot

Interactive versions of the IENS sunburst and chord diagrams can be found here and hereConclusion: It turns out that after a Chinese restaurant visit most reviewers will visit an “International” restaurant type…. In a next blog post I will go one step further and show how to use techniques from recommendation engines to recommend at restaurant level.

Cheers,

Longhow