In my previous blog post I performed a path analysis in SAS on restaurant reviewers. It turned out that after a visit to a Chinese restaurant, reviewers on Iens tend to go to an “International” restaurant. But which one should I visit? A recommendation engine can answer that question. Everyone who has visited an e-commerce website, for example Amazon, has experienced the results of a recommendation engine: based on your click and purchase history, new products are recommended. I have a Netflix subscription, and based on my viewing behavior I get recommendations for new movies; see my recommendations below.
How does it work? Let's first look at the data that is needed. In the world of recommendation engines people often speak about users, items and the user-item rating matrix. In my scraped restaurant review data, these correspond to reviewers, restaurants and their scores or ratings. See the figure below.
The question now is: how can we fill in the blanks? For example, in the data above Sarah likes Fussia and Jimmie’s Kitchen but she has not rated the other restaurants. Can we (the computer) do this for her? Yes, we can fill in the blanks with a predicted rating and recommend the restaurant with the highest predicted rating to Sarah as the restaurant to visit next. A term you often hear in this context is collaborative filtering: a class of techniques based on the belief that a person gets the most relevant recommendations from people with ‘similar’ tastes. I am not going to write about the techniques here; a nice overview paper is Collaborative Filtering Recommender Systems by Michael D. Ekstrand, John T. Riedl and Joseph A. Konstan. It can be found here.
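Still, to give a flavour of the idea, here is a minimal Python sketch (not SAS, and not necessarily what any production engine does) of user-based collaborative filtering: Sarah's missing rating is estimated as a similarity-weighted average of the ratings given by other reviewers. All numbers and the reviewers Tom and Anna are made up for illustration; only the fact that Sarah likes Fussia and Jimmie's Kitchen comes from the text above.

```python
from math import sqrt

# Made-up ratings on a 1-10 scale. Only "Sarah likes Fussia and
# Jimmie's Kitchen" comes from the text; everything else is hypothetical.
ratings = {
    "Sarah": {"Fussia": 8, "Jimmie's Kitchen": 9},
    "Tom": {"Fussia": 8, "Jimmie's Kitchen": 9, "Golden chopsticks": 8},
    "Anna": {"Fussia": 9, "Jimmie's Kitchen": 2, "Golden chopsticks": 3},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the restaurants both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


predicted = predict("Sarah", "Golden chopsticks")
print(f"Predicted rating for Sarah: {predicted:.2f}")
```

Because Tom's tastes match Sarah's more closely than Anna's do, his rating pulls the prediction towards his score. A common refinement is to subtract each reviewer's own mean rating before computing similarities, so that generous and strict raters become comparable.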
Iens restaurant reviewers
The review data that I have scraped from the Iens website is of course much larger than the matrix shown above. There are 8,900 items (restaurants) and 100,889 users (reviewers). A completely filled user-item matrix would therefore hold 8,900 x 100,889 (= 897,912,100) ratings, which would mean that every reviewer has rated every restaurant. Obviously that is not the case. In fact, user-item matrices are often very sparse: the Iens data consists of 211,143 ratings, which is only about 0.02% of the completely filled matrix.
In SAS I can use the recommend procedure to create recommendation engines. The procedure supports different techniques:
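The arithmetic behind that sparsity figure is easy to check. A small Python sketch using only the three counts quoted above:

```python
# Counts taken from the text above.
n_items = 8_900        # restaurants
n_users = 100_889      # reviewers
n_ratings = 211_143    # ratings actually present

# Number of cells in a completely filled user-item matrix.
n_cells = n_items * n_users

# Fraction of the matrix that actually contains a rating.
density = n_ratings / n_cells

print(f"{n_cells:,} cells in the full matrix")
print(f"{density:.4%} of the cells are filled")
```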
- Average, SlopeOne,
- KNN, Association Rules
- SVD, Ensemble, Cluster
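To give an impression of what one of these techniques does, here is a minimal Python sketch of the general weighted Slope One idea: the rating for a target item is predicted from the average rating difference between that item and each item the user has already rated. This illustrates the textbook algorithm only; I am not claiming it mirrors how the SAS procedure implements it, and the reviewers, restaurants and ratings are hypothetical.

```python
def slope_one_predict(ratings, user, target):
    """Weighted Slope One prediction of `user`'s rating for `target`."""
    numerator = denominator = 0.0
    for item, user_rating in ratings[user].items():
        if item == target:
            continue
        # Rating differences (target - item) from everyone who rated both.
        diffs = [
            r[target] - r[item]
            for r in ratings.values()
            if target in r and item in r
        ]
        if not diffs:
            continue
        average_diff = sum(diffs) / len(diffs)
        # Weight each item pair by the number of reviewers supporting it.
        numerator += (user_rating + average_diff) * len(diffs)
        denominator += len(diffs)
    return numerator / denominator if denominator else None


# Hypothetical reviewers and restaurants.
toy_ratings = {
    "r1": {"A": 5, "B": 3, "C": 2},
    "r2": {"A": 3, "B": 4},
    "r3": {"B": 2, "C": 5},
}

slope_one_prediction = slope_one_predict(toy_ratings, "r3", "A")
print(slope_one_prediction)
```

In this toy data two reviewers rated both A and B and one rated both A and C, so the A-versus-B difference counts twice as heavily in the prediction for r3.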
The rating data that is needed to run the procedure should be given in a different form than the user-item matrix: a SAS data set with three columns (user, item and rating). A snippet of the data is shown below.
If I want the system to generate “personal” restaurant recommendations for me, I should also provide some personal ratings. Well, I liked Golden chopsticks (an 8 out of 10). A few months ago I was at Fussia, which was OK (a 7 out of 10). And for SAS I was at a client in Eindhoven, so I also ate at “Van der Valk Eindhoven”, which I did not really like (a 4 out of 10). So I have created a small data set with my ratings and added that to the Iens ratings.
After that I used the recommend procedure to try different techniques and chose the one with the smallest error on a hold-out set. The workflow is given in the following screenshot.
To zoom in on the recommend procedure: it starts with the specification of the rating data set and of the user, item and rating columns. Then a method and its corresponding options need to be set. The following figure shows an example call.
After the procedure has finished, a recommendation engine is available. In the above code example, this is an engine with two methods (SVD and ARM), and recommendations can be generated for each user. The code below shows how to do this.
So the first restaurant, ‘T Stiefkwartierke, is in Breda, in the south of the Netherlands. I am going to visit it when I am in the neighborhood…
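As an illustration of that reshaping (a Python sketch with made-up ratings, not the Iens data and not SAS code), a user-item matrix can be turned into user, item, rating rows by keeping only the cells that actually contain a rating:

```python
# Hypothetical user-item matrix; None marks a restaurant not yet rated.
restaurants = ["Fussia", "Jimmie's Kitchen", "Golden chopsticks"]
matrix = {
    "Sarah": [8, 9, None],
    "Tom": [None, 7, 6],
}

# One (user, item, rating) row per filled cell; blanks are simply dropped.
rows = [
    (user, restaurant, rating)
    for user, user_ratings in matrix.items()
    for restaurant, rating in zip(restaurants, user_ratings)
    if rating is not None
]

for row in rows:
    print(row)
```

The long format stores only the ratings that exist, which matters a lot when the matrix is as sparse as it is here.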
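In three-column form, adding my own ratings is just a matter of appending rows. A small Python sketch: the three ratings are the ones mentioned above, while the user id "me" and the two existing reviewer rows are hypothetical placeholders.

```python
# Hypothetical stand-in for the scraped ratings: (user, item, rating).
iens_ratings = [
    ("reviewer_1", "Fussia", 8),
    ("reviewer_2", "Golden chopsticks", 6),
]

# My own ratings as described in the text; the id "me" is an assumption.
my_ratings = [
    ("me", "Golden chopsticks", 8),
    ("me", "Fussia", 7),
    ("me", "Van der Valk Eindhoven", 4),
]

# Guard against accidentally reusing an existing reviewer id.
existing_users = {user for user, _, _ in iens_ratings}
if any(user in existing_users for user, _, _ in my_ratings):
    raise ValueError("user id already exists in the ratings data")

all_ratings = iens_ratings + my_ratings
print(len(all_ratings), "ratings in total")
```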
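The selection step itself boils down to: fit each candidate on the training ratings, predict the held-out ratings, and keep the candidate with the lowest error. A rough Python sketch of that idea, using root mean squared error and two simple stand-in baselines (a global average and a per-restaurant average) rather than the SAS methods; all ratings here are made up.

```python
from collections import defaultdict
from math import sqrt

# Made-up (user, item, rating) rows, split into training and hold-out sets.
train = [
    ("u1", "A", 8), ("u2", "A", 9), ("u3", "A", 7),
    ("u1", "B", 4), ("u2", "B", 3), ("u3", "C", 6),
]
holdout = [("u4", "A", 8), ("u4", "B", 4), ("u5", "C", 6)]

# Candidate 1: always predict the global average rating.
global_mean = sum(rating for _, _, rating in train) / len(train)

# Candidate 2: predict the average rating of the restaurant.
ratings_per_item = defaultdict(list)
for _, item, rating in train:
    ratings_per_item[item].append(rating)
item_mean = {item: sum(rs) / len(rs) for item, rs in ratings_per_item.items()}


def predict_global(user, item):
    return global_mean


def predict_item(user, item):
    # Fall back to the global average for restaurants unseen in training.
    return item_mean.get(item, global_mean)


def rmse(predict, data):
    """Root mean squared error of `predict` on (user, item, rating) rows."""
    squared = [(predict(user, item) - rating) ** 2 for user, item, rating in data]
    return sqrt(sum(squared) / len(squared))


scores = {
    "global average": rmse(predict_global, holdout),
    "item average": rmse(predict_item, holdout),
}
best = min(scores, key=scores.get)
print(scores, "-> best:", best)
```

Scoring on ratings the methods have never seen is what keeps the comparison honest: a method that merely memorises the training ratings gains nothing on the hold-out set.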
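Whatever method produces the predicted ratings, generating recommendations for a user comes down to dropping the restaurants that user has already rated and sorting the rest by predicted rating. A minimal Python sketch of that last step (not the SAS code); the function name, the "Restaurant A" to "Restaurant D" names and the predicted scores are all hypothetical.

```python
def recommend(predicted_ratings, already_rated, n=3):
    """Return the n unrated items with the highest predicted rating."""
    candidates = [
        (item, score)
        for item, score in predicted_ratings.items()
        if item not in already_rated
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:n]


# Hypothetical predicted ratings for one user.
predicted_for_me = {
    "Restaurant A": 8.7,
    "Restaurant B": 9.1,
    "Fussia": 7.2,
    "Restaurant C": 6.4,
    "Restaurant D": 8.9,
}
# Restaurants I have already rated, taken from the text.
visited = {"Golden chopsticks", "Fussia", "Van der Valk Eindhoven"}

top = recommend(predicted_for_me, visited, n=3)
for item, score in top:
    print(item, score)
```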