Restaurant Recommendations Based On Review Content

From Cohen Courses
Revision as of 10:36, 11 October 2011 by Wcohen

Team

Basic idea

Many current recommendation systems rely on collaborative filtering. Suppose we want to recommend a product to John. One way is to look for users whose rating patterns are similar to John's and use the ratings of these like-minded users to recommend a few products to him. Another way is to build an item-item matrix that records the similarity between pairs of items. From this matrix, together with John's own data (ratings, etc.), we can infer his tastes and recommend items similar to the ones he liked.
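A minimal sketch of the item-item approach in Python, on a hypothetical toy ratings table (the users, items, and 1-5 star scale are made up for illustration). It assumes plain cosine similarity between item rating columns, treating a missing rating as 0, which is only one of several common similarity choices, and predicts a rating as a similarity-weighted average of the user's own ratings.

```python
import math

# Hypothetical toy data: ratings[user][item] on a 1-5 scale; absent keys are unrated.
ratings = {
    "john": {"A": 5, "B": 4},
    "mary": {"A": 5, "B": 5, "C": 1},
    "peter": {"A": 1, "B": 2, "C": 5},
}


def item_vector(item):
    """Return {user: rating} for every user who rated the item."""
    return {u: r[item] for u, r in ratings.items() if item in r}


def cosine(x, y):
    """Cosine similarity of two sparse vectors stored as dicts (missing entries count as 0)."""
    dot = sum(x[k] * y[k] for k in set(x) & set(y))
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def item_item_matrix(items):
    """Similarity for every ordered pair of distinct items."""
    vecs = {i: item_vector(i) for i in items}
    return {(a, b): cosine(vecs[a], vecs[b]) for a in items for b in items if a != b}


def predict(user, item, sim):
    """Similarity-weighted average of the user's ratings on the other items."""
    rated = {j: r for j, r in ratings[user].items() if j != item}
    den = sum(abs(sim[(item, j)]) for j in rated)
    if not den:
        return 0.0
    return sum(sim[(item, j)] * r for j, r in rated.items()) / den


sim = item_item_matrix(["A", "B", "C"])
print(round(predict("john", "C", sim), 2))
```

Because raw ratings are all positive, plain cosine tends to rate every pair as somewhat similar; mean-centering each user's ratings first is a common refinement.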

For our 11-763 project, we propose a recommendation system that instead looks at the text of user reviews.

Assumptions

  1. The text of a user's review will reflect his personal tastes and preferences in restaurants. For instance, he might mention his favorite food, be particular about the service, or tend to talk about the restaurant's ambiance. He might also mention what he dislikes about a particular place or express his displeasure.
  2. By looking at all the reviews of a specific restaurant, we can infer its strengths and weaknesses. For instance, if many reviews praise the excellent service, we can use this knowledge in our recommendation system.
  3. A user has the same "tastes" regardless of which restaurant he goes to.

Brief summary of method

For our problem, we assume that we have a set of users $U$, a set of items (things) $T$, and a set of reviews $R$, where each review $r = (u, t, w) \in R$ for some user $u \in U$, thing $t \in T$, and sequence of words $w = (w_1, \ldots, w_n)$.
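The set-based definition can be mirrored directly in code. The sketch below is one possible representation, assuming Python; the `Review` type, the `group_reviews` helper, and the example reviews are hypothetical. It also pools the words of each user's reviews and of each restaurant's reviews, which is one simple way to form the two collections that a topic model would work over.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Review(NamedTuple):
    """One review r = (u, t, w): user u, thing (restaurant) t, word sequence w."""
    user: str
    thing: str
    words: Tuple[str, ...]


def group_reviews(reviews):
    """Pool review words per user and per thing (hypothetical helper)."""
    by_user = defaultdict(list)
    by_thing = defaultdict(list)
    for r in reviews:
        by_user[r.user].extend(r.words)
        by_thing[r.thing].extend(r.words)
    return dict(by_user), dict(by_thing)


reviews = [
    Review("john", "pizzeria", ("great", "crust", "slow", "service")),
    Review("john", "sushi_bar", ("fresh", "fish", "friendly", "service")),
    Review("mary", "pizzeria", ("cozy", "ambiance", "great", "crust")),
]
by_user, by_thing = group_reviews(reviews)
print(by_user["john"])
print(by_thing["pizzeria"])
```

Every review contributes its words to exactly one user collection and one restaurant collection, so both views are built from the same underlying text.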

Topic models have been widely used in text modelling to discover the topics discussed in a collection of documents. For our problem, we will learn topic distributions over a restaurant's reviews as well as over a user's reviews. Reviews of restaurants and reviews written by users share the same set of topics. Hence, a user's "taste" is represented by a distribution over topics, and a restaurant's characteristics are likewise represented by a distribution over the same topics.

We can consider a model similar to the Author-Topic model, where, for each review, the topic distribution is chosen based on the review's user or restaurant. Alternatively, we can consider a nonparametric Hierarchical Dirichlet process over two "groups" of reviews (one grouped by restaurant, the other by user).
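To make the Author-Topic variant concrete, here is a minimal sketch of its generative story in Python. It assumes that each word of a review is attributed uniformly at random to either the review's user or its restaurant, in the way the Author-Topic model attributes each word to one of a document's co-authors. The topic names, parameter values, and function names are hypothetical, and the sketch only samples from the model; fitting `theta` and `phi` to real reviews (for example by Gibbs sampling) and the HDP alternative are not shown.

```python
import random


def sample(dist, rng):
    """Draw one key from a {key: probability} dict."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def generate_review(user, restaurant, theta, phi, length, rng):
    """Sample a review under an Author-Topic-style generative story.

    theta[e]: topic distribution of entity e (a user or a restaurant).
    phi[k]:   word distribution of topic k.
    """
    words = []
    for _ in range(length):
        # Pick which entity "explains" this word, uniformly as in the Author-Topic model.
        entity = rng.choice([user, restaurant])
        z = sample(theta[entity], rng)  # topic for this word
        words.append(sample(phi[z], rng))  # word drawn from that topic
    return words


# Hypothetical parameters; in practice these would be learned from the reviews.
theta = {
    "john": {"food": 0.8, "service": 0.2},
    "pizzeria": {"food": 0.5, "service": 0.5},
}
phi = {
    "food": {"crust": 0.5, "cheese": 0.5},
    "service": {"waiter": 0.6, "slow": 0.4},
}
print(generate_review("john", "pizzeria", theta, phi, 8, random.Random(0)))
```

Inference runs this story in reverse: given observed reviews, it estimates each user's and restaurant's `theta`, which are exactly the topic distributions the recommender compares.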

[Figure: HDP-LDA model for reviews]

By measuring how close a user's taste is to a restaurant's characteristics (for example, with the Jensen-Shannon divergence between their topic distributions), we hope to be able to recommend a few candidate restaurants for the user.
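The ranking step can be sketched as follows, assuming each user and restaurant has already been reduced to a topic distribution over the same ordered list of topics. Since the Jensen-Shannon divergence is a dissimilarity (0 for identical distributions and at most 1 with base-2 logarithms), restaurants are ranked by ascending divergence. The topic order, function names, and all values are hypothetical.

```python
import math


def kl(p, q):
    """KL divergence D(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, in [0, 1] when using base-2 logs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def recommend(user_topics, restaurant_topics, k):
    """Return the k restaurants whose topic distributions diverge least from the user's."""
    return sorted(
        restaurant_topics,
        key=lambda name: js_divergence(user_topics, restaurant_topics[name]),
    )[:k]


# Hypothetical topic order: [food, service, ambiance].
john = [0.7, 0.2, 0.1]
restaurants = {
    "pizzeria": [0.6, 0.3, 0.1],
    "lounge": [0.1, 0.2, 0.7],
    "diner": [0.4, 0.5, 0.1],
}
print(recommend(john, restaurants, 2))  # ['pizzeria', 'diner']
```

Because every mixture entry m_i is at least p_i / 2, the logarithm never divides by zero when p_i > 0. If a true metric is preferred, the square root of this quantity (the Jensen-Shannon distance) satisfies the triangle inequality and yields the same ranking.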

Dataset

Yelp academic dataset

We will probably focus on restaurants in a single city, perhaps Pittsburgh or New York City.

Question from William: how large is the Yelp dataset, and who else has used it? --Wcohen 14:36, 11 October 2011 (UTC)

Baseline

For our baseline, we are considering traditional collaborative filtering methods. We will find the set of users most similar to the current user (based on their restaurant ratings) and aggregate the restaurants those users rated positively into a candidate set to recommend to the current user.
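One possible form of this baseline is classic user-based collaborative filtering. The sketch below assumes Pearson correlation over co-rated restaurants as the user similarity, keeps only positively correlated neighbors, and treats 4 or more stars as a positive rating; these choices, the function names, and the toy data are illustrative assumptions rather than a settled design.

```python
import math


def pearson(x, y):
    """Pearson correlation of two users' {restaurant: stars} dicts over co-rated restaurants."""
    common = [i for i in x if i in y]
    if len(common) < 2:
        return 0.0
    mx = sum(x[i] for i in common) / len(common)
    my = sum(y[i] for i in common) / len(common)
    num = sum((x[i] - mx) * (y[i] - my) for i in common)
    dx = math.sqrt(sum((x[i] - mx) ** 2 for i in common))
    dy = math.sqrt(sum((y[i] - my) ** 2 for i in common))
    return num / (dx * dy) if dx and dy else 0.0


def candidate_set(user, ratings, n_neighbors, positive=4):
    """Restaurants rated >= `positive` by the user's most similar neighbors, minus ones already reviewed."""
    scored = [(pearson(ratings[user], ratings[u]), u) for u in ratings if u != user]
    # Keep only positively correlated users, most similar first.
    neighbors = [u for s, u in sorted(scored, reverse=True) if s > 0][:n_neighbors]
    seen = set(ratings[user])
    return {
        item
        for u in neighbors
        for item, stars in ratings[u].items()
        if stars >= positive and item not in seen
    }


ratings = {
    "john": {"pizzeria": 5, "diner": 2, "sushi_bar": 4},
    "mary": {"pizzeria": 5, "diner": 1, "sushi_bar": 5, "thai_place": 5},
    "peter": {"pizzeria": 1, "diner": 5, "sushi_bar": 2, "steakhouse": 5},
}
print(candidate_set("john", ratings, n_neighbors=2))
```

Here Mary's ratings track John's while Peter's run opposite, so only Mary's positively rated, unvisited restaurant enters John's candidate set.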

Question from William: what traditional method, specifically, will you use as the baseline? are you going to use an existing implementation or roll your own? --Wcohen 14:35, 11 October 2011 (UTC)

Evaluation

When a user reviews a restaurant, we can assume that he has personally visited it. Hence, we intend to select a sample group of users and hold out their reviews as a test set. If a system recommends a restaurant that appears in the test set and was positively reviewed by the user, we consider it a good recommendation (after all, the user went to the place and rated it positively!).
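A minimal sketch of this protocol in Python. The details are not fixed above, so the following are assumptions: reviews are (user, restaurant, stars) tuples, 4 or more stars counts as positive, only a fraction of each sampled user's reviews is held out (so a taste profile can still be built from the rest), and the score is precision at k. The function names and data are hypothetical.

```python
import random

POSITIVE = 4  # assumed threshold: 4 or 5 stars counts as a positive review


def hold_out(reviews, test_users, fraction, rng):
    """Move a random fraction of each test user's reviews into the test set.

    reviews: list of (user, restaurant, stars) tuples. Keeping part of each
    test user's reviews in training lets the system still build a profile.
    """
    train, test = [], []
    for user in sorted({u for u, _, _ in reviews}):
        mine = [r for r in reviews if r[0] == user]
        if user in test_users:
            rng.shuffle(mine)
            cut = int(len(mine) * fraction)
            test.extend(mine[:cut])
            train.extend(mine[cut:])
        else:
            train.extend(mine)
    return train, test


def precision_at_k(recommendations, test, k):
    """Average over users of the share of their top-k recommendations that hit a held-out positive review."""
    positives = {}
    for user, restaurant, stars in test:
        if stars >= POSITIVE:
            positives.setdefault(user, set()).add(restaurant)
    scores = [
        sum(r in positives.get(user, set()) for r in recs[:k]) / k
        for user, recs in recommendations.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


reviews = [
    ("john", "pizzeria", 5), ("john", "diner", 2),
    ("john", "sushi_bar", 4), ("john", "thai_place", 5),
    ("mary", "pizzeria", 4), ("mary", "lounge", 1),
]
train, test = hold_out(reviews, {"john"}, 0.5, random.Random(0))
print(len(train), len(test))
```

A recommended restaurant that is absent from the test set is not necessarily a bad recommendation, since the user may simply never have reviewed it, so this measure is conservative.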