Difference between revisions of "Restaurant Recommendations Based On Review Content"

From Cohen Courses

Revision as of 22:39, 28 September 2011

Basic idea

Many current recommendation systems rely on collaborative filtering. Suppose we want to recommend a product to John. One way is to look for users whose rating patterns are similar to John's, and use the ratings from these like-minded users to recommend a few products to him. Another way is to build an item-item matrix that records the similarity between pairs of items. From this matrix, together with John's own data (ratings, etc.), we can try to infer his tastes and recommend similar items.
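The two collaborative filtering variants described above can be sketched in a few lines of plain Python. This is a minimal illustration, not part of the proposed system: the toy ratings, the restaurant names, the choice of cosine similarity, and the function names (`cosine`, `user_based_scores`, `item_item_similarity`) are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {restaurant: rating on a 1-5 scale}.
ratings = {
    "john": {"thai_place": 5, "diner": 2},
    "mary": {"thai_place": 5, "diner": 1, "sushi_bar": 4},
    "alex": {"thai_place": 1, "diner": 5, "sushi_bar": 2},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def user_based_scores(target, ratings):
    """Score items the target has not rated, using the ratings of other
    users weighted by how similar each of them is to the target."""
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], other_ratings)
        for item, rating in other_ratings.items():
            if item in ratings[target]:
                continue  # only score unseen items
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    return {i: totals[i] / weights[i] for i in totals if weights[i] > 0}


def item_item_similarity(ratings):
    """Build an item-item similarity matrix from per-item rating columns."""
    columns = {}
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            columns.setdefault(item, {})[user] = rating
    items = sorted(columns)
    return {(i, j): cosine(columns[i], columns[j]) for i in items for j in items}


scores = user_based_scores("john", ratings)
similarity = item_item_similarity(ratings)
```

Here John's predicted score for the one restaurant he has not rated leans towards Mary's rating, because her rating pattern is closer to his than Alex's is.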

For our 11-763 project, we propose a recommendation system that looks at the text of user reviews.

Assumptions

  1. The text in a user's review will reflect his personal tastes and preferences in restaurants. For instance, he might mention his favorite food, be particular about the service or tend to talk about the ambiance of the restaurant.
  2. By looking at all the reviews for a specific restaurant, we can infer the strengths/weaknesses of the restaurant. For instance, if many reviews talk about the excellent service, we can use this knowledge in our recommendation system.
  3. The user has the same "tastes" regardless of which restaurants he goes to.

Brief summary of method

For our problem, we assume that we have a set of users, a set of items (things), and a set of reviews, where each review consists of a user, a thing, and a sequence of words.
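As a rough sketch of this setup, a review can be stored as a (user, thing, words) record, and the words can then be pooled per user or per restaurant to form the two views of the data that the method relies on. The `Review` class, the `group_words` helper, and the sample reviews are hypothetical names and data introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Review:
    """One review: who wrote it, what it is about, and its word sequence."""
    user: str
    thing: str
    words: Tuple[str, ...]


def group_words(reviews: Iterable[Review], key: str) -> Dict[str, List[str]]:
    """Pool the words of all reviews that share the same `key` field
    ("user" or "thing"), preserving the order of the reviews."""
    grouped = defaultdict(list)
    for review in reviews:
        grouped[getattr(review, key)].extend(review.words)
    return dict(grouped)


# Hypothetical sample data.
reviews = [
    Review("john", "thai_place", ("great", "curry", "slow", "service")),
    Review("john", "diner", ("friendly", "service")),
    Review("mary", "thai_place", ("spicy", "curry")),
]

words_by_user = group_words(reviews, "user")
words_by_thing = group_words(reviews, "thing")
```

The same reviews therefore appear twice, once grouped by author and once grouped by restaurant.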

Topic models have been widely used in text modelling to discover the topics mentioned in a collection of documents. For our problem, we shall learn topic distributions over a restaurant's reviews as well as over a user's reviews. Reviews about a restaurant and reviews written by a user share the same set of topics. Hence, a user's "taste" is represented by a distribution over topics, and a restaurant's characteristics are likewise represented by a distribution over topics. We can consider a model similar to the Author-Topic model, where the topic distribution for each review is chosen based on its user or restaurant. Alternatively, we can consider a nonparametric Hierarchical Dirichlet Process over two "groups" of reviews (one for restaurants, another for users).
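One possible reading of the Author-Topic-style variant is written out below as a generative process. It is only a sketch under assumptions that the proposal does not state: every symbol is introduced here (K topics, Dirichlet hyperparameters \alpha and \beta, user set U, thing set T, and u_r, t_r for the user and restaurant of review r), and the user and the restaurant of a review are treated as its two "authors", with each word picking one of them uniformly at random.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1, \dots, K \\
\theta_a &\sim \mathrm{Dirichlet}(\alpha), & a &\in U \cup T \\
x_{r,n} &\sim \mathrm{Uniform}(\{u_r, t_r\}) \\
z_{r,n} \mid x_{r,n} &\sim \mathrm{Multinomial}(\theta_{x_{r,n}}) \\
w_{r,n} \mid z_{r,n} &\sim \mathrm{Multinomial}(\phi_{z_{r,n}})
\end{align*}
```

Under this reading, \theta_u plays the role of a user's "taste" and \theta_t the role of a restaurant's characteristics, and both are distributions over the same K shared topics.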

By measuring the similarity (Jensen-Shannon divergence?) between a user's taste and a restaurant's characteristics, we hope to be able to recommend a few candidate restaurants for the user.
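If Jensen-Shannon divergence is the measure chosen, the ranking step could look like the sketch below. The topic distributions are made-up numbers standing in for the output of a topic model, and `recommend` and its parameter `k` are hypothetical names. The divergence uses base-2 logarithms, so it ranges from 0 (identical distributions) to 1.

```python
from math import log2


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0
    contribute nothing."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: the average KL divergence of p and q
    from their midpoint distribution m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def recommend(user_topics, restaurant_topics, k=2):
    """Return the k restaurants whose topic distributions are closest
    (lowest divergence) to the user's topic distribution."""
    ranked = sorted(
        restaurant_topics,
        key=lambda name: js_divergence(user_topics, restaurant_topics[name]),
    )
    return ranked[:k]


# Hypothetical distributions over three topics, e.g. (food, service, ambiance).
user_taste = [0.7, 0.2, 0.1]
restaurants = {
    "a": [0.6, 0.3, 0.1],
    "b": [0.1, 0.1, 0.8],
    "c": [0.3, 0.4, 0.3],
}

top = recommend(user_taste, restaurants, k=2)
```

Because lower divergence means a closer match, the candidates are sorted in ascending order before the top k are taken.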

Dataset

Yelp academic dataset (http://www.yelp.com/academic_dataset)

Baseline

Evaluation