Difference between revisions of "Analyzing User Tweets around foursquare checkins"

From Cohen Courses

Revision as of 21:44, 8 October 2012

Project idea

Recently there has been a massive increase in the use of location-sharing social networks. Networks such as FourSquare have introduced a new form of social interaction in which a user checks in to a physical location (Food, College & University, Nightlife Spots, etc.). FourSquare allows users to publish their check-ins as tweets. We plan to analyze users' tweeting behaviour after their FourSquare check-ins.

Data

We have one week of tweets from around 300,000 users worldwide. We expect these tweets to contain a significant number of FourSquare check-ins. We will begin our analysis on this data and, once we have a proof of concept, start gathering more. The present data has been generously shared with us by Hazim Almuhimedi, a PhD student in the Institute for Software Research at CMU.

Tasks

  • For each user, analyze the tweets posted within a small interval after a FourSquare check-in, to see whether the user talks about things related to the place where he/she has checked in.
  • Analyze all the tweets that follow FourSquare check-ins to a particular place (or category), to see what percentage of users tweet about that place.
  • Find the topics that users mostly talk about when they are at a particular place.
  • Once we have all the tweets about a particular place, analyze the overall sentiment about that place (for example, whether a particular restaurant is liked by most people or not).

Note: This last task is feasible only if we find that a significant number of people actually tweet about a place after checking in there.
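
The first two tasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our actual pipeline: we assume each tweet is a (user, timestamp, text) tuple, that a check-in tweet can be recognised by a 4sq.com link in its text, that 30 minutes is a reasonable "small interval", and that a simple keyword match is enough to decide whether a tweet is about the place. The function names and the sample tweets are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: check-in tweets can be recognised by a 4sq.com link.
CHECKIN_MARKER = "4sq.com"


def followups_after_checkins(tweets, window=timedelta(minutes=30)):
    """Pair each check-in tweet with the same user's tweets within `window` after it.

    `tweets` is an iterable of (user, timestamp, text) tuples.
    Returns a list of (checkin, followups) pairs.
    """
    by_user = defaultdict(list)
    for tweet in tweets:
        by_user[tweet[0]].append(tweet)

    pairs = []
    for user_tweets in by_user.values():
        user_tweets.sort(key=lambda t: t[1])  # chronological order per user
        for i, checkin in enumerate(user_tweets):
            if CHECKIN_MARKER not in checkin[2]:
                continue
            followups = []
            for later in user_tweets[i + 1:]:
                if later[1] - checkin[1] > window:
                    break  # tweets are sorted, so the rest are too late as well
                if CHECKIN_MARKER not in later[2]:
                    followups.append(later)
            pairs.append((checkin, followups))
    return pairs


def share_mentioning_place(pairs, place_keywords):
    """Fraction of check-ins followed by at least one tweet containing a keyword."""
    if not pairs:
        return 0.0
    hits = sum(
        any(any(k in t[2].lower() for k in place_keywords) for t in followups)
        for _, followups in pairs
    )
    return hits / len(pairs)


# Made-up sample tweets, for illustration only.
t0 = datetime(2012, 10, 1, 12, 0)
sample = [
    ("alice", t0, "I'm at Joe's Pizza http://4sq.com/abc"),
    ("alice", t0 + timedelta(minutes=10), "This pizza is great"),
    ("alice", t0 + timedelta(hours=2), "Back at work"),
    ("bob", t0, "I'm at Joe's Pizza http://4sq.com/def"),
    ("bob", t0 + timedelta(minutes=5), "Traffic was terrible today"),
]
pairs = followups_after_checkins(sample)
share = share_mentioning_place(pairs, ["pizza"])
print(f"{share:.0%} of check-ins were followed by a tweet about the place")
```

Keyword matching is the weakest assumption here; in practice the place name or category would have to be extracted from the check-in itself, and the window length tuned on the data.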

Evaluation

  • Quantitative: Build a small annotated test dataset to evaluate the accuracy of our predictions.
  • Qualitative: For sentiment analysis on restaurant tweets, check whether the overall sentiment correlates with the ratings on other well-known social networks such as Yelp.
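
One possible way to make the Yelp comparison concrete is a correlation coefficient between the per-restaurant tweet sentiment and the Yelp rating. The sketch below uses Pearson correlation as an assumed choice (a rank correlation would be an equally reasonable alternative); the sentiment scale of -1 to 1, the restaurant names, and all numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Hypothetical values: mean tweet sentiment in [-1, 1] and Yelp stars per restaurant.
tweet_sentiment = {"Restaurant A": 0.8, "Restaurant B": 0.1, "Restaurant C": -0.4}
yelp_rating = {"Restaurant A": 4.5, "Restaurant B": 3.5, "Restaurant C": 2.0}

# Compare only restaurants present in both sources.
common = sorted(tweet_sentiment.keys() & yelp_rating.keys())
r = pearson([tweet_sentiment[p] for p in common], [yelp_rating[p] for p in common])
print(f"Pearson r over {len(common)} restaurants: {r:.2f}")
```

With only a handful of restaurants such a coefficient is very noisy, so it should be read as a sanity check and not as a measure of accuracy.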

Key Technical Challenges

  • We might not have enough data if we narrow the analysis to a single location (for example, a particular restaurant).
  • Given the limited amount of data, we are not sure whether we can do topic modelling accurately, since tweets are inherently short.

What we hope to learn from this project

We plan to apply topic and keyword modelling techniques, which will give us an overview of how these methods perform on short texts such as tweets.