Analyzing User Tweets around foursquare checkins

From Cohen Courses
Revision as of 22:09, 8 October 2012 by Rgkulkar (talk | contribs) (→‎Tasks)

Project idea

Location-sharing social networks have recently seen a massive increase in usage. Networks such as Foursquare have introduced a new form of social interaction in which a user checks in to a physical location (in categories such as Food, College & University, and Nightlife Spots). Foursquare allows user check-ins to be published as tweets. We plan to analyze users' tweeting behaviour after their Foursquare check-ins.

Team

Data

We have one week of tweets from around 300,000 users worldwide. We expect these tweets to contain a significant number of Foursquare check-ins. We will begin our analysis on this data and, once we have a proof of concept, start gathering more. The present data was generously shared with us by Hazim Almuhimedi, a PhD student at the Institute for Software Research at CMU.

Tasks

  • For each user, analyze the tweets posted within a small interval after a Foursquare check-in, to see whether the user talks about things related to the place where he/she checked in.
  • Analyze all the tweets that follow Foursquare check-ins to a particular place (or category), to see what percentage of users tweet about that place.
  • Find the topics that users talk about most when they are at a particular place.
  • Once we have all the tweets about a particular place, analyze the overall sentiment about that place (for example, whether a particular restaurant is liked by most people or not).

Note: We will be able to do this only if we find that a significant number of people actually tweet about a place after checking in there.
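
As a rough illustration of the first two tasks, the sketch below groups tweets by user, finds check-in tweets, and collects the tweets each user posts within a fixed window afterwards. It is only a sketch under stated assumptions: the record layout (`user`, `time`, `text`), the one-hour window, and the rule that a check-in tweet is recognised by a `4sq.com` link are all illustrative choices, not properties of our dataset.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a Foursquare check-in tweet can be spotted by a 4sq.com link.
CHECKIN_MARKER = "4sq.com"


def is_checkin(text):
    """Return True if the tweet text looks like a published check-in."""
    return CHECKIN_MARKER in text.lower()


def tweets_after_checkins(tweets, window=timedelta(hours=1)):
    """Pair every check-in tweet with the same user's later tweets.

    tweets: iterable of dicts with keys 'user', 'time' (datetime), 'text'
            (a hypothetical layout chosen for this example).
    Returns a list of (checkin_tweet, followup_tweets) tuples.
    """
    by_user = defaultdict(list)
    for tweet in tweets:
        by_user[tweet["user"]].append(tweet)

    pairs = []
    for user_tweets in by_user.values():
        user_tweets.sort(key=lambda t: t["time"])
        for i, tweet in enumerate(user_tweets):
            if not is_checkin(tweet["text"]):
                continue
            followups = []
            for later in user_tweets[i + 1:]:
                if later["time"] - tweet["time"] > window:
                    break  # tweets are sorted, so nothing later can qualify
                if not is_checkin(later["text"]):
                    followups.append(later)  # skip other check-ins
            pairs.append((tweet, followups))
    return pairs


def fraction_with_followup(pairs):
    """Fraction of check-ins followed by at least one tweet in the window."""
    if not pairs:
        return 0.0
    return sum(1 for _, followups in pairs if followups) / len(pairs)


# Made-up sample data, for illustration only.
sample = [
    {"user": "a", "time": datetime(2012, 10, 1, 12, 0),
     "text": "I'm at Some Cafe http://4sq.com/abc"},
    {"user": "a", "time": datetime(2012, 10, 1, 12, 20),
     "text": "Great coffee here"},
    {"user": "a", "time": datetime(2012, 10, 1, 14, 0),
     "text": "Back to work"},
    {"user": "b", "time": datetime(2012, 10, 1, 9, 0),
     "text": "I'm at Some Gym http://4sq.com/xyz"},
]
pairs = tweets_after_checkins(sample)
share = fraction_with_followup(pairs)
```

Deciding whether a follow-up tweet is actually about the place (rather than merely posted soon after the check-in) is the harder part and is left open here.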

Evaluation

  • Quantitative: Build a small annotated test dataset to evaluate the accuracy of our predictions.
  • Qualitative: For sentiment analysis on restaurant tweets, check whether the overall sentiment correlates with ratings on other popular services such as Yelp.
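
Both checks reduce to simple computations once the data is in hand. The sketch below shows one possible way to do them: plain accuracy against the annotated labels, and a Pearson correlation between per-restaurant sentiment scores and Yelp ratings. The choice of Pearson correlation, the label names, and all the numbers are assumptions made for illustration; no results have been measured yet.

```python
from math import sqrt


def accuracy(predicted, gold):
    """Fraction of predictions that match the annotated (gold) labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: mean tweet sentiment per restaurant (in [-1, 1])
# and the corresponding Yelp star ratings.
tweet_sentiment = [0.6, -0.2, 0.1, 0.8]
yelp_stars = [4.5, 2.5, 3.0, 4.0]
r = pearson(tweet_sentiment, yelp_stars)

# Hypothetical labels from a small annotated test set.
acc = accuracy(["pos", "neg", "pos"], ["pos", "neg", "neg"])
```

With only a handful of restaurants the correlation would be very noisy, so a rank-based measure or a simple scatter plot might turn out to be a better fit in practice.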

Key Technical Challenges

  • We might not have a sufficient amount of data if we narrow down to a single location (for example, a particular restaurant).
  • Given the limited amount of data, we are not sure whether we can do topic modelling accurately, especially since tweets are inherently short.