Difference between revisions of "Analyzing User Tweets around foursquare checkins"

From Cohen Courses

Revision as of 09:34, 11 October 2012

Comments

Note: A previous project for this course used 4-square data and location-based Twitter data, which might be available. --Wcohen 20:31, 10 October 2012 (UTC)


  • The problem/task is not concrete. You may want to write more about what exactly you want to predict and how you want to do it, i.e., what your approach is and which features you will use.
  • Some ideas
    • As you have noted, the data could be really small w.r.t. check-in and location-based tweet information. One possibility is to leave out a portion of the "location-based" tweets as a test set for evaluation, then take the rest of the location-based tweets as a seed set to cluster the unlabeled tweets from the dataset.
    • You may want to start with a coarse-level prediction, i.e., category type (say restaurants_in_sq._hill or just restaurants), as opposed to a fine-grained one, i.e., the exact place (the name of that restaurant), for the sentiment analysis.

-- Apappu 13:32, 11 October 2012 (UTC)
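
The hold-out suggestion in the comments above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 20% test fraction, and the use of Jaccard token overlap to attach an unlabeled tweet to its nearest seed tweet are all hypothetical choices, standing in for whatever clustering method the project ends up using.

```python
import random


def split_seed_test(labeled, test_fraction=0.2, seed=0):
    """Shuffle (text, label) pairs and hold out a fraction as the test set.

    Returns (seed_set, test_set). The 20% default is an arbitrary assumption.
    """
    items = list(labeled)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]


def tokens(text):
    """Lowercased whitespace tokens; a real system would need a tweet tokenizer."""
    return set(text.lower().split())


def nearest_seed_label(text, seeds):
    """Label of the seed tweet with the highest Jaccard overlap, or None if nothing overlaps."""
    words = tokens(text)
    best_label, best_score = None, 0.0
    for seed_text, label in seeds:
        seed_words = tokens(seed_text)
        union = words | seed_words
        score = len(words & seed_words) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The held-out test set is never used as a seed, so the labels assigned to it can be compared against its known locations.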

Project idea

Recently there has been a massive increase in the use of location-sharing social networks. Social networks such as FourSquare have introduced a new form of social interaction in which a user checks in to a physical location (Food, College & University, Nightlife Spots, etc.). FourSquare allows user check-ins to be published as tweets. We plan to analyze the tweeting behaviour of users after their foursquare check-ins.

Team

Data

We have one week of tweets from around 300,000 users worldwide. We expect these tweets to contain a significant number of foursquare check-ins. We will start our analysis on this data and, once we have a proof of concept, gather more. The present data has been generously shared with us by Hazim Almuhimedi, a PhD student at the Institute for Software Research at CMU.

Tasks

  • For each user, analyze the tweets posted within a small interval after a foursquare check-in, to see whether the user talks about things related to the place they checked in to.
  • Analyze all the tweets that follow foursquare check-ins to a particular place (or category), to see what percentage of users tweet about that place.
  • Find the topics that users mostly talk about when they are at a particular place.
  • Once we have all the tweets about a particular place, analyze the overall sentiment about that place (for example, whether most people like a particular restaurant).

Note: We will be able to do the sentiment analysis task only if we find that a significant number of people actually tweet about a place after checking in to it.
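
As a rough illustration of the first two tasks, the sketch below pairs each check-in with the same user's tweets from a fixed window afterwards and then measures the share of users who mention the place. Everything here is an assumption made for the example: the `Tweet` record, the one-hour window, detecting check-ins by the presence of a `4sq.com` link, and simple keyword matching as a stand-in for "talks about the place".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Tweet:
    user: str
    time: datetime
    text: str


def is_checkin(tweet):
    # Assumption: a published check-in carries a 4sq.com short link.
    return "4sq.com" in tweet.text


def followups(tweets, window=timedelta(hours=1)):
    """Pair every check-in with the same user's non-check-in tweets within `window` after it."""
    by_user = {}
    for t in sorted(tweets, key=lambda t: t.time):
        by_user.setdefault(t.user, []).append(t)
    pairs = []
    for user_tweets in by_user.values():
        for i, checkin in enumerate(user_tweets):
            if not is_checkin(checkin):
                continue
            later = [t for t in user_tweets[i + 1:]
                     if not is_checkin(t) and t.time - checkin.time <= window]
            pairs.append((checkin, later))
    return pairs


def share_of_users_mentioning(pairs, keywords):
    """Fraction of checked-in users with a follow-up tweet containing any (lowercase) keyword."""
    users = {checkin.user for checkin, _ in pairs}
    mentioning = {checkin.user for checkin, later in pairs
                  if any(k in t.text.lower() for t in later for k in keywords)}
    return len(mentioning) / len(users) if users else 0.0
```

The window length is the main knob: a short window keeps the follow-up tweets close to the visit but shrinks an already small dataset further.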

Evaluation

  • Quantitative: Build a small annotated test dataset and measure the accuracy of our predictions against it.
  • Qualitative: For sentiment analysis on restaurant tweets, check whether the overall sentiment correlates with the ratings on other popular social networks such as Yelp.
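
Both checks above reduce to small computations, sketched here under assumptions: the proposal does not name a correlation measure, so Pearson correlation is used purely as an example (a rank correlation would be an equally reasonable choice), and the function names are hypothetical.

```python
from math import sqrt


def accuracy(predicted, gold):
    """Fraction of predictions that match the annotated labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def pearson(xs, ys):
    """Pearson correlation, e.g. per-restaurant sentiment score vs. external rating."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)
```

With only a handful of restaurants, any correlation value should be read as a loose sanity check rather than as evidence.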

Key Technical Challenges

  • We might not have sufficient data if we narrow down to a single location (for example, a particular restaurant).
  • Given the limited amount of data, we are not sure whether we can do topic modelling accurately, especially since tweets are inherently short.