Difference between revisions of "Cohen Courses:Tweet"

From Cohen Courses

Revision as of 23:18, 5 October 2011

Inferring geographical activity using Twitter.

Team Member(s)

Proposal

In this project we would like to infer the location category of a tweet from the words in the tweet (including its sentiment) and the time at which it was posted. We believe that:

  • Twitter users tweet differently at different locations - e.g. a tweet made from a restaurant (about the food, the service, etc.) may differ from a tweet made from an office (about work, etc.)
  • the location of a tweet depends on time - e.g. a person is more likely to tweet from the office in the morning than from a nightspot
  • sentiment depends on location and/or time - e.g. a person may be more likely to feel sombre at the office on weekdays than at travel spots on holidays or weekends
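To use the time of a tweet as a feature, each timestamp has to be mapped to coarse buckets that match the hypotheses above (morning vs. night, weekday vs. weekend). Below is a minimal Python sketch. The part-of-day boundaries in `PART_OF_DAY` are illustrative assumptions, not values from the proposal. Holidays are not handled, because that would require a holiday calendar.

```python
from datetime import datetime

# Hypothetical part-of-day boundaries as (start hour, label). These cut-offs are
# illustrative assumptions, not values taken from the proposal.
PART_OF_DAY = [(0, "night"), (6, "morning"), (12, "afternoon"), (18, "evening")]


def time_features(ts: datetime) -> tuple[str, str]:
    """Map a tweet timestamp to a (part_of_day, day_type) pair."""
    part = PART_OF_DAY[0][1]
    # Keep the label of the latest boundary the hour has reached.
    for start_hour, label in PART_OF_DAY:
        if ts.hour >= start_hour:
            part = label
    # datetime.weekday(): Monday == 0 ... Sunday == 6
    day_type = "weekend" if ts.weekday() >= 5 else "weekday"
    return part, day_type


print(time_features(datetime(2011, 10, 5, 9, 30)))   # Wednesday 09:30 -> ('morning', 'weekday')
print(time_features(datetime(2011, 10, 8, 22, 15)))  # Saturday 22:15 -> ('evening', 'weekend')
```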

The way the locations of a user's tweets change over time forms that user's geographical activity profile. This activity may be structured across both geographical space and time. We want to learn this structure for each user from his or her tweets. We then want to use it, together with the text of a new tweet, to infer the location from which that tweet was made.
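One simple way to represent this structure is a per-user table of how often each location category occurs in each time bucket. The table can be turned into a smoothed distribution P(category | time bucket) and multiplied by a text-based likelihood P(words | category). The sketch below rests on several assumptions:

- past tweets are labeled with the eight Foursquare categories;
- counts are smoothed with add-alpha smoothing;
- words and time are conditionally independent given the category.

The class and method names (`ActivityProfile`, `observe`, `prior`, `combine`) are hypothetical.

```python
from collections import Counter, defaultdict

CATEGORIES = [
    "Arts and Entertainment", "College and Education", "Food",
    "Home/Work/Other", "Nightlife Spots", "Great Outdoors", "Shops",
    "Travel Spots",
]


class ActivityProfile:
    """Hypothetical per-user estimate of P(category | time bucket).

    A time bucket is any hashable key, e.g. ("morning", "weekday").
    """

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # add-alpha smoothing constant (an assumption)
        self.counts = defaultdict(Counter)  # bucket -> Counter of categories

    def observe(self, bucket, category: str) -> None:
        """Record one past tweet with a known category in a time bucket."""
        self.counts[bucket][category] += 1

    def prior(self, bucket) -> dict[str, float]:
        """Smoothed P(category | bucket); sums to 1 over CATEGORIES."""
        c = self.counts[bucket]
        total = sum(c.values()) + self.alpha * len(CATEGORIES)
        return {cat: (c[cat] + self.alpha) / total for cat in CATEGORIES}

    def combine(self, bucket, text_likelihood: dict[str, float]) -> dict[str, float]:
        """Posterior proportional to P(category | bucket) * P(words | category).

        Assumes words and time are independent given the category. Categories
        missing from text_likelihood get likelihood 0.
        """
        prior = self.prior(bucket)
        scores = {cat: prior[cat] * text_likelihood.get(cat, 0.0) for cat in CATEGORIES}
        z = sum(scores.values())
        # Fall back to the time-only prior if the text gives no evidence at all.
        return {cat: s / z for cat, s in scores.items()} if z else prior


profile = ActivityProfile()
morning_weekday = ("morning", "weekday")
for _ in range(3):
    profile.observe(morning_weekday, "Home/Work/Other")
profile.observe(morning_weekday, "Food")

prior = profile.prior(morning_weekday)
print(max(prior, key=prior.get))  # Home/Work/Other

# Hypothetical text-model output that strongly suggests food.
posterior = profile.combine(morning_weekday, {"Food": 0.9, "Home/Work/Other": 0.05})
print(max(posterior, key=posterior.get))  # Food
```

Here the time-only prior favours "Home/Work/Other" for this user on weekday mornings. Strong textual evidence can still override it, which is the interaction between structure and tweet content that the proposal wants to exploit.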

The location categories to infer are the eight Foursquare categories: "Arts and Entertainment", "College and Education", "Food", "Home/Work/Other", "Nightlife Spots", "Great Outdoors", "Shops", "Travel Spots".

Proposed Approach

We anticipate several challenges for this task, including:

  • tweets are inherently noisy, full of abbreviations and non-standard vocabulary
  • a tweet may contain no location cues: e.g. a user may be in a restaurant, but the tweet may not reflect this
  • a user may tweet about a location without being there (e.g. it may simply be a place the user is interested in)
  • a user may have no structure in his or her tweeting habits: the user may have no geographical pattern of activity, or, even with such a pattern, may not tweet about it or geo-tag tweets regularly
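The first challenge is usually addressed by normalizing tweets before bag-of-words features are extracted. The sketch below shows one possible choice, not a prescribed method. It lowercases the text and replaces URLs and @-mentions with placeholder tokens (`_url_`, `_user_`). It then shortens letter elongations ("sooooo" becomes "soo") and expands shorthands using a tiny hypothetical lexicon, `SHORTHANDS`. A real lexicon would be far larger.

```python
import re

# Tiny hypothetical shorthand lexicon, for illustration only.
SHORTHANDS = {"u": "you", "r": "are", "gr8": "great", "2nite": "tonight", "b4": "before"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
ELONGATION_RE = re.compile(r"([a-z])\1{2,}")  # a letter repeated 3+ times
TOKEN_RE = re.compile(r"#?\w+(?:'\w+)?")      # words, hashtags, simple contractions


def normalize(tweet: str) -> list[str]:
    """Turn a raw tweet into a list of normalized tokens."""
    text = tweet.lower()
    text = URL_RE.sub(" _url_ ", text)
    text = MENTION_RE.sub(" _user_ ", text)
    # Collapse letter elongations to two characters; digits are left alone.
    text = ELONGATION_RE.sub(r"\1\1", text)
    tokens = TOKEN_RE.findall(text)
    return [SHORTHANDS.get(tok, tok) for tok in tokens]


print(normalize("OMG this pizza is sooooo gr8!!! @bob http://t.co/xyz"))
# ['omg', 'this', 'pizza', 'is', 'soo', 'great', '_user_', '_url_']
```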

To begin the project, we would like to

Baseline & Dataset

For the baseline, we will use a bag-of-words model to predict the location category of each tweet. We would then like to find out whether adding the structure across geographical space and time improves the predictions.
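The proposal does not name a specific classifier for the bag-of-words baseline. As one example, it could be a multinomial Naive Bayes classifier, but that choice is an assumption, and the toy training tweets below are made up for illustration. The sketch uses add-alpha smoothing and ignores words never seen in training.

```python
import math
from collections import Counter, defaultdict


class BagOfWordsNB:
    """Multinomial Naive Bayes over bag-of-words features (illustrative sketch)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                       # add-alpha smoothing constant
        self.class_counts = Counter()            # label -> number of tweets
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocab = set()

    def fit(self, docs: list[list[str]], labels: list[str]) -> "BagOfWordsNB":
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens: list[str]) -> dict[str, float]:
        """log P(label) + sum of log P(word | label) for each label."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n / n_docs)
            for tok in tokens:
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + self.alpha) / (total + self.alpha * v))
            scores[label] = score
        return scores

    def predict(self, tokens: list[str]) -> str:
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy, made-up training tweets (already tokenized), for illustration only.
train_docs = [
    ["pizza", "delicious", "waiter"],
    ["burger", "fries", "delicious"],
    ["meeting", "boss", "deadline"],
    ["email", "meeting", "report"],
]
train_labels = ["Food", "Food", "Home/Work/Other", "Home/Work/Other"]

model = BagOfWordsNB().fit(train_docs, train_labels)
print(model.predict(["delicious", "pizza"]))         # Food
print(model.predict(["boss", "report", "meeting"]))  # Home/Work/Other
```

The space-and-time structure could then be tested, for instance, by replacing the class prior `n / n_docs` with a per-user, per-time-bucket prior and comparing accuracy against this baseline.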

For the dataset, we have obtained


Related Work