Difference between revisions of "Analysis of Twitter user's location behaviors"

From Cohen Courses

Revision as of 20:28, 8 October 2012

What’s the team?

What’s the data you’ll work with?

We will work with a set of 500 Twitter users and all their friends. Two users are defined as friends if they follow each other. The core users are those who have sent at least N tweets with location information during the past P days.
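The selection rule above can be sketched as a simple filter. This is only an illustration: the tweet record layout (keys `user_id`, `created_at`, `coordinates`) and the function name `select_core_users` are assumptions made for the example, and N and P are left as parameters because the proposal does not fix them.

```python
from datetime import datetime, timedelta


def select_core_users(tweets, n_min, p_days, now):
    """Return ids of users with at least n_min located tweets in the last p_days.

    `tweets` is an iterable of dicts with hypothetical keys:
    'user_id', 'created_at' (a datetime) and 'coordinates'
    (None when the tweet carries no location, else a (lat, lon) tuple).
    """
    cutoff = now - timedelta(days=p_days)
    counts = {}
    for tweet in tweets:
        # Count only tweets that carry a location and fall inside the window.
        if tweet["coordinates"] is not None and tweet["created_at"] >= cutoff:
            counts[tweet["user_id"]] = counts.get(tweet["user_id"], 0) + 1
    return {user for user, count in counts.items() if count >= n_min}
```

Calling `select_core_users(tweets, n_min=N, p_days=P, now=datetime.now())` on a collected sample would yield the candidate core users, from which 500 can then be drawn.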


<bliu1>

We first select a list of Twitter users who attach location information to their tweets.

  • Seed user set (SUS): by listening to the streaming API, we randomly select a set of users (say 500) who attach location information to their tweets.

To make their locations interpretable by us, we restrict them to the Greater Pittsburgh area.

  • Extended user set (EUS): we extract all the mutual followers of users in SUS in order to observe the social sphere of the seed users.

(If user A is in SUS, A follows B, and B follows A, then B is in EUS. B does not have to use the location feature.)

Then we crawl the user profiles and tweeting history of all users in EUS over a specified time period (say 60 days) using the REST API, in order to study the nature of their location-related tweeting behavior.
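The mutual-follower rule for the extended user set can be sketched as follows. The `follows` mapping (user id to the set of ids that user follows) and the function name are assumptions for illustration; in practice the mapping would be built from crawled follower lists. The sketch returns only the mutual followers; whether the seed users themselves are also added to the crawl list is a separate choice.

```python
def extended_user_set(seed_users, follows):
    """Return every user B such that some seed user A follows B and B follows A.

    `follows` is a hypothetical dict mapping a user id to the set of
    user ids that user follows.
    """
    extended = set()
    for a in seed_users:
        for b in follows.get(a, set()):
            # Keep B only if the follow relation is mutual.
            if a in follows.get(b, set()):
                extended.add(b)
    return extended
```

For example, with `follows = {"A": {"B", "C"}, "B": {"A"}, "C": set()}` and seed set `{"A"}`, only `"B"` qualifies, because `"C"` does not follow `"A"` back.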

</bliu1>

What’s the task or tasks?

Through our analysis, we would like to answer the following research questions:

  • If one user sends out his/her location information actively, what can we infer about his/her friends' location behavior?
  • What motivates the most active location users to post their location information?
  • What can we infer from a user's most frequent location?
  • ...


<bliu1>

Research questions:

  • RQ1: What fraction of Twitter users use the location feature, and how frequently do they use it?
  • RQ2: What are the relations between location and tweet content among "location-active" users?
  • RQ3: Do "location-active" users share common posting behavior with their social sphere?
  • RQ4: What impact does location info have on the "location-active" users' social sphere?
  • RQ5: What can we infer from users' location info? (Can we locate their living area, their home, or their workplace?)
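As a starting point for RQ5, one simple baseline is to snap each located tweet to a coarse grid cell and report the cell a user tweets from most often. This is only a sketch under assumptions: the rounding-based grid, the `precision` parameter, and the function name are illustrative choices, not a method fixed by the proposal.

```python
from collections import Counter


def most_frequent_cell(points, precision=2):
    """Return ((lat, lon), count) for the most visited grid cell, or None.

    `points` is a list of (lat, lon) tuples for one user. Coordinates are
    rounded to `precision` decimal places, so nearby points share a cell.
    """
    cells = Counter(
        (round(lat, precision), round(lon, precision)) for lat, lon in points
    )
    if not cells:
        return None
    return cells.most_common(1)[0]
```

Whether the most frequent cell corresponds to a home, a workplace, or something else would still need to be checked, for instance by splitting tweets by time of day.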

</bliu1>

How will you evaluate? (qualitative or quantitatively?)

  •  ?


<bliu1>

  • To evaluate our analysis of tweets, we can pick a random subset of users and compare the user information we infer with human-interpreted information.
  • To evaluate the consistency of our conclusions, we may use cross-validation.

(We use a subset of users to learn the nature of tweeting behavior, and check whether the conclusions also apply to the remaining users.)
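The user-level cross-validation described here could be set up by splitting the user ids into k folds, so that conclusions drawn from k-1 folds are checked against the held-out fold. The sketch below is an assumption about how the split might be done; the function name, the default `k=5`, and the fixed random seed are illustrative.

```python
import random


def split_users(user_ids, k=5, seed=0):
    """Shuffle user ids and deal them into k folds of near-equal size."""
    ids = sorted(user_ids)        # sort first so the split is reproducible
    rng = random.Random(seed)
    rng.shuffle(ids)
    return [ids[i::k] for i in range(k)]


def train_test_rounds(folds):
    """Yield (learning_users, held_out_users) for each cross-validation round."""
    for i, held_out in enumerate(folds):
        learning = [u for j, fold in enumerate(folds) if j != i for u in fold]
        yield learning, held_out
```

Splitting by user rather than by tweet keeps all of a user's tweets on one side of the split, which matches the goal of testing whether conclusions carry over to unseen users.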

</bliu1>

What are the key technical challenges, and what do you hope to learn?

  • It will probably take some time to figure out how to get the 500 "active location users".


<bliu1>

  • One of the challenges here is to infer users' behavior from the tweet stream, which is a noisy channel.

Users might not have a consistent tweeting style, or might behave differently on different sources (for example, web browser vs. mobile clients).

  • Another challenge is the analysis of users' social interaction. The interaction flow that we can observe might be incomplete.

What kinds of social impact can we identify? We might need to dig deeper into our data to find the answer.

</bliu1>

Related papers

  •  ?