Difference between revisions of "Understanding user information sharing behaviors"
Revision as of 15:24, 8 October 2012
This is an assignment project for the Social Media Analysis course in Fall 2012.
Introduction
We want to characterize users' retweeting behavior on Twitter. The objective of this project is to answer the question: on Twitter, who often retweets whom, about which topics, and with what degree of sentiment?
Team Members
Goal
Identifying the relationship between the likelihood that a user retweets a tweet and other network and linguistic factors, e.g., the relative position of the follower and the followee in the network (whether or not they are in the same political community), their centralities within the network and within their communities, and the topic of the tweet.
Methods
- Using bag-of-words classifiers to identify each user's community label
- Measuring each user's relative position in the network using community labels and network centralities
- Measuring the relationship between these measures using correlation coefficients, e.g., Pearson correlation
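As an illustration of the last step, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name `pearson` and the toy data (a same-community indicator and a retweet rate per follower-followee pair) are assumptions made for this example, not the project's actual code or measurements.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: one entry per (follower, followee) pair.
# 1 means both users carry the same community label, 0 means they differ.
same_community = [1, 1, 0, 0, 1, 0]
# Fraction of the followee's tweets that the follower retweeted (made up).
retweet_rate = [0.30, 0.25, 0.05, 0.10, 0.40, 0.02]

r = pearson(same_community, retweet_rate)
print(f"Pearson r = {r:.3f}")
```

A value of r near +1 or -1 would indicate a strong linear relationship between the two measures, while a value near 0 would indicate little or no linear relationship.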
Dataset
The dataset has been crawled from Twitter using Twitter's APIs. Starting from a set of selected influential users, e.g., politicians (barackobama, mittromney, etc.) and political bloggers (dailykos, americablog, etc.), we expanded the set by including their followers and followees. Tweets, including retweets, from all of these users are then crawled on a daily basis. In total, we have roughly 400K users and 7M tweets per month, about one-fifth of which are retweets.
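The seed expansion described above can be sketched as a one-hop expansion over follower and followee lists. This is only an illustration: the function `expand_seed_set`, the dictionaries, and the account names are hypothetical, and in the real crawl the lists would be fetched through Twitter's APIs rather than hard-coded.

```python
def expand_seed_set(seeds, followers, followees):
    """Return the seeds plus every account one hop away from a seed."""
    users = set(seeds)
    for seed in seeds:
        # Accounts that follow the seed.
        users.update(followers.get(seed, ()))
        # Accounts that the seed follows.
        users.update(followees.get(seed, ()))
    return users


# Hypothetical toy graph, keyed by seed account.
followers = {"seed_a": ["u1", "u2"], "seed_b": ["u2", "u3"]}
followees = {"seed_a": ["u4"], "seed_b": ["seed_a"]}

crawl_set = expand_seed_set(["seed_a", "seed_b"], followers, followees)
print(sorted(crawl_set))
```

Using a set keeps each account once even when it follows several seeds, which matters when the expanded set reaches hundreds of thousands of users.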