Difference between revisions of "Understanding user information sharing behaviors"

From Cohen Courses

Latest revision as of 09:49, 10 October 2012

This is an assignment project for the Social Media Analysis course in Fall 2012.

Comments

Tuan, let's keep talking about this in our one-on-one meetings. This is a good chance to do something you might not otherwise get to on our research project, and I'm glad you're looking at this data for the course project too. --Wcohen 14:49, 10 October 2012 (UTC)


Introduction

We want to characterize users' retweeting behavior on Twitter. The objective of this project is to answer the question: on Twitter, who often retweets whom, and about what topics?

Team Members

Tuan Anh

Goal

The goal is to identify the relationship between the likelihood that a user retweets a tweet and a set of network and linguistic factors, e.g., the relative position of the follower and the followee in the network (whether or not they are in the same political community), their centralities within the network and within their communities, and the topic of the tweet.
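
As a rough illustration of the kind of relationship described above, the sketch below estimates the empirical retweet likelihood separately for follower-followee pairs in the same community and in different communities. The record layout (a `same_community` flag paired with a `retweeted` flag) and the toy values are assumptions made for illustration; they are not the project's data or its actual method.

```python
from collections import defaultdict


def retweet_rate_by_group(exposures):
    """Return {group_key: fraction of exposures that were retweeted}.

    `exposures` is an iterable of (group_key, was_retweeted) pairs, where
    group_key is any hashable factor value (here: same community or not).
    """
    shown = defaultdict(int)
    retweeted = defaultdict(int)
    for group_key, was_retweeted in exposures:
        shown[group_key] += 1
        if was_retweeted:
            retweeted[group_key] += 1
    return {key: retweeted[key] / shown[key] for key in shown}


# Hypothetical toy data: (same_community, retweeted) for eight tweet exposures.
toy_exposures = [
    (True, True), (True, False), (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, False),
]
rates = retweet_rate_by_group(toy_exposures)
```

The same function could be reused with other factors as the group key, for example a topic label or a binned centrality value.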

Methods

  • Using bag-of-words-based classifiers to identify each user's community label
  • Using LDA-based models to identify the topics of the tweets
  • Measuring users' relative positions in the network using their community labels and network centralities
  • Measuring the relationships among these measures using correlation coefficients, e.g., Pearson correlation
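
The last two steps could be sketched as follows, using only the Python standard library. This is a minimal sketch under stated assumptions: normalised in-degree stands in for "network centrality" (the proposal does not say which centrality is used), and the follow edges and retweet counts are hypothetical toy values.

```python
import math


def in_degree_centrality(follow_edges, users):
    """Normalised in-degree: the fraction of other users who follow each user.

    `follow_edges` is an iterable of (follower, followee) pairs.
    This is one simple centrality choice, assumed here for illustration.
    """
    counts = {user: 0 for user in users}
    for _follower, followee in follow_edges:
        counts[followee] += 1
    denom = max(len(users) - 1, 1)
    return {user: count / denom for user, count in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy network and retweet counts (times each user was retweeted).
users = ["a", "b", "c", "d"]
follow_edges = [("b", "a"), ("c", "a"), ("d", "a"),
                ("a", "b"), ("c", "b"), ("d", "c")]
times_retweeted = {"a": 9, "b": 7, "c": 2, "d": 1}

centrality = in_degree_centrality(follow_edges, users)
r = pearson([centrality[u] for u in users],
            [times_retweeted[u] for u in users])
```

Pearson correlation only captures linear association, so a rank-based coefficient may be worth comparing against when the centrality distribution is heavily skewed.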

Dataset

The dataset has been crawled from Twitter using Twitter's APIs. Starting from a set of selected influential users, e.g., politicians (barrackobama, mittromney, etc.) and political bloggers (dailykos, ameriablog, etc.), we expanded the set by including their followers and followees. Tweets, including retweets, from all of these users are then crawled on a daily basis. Roughly, we have 400K users and 7M tweets per month, about one-fifth of which are retweets.
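
The seed expansion described above can be sketched as a one-hop expansion over follower and followee lists. This sketch assumes those lists have already been fetched into plain dictionaries; it makes no Twitter API calls, and the function name, user names, and data layout are hypothetical.

```python
def expand_seed_set(seeds, followers, followees):
    """Return the seeds plus everyone who follows, or is followed by, a seed.

    `followers[u]` is the set of users following u; `followees[u]` is the set
    of users that u follows. Both maps are assumed to be pre-fetched.
    """
    expanded = set(seeds)
    for user in seeds:
        expanded.update(followers.get(user, ()))
        expanded.update(followees.get(user, ()))
    return expanded


# Hypothetical toy data standing in for crawled follower/followee lists.
seeds = {"seed_a", "seed_b"}
followers = {"seed_a": {"u1", "u2"}, "seed_b": {"u2"}}
followees = {"seed_a": {"seed_b"}, "seed_b": {"u3"}}

expanded = expand_seed_set(seeds, followers, followees)
```

The expansion stops after one hop, matching the description of adding only the seeds' own followers and followees.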