Project Anuj Dani
Taming Tweets Overload: Personalized Ranking of Tweets (Project Proposal)
Contents
Team Members
Anuj Goyal [anuj@cs.cmu.edu]
Dani Yogatama [dyogatama@cs.cmu.edu]
Sriram Somanchi (somanchi@cmu.edu)
Project Idea
Twitter is becoming a popular social media service which can be used to share and retrieve information about latest events. Since there is no restriction on the number of users one can follow, users tend to follow more and more people. As a result, they receive overwhelming number of tweets, to the extent that it is sometimes troublesome to go through every new tweet in their pages. In this project, we propose a solution to automatically rank tweets which are posted in a fixed period of time. Our personalized relevancy algorithm will be based on the following factors:
1. User's interests, which can be estimated from previously posted or re-tweeted content
2. Neighbors' preference, i.e., like-dislike of other users (neighbors in the friend network) whose interests match the interests of the user under consideration
3. Similarity between new tweet and older tweets, computed using topic similarity and weighted by the number of users who like those tweets.
Related Work
Our work borrow ideas from three different fields: Social Media, Topic Modeling and Collaborative Recommendation. This section gives brief overview of these three fields.
To the best of our knowledge, there has not been any published work in personalized ranking of tweets. There are, however, several works which addressed a closely related problem of recommending new friends in social media. [5] proposed a method to recommend friends using user-specific features such as popularity(# followees / # followers) and activity (# of tweets posted on Twitter), while [15] did this by means of clustering email-exchange network graph.
Other related works include predicting the strength of social ties and exploiting social relation for recommendation. [7] used demographic features (age, religion, etc.) and interaction-based features (# comments left on friend’s photo, etc.) to predict the strength of connection between a user and his (her) Facebook friends. [16] tried to predict social link based on semantic similarity measures of tags (# common tags) used by users to annotate images. Other works such as [20] try to exploit social relation for recommendation. This work calculates user similarity based on the friends network instead of based on their ratings to different items. [10] tries to recommend music tracks by exploiting social relation between users, tags given to tracks by users and tags received by tracks from users. In this work authors create a graph of nodes (user, tag and tracks) and apply random walk on it.
In topic modeling to model document network data, [19] proposed relation topic models for document networks, [19] proposed a multi-relational topic model for social recommendation, and [14] characterized microblogs by using Labeled LDA.
In collaborative recommendation, in general, there are three approaches: collaborative filtering, content-based recommendation, and hybrid recommendation. In collaborative filtering [12], the distance between two users is calculated based on their ratings to a set of items, and the missing rating is predicted by aggregating the ratings of the k nearest neighbors of the user. In content based recommendation [12], items are selected based on correlation between its content (keywords, artists, genre etc) and content of the items user prefered in the past. Hybrid recommendation [13] is the combination of the above two approaches.
Note that our approach is different from most recommendation systems which use meta-data (keywords, tags, etc.) or attributes of items since it also uses text data (tweets).
Research Questions
There are several research questions that need to be answered in this project :
1. How to effectively model users relation and user content together. We could model content of each user independently or use relational topic models to jointly model relation and content together.
2. What will happen when new user join Twitter. We will not have any data to calculate topical similarity of this user with other users. A simple solution is to initially calculate similarity between two users based on who they follow (their following patterns). We might even not need to handle this new user issue since a new user will initially have few followees and thus does not need tweet ranking. As the time goes on, we will acquire more data from him, so we can recommend accordingly.
3. How to calculate content-based similarity between two tweets.
4. We can not keep all tweets in the subgraph, so we have to remove tweets which has been used in calculating topic distribution of a user from the subgraph. How to balance these two kinds of tweets.
5. How to utilize personal tweets to weight edges among sender-receiver.
6. As new tweets get re-tweeted, their ranking will get higher in user's (who haven't seen this tweet yet) feed based on the number of re-tweets and user's relation to others who re-tweeted it. We may also need to consider time duration. For example, a tweet which received thirty re-tweets in an hour should be ranked higher than a tweet with fifty re-tweets in two days. how to represent this in our model.
7. How to make a model fairly flexible, such that it also recommends tweets from a new topic / domain which has not previously been seen.
Approach Overview
Our method is a random walk algorithm on a graph with two types of nodes (users and tweets) and three types of (weighted) edges (user-user, user-tweet, and tweet-tweet). Specificially, the method generates a subgraph for each user and performs random-walk to rank different tweets. Suppose that user U follows n other users (followees) (e_1, e_2 ... e_n) and these followees are followed by other followers (l_11, l_12,..., l_21, l_22...l_n1, l_n2 ....). The subgraph is constructed as follows :
1. Create nodes for U, U's followees and all followers of U's followees in the subgraph of user U.
2. Create nodes for tweets of all the followees.
3. Create edges between all users in the subgraph. The weights will depend on topical similarity of the users (i.e., topical similarity between their tweets). Note that at first, there will only be edges between followee and their tweets. Later, when a follower retweets a tweet, an edge is drawn from that follower to the tweet.
4. Create edges between different tweets of a followee whose weights are determined based on the content similarity of tweets.
The proposed algorithm uses content-based recommendation for a new tweet and add more information similar to collaborative-filtering as time goes by.
Consider the following example. Suppose that a followee e_1 posts a new tweet (t_new). At first, U gets its ranked based on the U-e_1 edge weight and weights of t_old-t_new edges, where t_old are tweets re-tweeted by U in the past. When other followers of e_1 (e.g., l_11, l_12) re-tweet t_new, its ranking for user U changes based on the U-l_11 and U-l_12 edge weights.
Evaluation
We plan to harvest tweets and network data of fairly active users who tweet and re-tweet regularly. We will partition these tweets into two sets : training and testing. To evaluate the method, we will apply our method to rank tweets in the test set, and compute its accuracy with respect to re-tweeted tweets. In other words, we will compare the number of re-tweeted tweets in the test data that are ranked higher to the number of non re-tweeted tweets.
References
[1] K. R. Canini, B. Suh, and P. Pirolli. Finding Relevant Sources in Twitter Based on Content and Social Structure.
[2] Jonathan Chang and David Blei. Relational topic models for document networks. In AIStats, 2009.
[3] Ilham Esslimani, Armelle Brun, and Anne Boyer. From social networks to behavioral networks in recommender systems. Social Network Analysis and Mining, International Conference on Advances in, 0:143–148, 2009.
[4] Franois Fouss, Alain Pirotte, Jean michel Renders, and Marco Saerens. Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19:2007, 2006.
[5] Ruth Garcia and Xavier Amatriain. Weighted content based methods for recommending connections in online social networks. In Proceedings of the 2nd ACM RecSys10 Workshop on Recommender Systems and the Social Web, RecSys’10, 2010.
[6] Daniel Gayo-Avello. Nepotistic relationships in twitter and their impact on rank prestige algorithms. CoRR, abs/1004.0816, 2010.
[7] Eric Gilbert and Karrie Karahalios. Predicting tie strength with social media. In In Proceedings of the Conferece on Human Factors in Computing Systems (CHI09, page 220, 2009.
[8] Zan Huang, Wingyan Chung, Thian-Huat Ong, and Hsinchun Chen. A graph-based recommender system for digital library. In Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, JCDL ’02, pages 65–73, New York, NY, USA, 2002. ACM.
[9] Indika Kah and Jennifer Neville. Using transactional information to predict link strength in online social networks.
[10] Ioannis Konstas, Vassilios Stathopoulos, and Joemon M. Jose. On social networks and collaborative recommendation. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’09, pages 195–202, New York, NY, USA, 2009. ACM.
[11] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 591–600, New York, NY, USA, 2010. ACM.
[12] Fabin Lousame and Eduardo Snchez. A taxonomy of collaborative-based recommender systems. In Giovanna Castellano, Lakhmi Jain, and Anna Fanelli, editors, Web Personalization in Intelligent Environments, volume 229 of Studies in Computational Intelligence, pages 81–117. Springer Berlin / Heidelberg, 2009.
[13] Nguyen Phuong, Le Thang, and Tu Phuong. A graph-based method for combining collaborative and content-based filtering. In Tu-Bao Ho and Zhi-Hua Zhou, editors, PRICAI 2008: Trends in Artificial Intelligence, volume 5351 of Lecture Notes in Computer Science, pages 859–869. Springer Berlin / Heidelberg, 2008.
[14] Daniel Ramage, Susan Dumais, and Dan Liebling. Characterizing Microblogs with Topic Models. In ICWSM, 2010.
[15] Maayan Roth, Assaf Ben-David, David Deutscher, Guy Flysher, Ilan Horn, Ari Leichtberg, Naty Leiser, Yossi Matias, and Ron Merom. Suggesting friends using the implicit social graph. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’10, pages 233–242, New York, NY, USA, 2010. ACM.
[16] Rossano Schifanella, Alain Barrat, Ciro Cattuto, Benjamin Markines, and Filippo Menczer. Folks in folksonomies: Social link prediction from shared metadata. CoRR, abs/1003.2281, 2010.
[17] Jianshu Weng, Ee-Peng Lim, Jing Jiang, and Qi He. Twitterrank: finding topic-sensitive influential twitterers. In Proceedings of the third ACM international conference on Web search and data mining, WSDM ’10, pages 261–270, New York, NY, USA, 2010. ACM.
[18] Rongjing Xiang, Jennifer Neville, and Monica Rogati. Modeling relationship strength in online social networks. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 981–990, New York, NY, USA, 2010. ACM.
[19] Lei Zhang, Jun Wu, Zhong-Cun Wang, and Chong-Jun Wang. Multi-relational topic model for social recommendation. In Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on, volume 2, pages 349 –350, oct 2010.
[20] Rong Zheng, Foster Provost, and Anindya Ghose. Social network collaborative filtering.