Project Anuj Dani Somanchi

From Cohen Courses
Revision as of 22:45, 14 February 2011 by Ssomanch (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search

Personalized Ranking of Tweets (Project Proposal)

Team Members

Anuj Goyal [anuj@cs.cmu.edu]

Dani Yogatama [dyogatama@cs.cmu.edu]

Sriram Somanchi [somanchi@cmu.edu]

Project Idea

Twitter is becoming a popular social media service which can be used to share and retrieve information about latest events. Since there is no restriction on the number of users one can follow, users tend to follow more and more people over time. As a result, they receive overwhelming number of tweets, to the extent that it is often troublesome to go through every new tweet in their pages. In this project, we propose a solution to automatically rank tweets which are posted in a fixed period of time. Our personalized relevancy algorithm will be based on the following factors:

1. User's interests, which can be estimated from previously posted or re-tweeted content

2. Neighbors' preference, i.e., like-dislike of other users (neighbors in the friend network) whose interests match the interests of the user under consideration

3. Similarity between new tweet and older tweets, computed using tf-idf or topic similarity and weighted by the number of users who like those tweets.

4. Learning the graph structure between the users based on similarity between re-tweets and then run the personalized relevancy algorithm on the learnt graph.

Research Questions

There are several research questions that need to be answered in this project :

1. How to effectively model users relation and user content together. We could model content of each user independently or use relational topic models to jointly model relation and content together.

2. What will happen when new user join Twitter. We will not have any data to calculate similarity of this user with other users. A simple solution is to initially calculate similarity between two users based on network information (their following patterns). However, we might not need to deal with this issue since a new user will initially have few followees and thus does not need tweet ranking. As the time goes on, we will acquire more data from him, so we can recommend accordingly.

3. How to calculate content-based similarity between two tweets.

4. We can not keep all tweets in the subgraph, so we have to remove tweets which has been used in calculating topic distribution of a user from the subgraph. How to balance these two kinds of tweets.

5. How to utilize personal tweets to weight edges among sender-receiver.

6. As new tweets get re-tweeted, their ranking will get higher in user's (who haven't seen this tweet yet) feed based on the number of re-tweets and user's relation to others who re-tweeted it. We may also need to consider time duration. For example, a tweet which received thirty re-tweets in an hour should be ranked higher than a tweet with fifty re-tweets in two days. how to represent this in our model.

7. How to make a model fairly flexible, such that it also recommends tweets from a new topic / domain which has not previously been seen.

8. What kind of penalty can we introduce that the user is not flooded only one of his topics of interest but has variety in the personalized tweets?

9. How can utilize the re-tweeting patterns to connect users to the users who you may not follow, but have similar behavior as yours? That is, can we learn the graph structure over which most of the tweets get re-tweeted?

10. Can we predict the interesting tweets that a user would like to link to?

Approach Overview

Using the follower/followee graph

Our method is a random walk algorithm on a graph with two types of nodes (users and tweets) and three types of (weighted) edges (user-user, user-tweet, and tweet-tweet). Specificially, the method generates a subgraph for each user and performs random-walk to rank different tweets. Suppose that user U follows n other users (followees) (e_1, e_2 ... e_n) and these followees are followed by other followers (l_11, l_12,..., l_21, l_22...l_n1, l_n2 ....). The subgraph is constructed as follows :

1. Create nodes for U, U's followees and all followers of U's followees in the subgraph of user U.

2. Create nodes for tweets of all the followees.

3. Create edges between all users in the subgraph. The weights will depend on similarity of the users (i.e., similarity between their tweets). Note that at first, there will only be edges between followee and their tweets. Later, when a follower retweets a tweet, an edge is drawn from that follower to the tweet. The similarity can be either cosine similarity or topic similarity.

4. Create edges between different tweets of a followee whose weights are determined based on the content similarity of tweets.

The proposed algorithm uses content-based recommendation for a new tweet and add more information similar to collaborative-filtering as time goes by.

Consider the following example. Suppose that a followee e_1 posts a new tweet (t_new). At first, U gets its ranked based on the U-e_1 edge weight and weights of t_old-t_new edges, where t_old are tweets re-tweeted by U in the past. When other followers of e_1 (e.g., l_11, l_12) re-tweet t_new, its ranking for user U changes based on the U-l_11 and U-l_12 edge weights.

Learning the graph structure

One of the things we want to do is not to actually start with the follower/followee graph but to construct the graph based on the tweeting patterns. The basic idea is to formulate a graph G(V, E) where an edge is formed between a pair of users who tweet more than a threshold number of ‘similar’ tweets on a given topic, irrespective of their followee. Also, we want to explore the latent social network inference methods [23] based on network diffusion, which is diffusion of similar tweets on the network. We want to rank the tweets by running the same procedure on the learnt graph and compare the ranking with baseline model.

Also in order to recommend the tweets that user might be interested in we want to use the learnt graph from the previous step and then use algorithms for link prediction to the nodes which are tweets. Given a graph G(V, E), where V is set of nodes which can be users or tweets, we want to predict the set of candidate tweets () to which a user () would be linked to based on his past history of retweeting. Candidate tweets are decided based on the tweets of all the followees of the user. Recently link recommendation was proposed [24] which is based on supervised random walks to recommend nodes like friends in Facebook. We want to see if this method can be extended to recommend tweets for the users.

Related Work

Our work borrow ideas from three different fields: Social Media, Topic Modeling and Collaborative Recommendation. This section gives brief overview of these three fields.

To the best of our knowledge, there has not been any published work in personalized ranking of tweets. There are, however, several works which addressed a closely related problem of recommending new friends in social media. [6] proposed a method to recommend friends using user-specific features such as popularity(# followees / # followers) and activity (# of tweets posted on Twitter), while [17] did this by means of clustering email-exchange network graph.

Other related works include predicting the strength of social ties and exploiting social relation for recommendation. [8] used demographic features (age, religion, etc.) and interaction-based features (# comments left on friend’s photo, etc.) to predict the strength of connection between a user and his (her) Facebook friends. [18] tried to predict social link based on semantic similarity measures of tags (# common tags) used by users to annotate images. Other works such as [22] try to exploit social relation for recommendation. This work calculates user similarity based on the friends network instead of based on their ratings to different items. [11] tries to recommend music tracks by exploiting social relation between users, tags given to tracks by users and tags received by tracks from users. In this work authors create a graph of nodes (user, tag and tracks) and apply random walk on it.

In topic modeling to model document network data, [21] proposed relation topic models for document networks, [20] proposed a multi-relational topic model for social recommendation, and [16] characterized microblogs by using Labeled LDA.

In collaborative recommendation, in general, there are three approaches: collaborative filtering, content-based recommendation, and hybrid recommendation. In collaborative filtering [12], the distance between two users is calculated based on their ratings to a set of items, and the missing rating is predicted by aggregating the ratings of the k nearest neighbors of the user. In content based recommendation [12], items are selected based on correlation between its content (keywords, artists, genre etc) and content of the items user prefered in the past. Hybrid recommendation [14] is the combination of the above two approaches. Our random-walk collaborative filtering idea is broadly similar to [1]. An efficient method to perform random-walk between documents based on tf-idf similarity is presented in [13].

Our approach is different from most recommendation systems which use meta-data (keywords, tags, etc.) or attributes of items since it also uses text data (tweets).


Evaluation

We plan to harvest tweets and network data of fairly active users who tweet and re-tweet regularly. We will partition these tweets into two sets : training and testing. To evaluate the method, we will apply our method to rank tweets in the test set, and compute its accuracy with respect to re-tweeted tweets. In other words, we will compare the number of re-tweeted tweets in the test data that are ranked higher to the number of non re-tweeted tweets. The baseline method we will compare our method to is a simple td-idf weighting between new tweets and user profile. Note that this baseline does not incorporate social network information at all.

References

[1] Matthew Brand: A Random Walks Perspective on Maximizing Satisfaction and Profit. SDM 2005

[2] K. R. Canini, B. Suh, and P. Pirolli. Finding Relevant Sources in Twitter Based on Content and Social Structure.

[3] Jonathan Chang and David Blei. Relational topic models for document networks. In AIStats, 2009.

[4] Ilham Esslimani, Armelle Brun, and Anne Boyer. From social networks to behavioral networks in recommender systems. Social Network Analysis and Mining, International Conference on Advances in, 0:143–148, 2009.

[5] Franois Fouss, Alain Pirotte, Jean michel Renders, and Marco Saerens. Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19:2007, 2006.

[6] Ruth Garcia and Xavier Amatriain. Weighted content based methods for recommending connections in online social networks. In Proceedings of the 2nd ACM RecSys10 Workshop on Recommender Systems and the Social Web, RecSys’10, 2010.

[7] Daniel Gayo-Avello. Nepotistic relationships in twitter and their impact on rank prestige algorithms. CoRR, abs/1004.0816, 2010.

[8] Eric Gilbert and Karrie Karahalios. Predicting tie strength with social media. In In Proceedings of the Conferece on Human Factors in Computing Systems (CHI09, page 220, 2009.

[9] Zan Huang, Wingyan Chung, Thian-Huat Ong, and Hsinchun Chen. A graph-based recommender system for digital library. In Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, JCDL ’02, pages 65–73, New York, NY, USA, 2002. ACM.

[10] Indika Kah and Jennifer Neville. Using transactional information to predict link strength in online social networks.

[11] Ioannis Konstas, Vassilios Stathopoulos, and Joemon M. Jose. On social networks and collaborative recommendation. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’09, pages 195–202, New York, NY, USA, 2009. ACM.

[12] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 591–600, New York, NY, USA, 2010. ACM.

[13] Frank Lin and William W. Cohen. A Very Fast Method for Clustering Big Text Dataset. ECAI 10.

[14] Fabin Lousame and Eduardo Snchez. A taxonomy of collaborative-based recommender systems. In Giovanna Castellano, Lakhmi Jain, and Anna Fanelli, editors, Web Personalization in Intelligent Environments, volume 229 of Studies in Computational Intelligence, pages 81–117. Springer Berlin / Heidelberg, 2009.

[15] Nguyen Phuong, Le Thang, and Tu Phuong. A graph-based method for combining collaborative and content-based filtering. In Tu-Bao Ho and Zhi-Hua Zhou, editors, PRICAI 2008: Trends in Artificial Intelligence, volume 5351 of Lecture Notes in Computer Science, pages 859–869. Springer Berlin / Heidelberg, 2008.

[16] Daniel Ramage, Susan Dumais, and Dan Liebling. Characterizing Microblogs with Topic Models. In ICWSM, 2010.

[17] Maayan Roth, Assaf Ben-David, David Deutscher, Guy Flysher, Ilan Horn, Ari Leichtberg, Naty Leiser, Yossi Matias, and Ron Merom. Suggesting friends using the implicit social graph. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’10, pages 233–242, New York, NY, USA, 2010. ACM.

[18] Rossano Schifanella, Alain Barrat, Ciro Cattuto, Benjamin Markines, and Filippo Menczer. Folks in folksonomies: Social link prediction from shared metadata. CoRR, abs/1003.2281, 2010.

[19] Jianshu Weng, Ee-Peng Lim, Jing Jiang, and Qi He. Twitterrank: finding topic-sensitive influential twitterers. In Proceedings of the third ACM international conference on Web search and data mining, WSDM ’10, pages 261–270, New York, NY, USA, 2010. ACM.

[20] Rongjing Xiang, Jennifer Neville, and Monica Rogati. Modeling relationship strength in online social networks. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 981–990, New York, NY, USA, 2010. ACM.

[21] Lei Zhang, Jun Wu, Zhong-Cun Wang, and Chong-Jun Wang. Multi-relational topic model for social recommendation. In Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on, volume 2, pages 349 –350, oct 2010.

[22] Rong Zheng, Foster Provost, and Anindya Ghose. Social network collaborative filtering.

[23] Myers & Leskovec "On the Convexity of Latent Social Network Inference" In Neural Information Processing Systems (NIPS), 2010

[24] L. Backstrom, J. Leskovec, “Supervised Random Walks: Predicting and Recommending Links in Social Networks” WSDM, 2011