Project Second Draft-Subhodeep Manaj

From Cohen Courses
Revision as of 18:01, 15 February 2011 by Subhodee

Project Proposal

Predicting the proportion of users who like a YouTube video from the comments posted on it

Team Members

Subhodeep Moitra (smoitra@cs.cmu.edu)
Manaj Srivastava (manajs@cs.cmu.edu)

Goal of the Project

We aim to model and estimate the bias groups among the users who comment on blogs. On any blog, commenters either agree or disagree with the opinions of the author or of other commenters, and these agreements and disagreements may concern different sub-topics discussed within a single blog. We aim to estimate which users agree or disagree on which sub-topics of a given blog.

We have gone through a few papers that tackle different aspects of this problem separately. Hu et al. [1] performed extraction-based summarization of blog posts, selecting sentences based on the content of the comments. Such an approach is useful to us because it would let us relate the discussions in the comments to the sub-topics of the blog post. Mishne and Glance [2] aim to detect disputes in the comments on weblogs, which again relates to what we attempt to do. Schuth et al. [3] aim to find the comments that belong to a single thread of discussion. This is particularly useful when users cannot reply to other users' comments explicitly. The techniques used in that paper could help us find the likely discussion threads among all the comments on a given blog.


Data Set

We will collect data from YouTube through its API, extracting comments and other metadata, such as the number of likes, related video titles and the number of views, for videos in a predefined genre such as "music videos".
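The collection step could look roughly like the Python sketch below. It assumes the YouTube Data API v3 `commentThreads` endpoint, which is a later API than the one available when this proposal was written, and it requires an API key. The function names, the `max_pages` cap and the injectable `fetch` callable are our own illustrative choices. Only top-level comments are gathered here; replies and video statistics (likes, views) would need further requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: YouTube Data API v3 endpoint for top-level comment threads.
COMMENT_THREADS_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_comment_request(video_id, api_key, page_token=None, max_results=100):
    """Return the URL for one page (at most 100 threads) of comments on a video."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": max_results,
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return COMMENT_THREADS_URL + "?" + urlencode(params)


def parse_comment_page(payload):
    """Extract (author, text) pairs and the next page token from one JSON response."""
    data = json.loads(payload)
    comments = []
    for item in data.get("items", []):
        snippet = item["snippet"]["topLevelComment"]["snippet"]
        comments.append((snippet.get("authorDisplayName", ""),
                         snippet.get("textDisplay", "")))
    return comments, data.get("nextPageToken")


def fetch_all_comments(video_id, api_key, fetch=None, max_pages=10):
    """Follow nextPageToken links until comments run out or max_pages is reached."""
    # By default perform a real HTTP GET; a fake fetch function can be passed for testing.
    if fetch is None:
        fetch = lambda url: urlopen(url).read().decode("utf-8")
    comments, token = [], None
    for _ in range(max_pages):
        url = build_comment_request(video_id, api_key, token)
        page, token = parse_comment_page(fetch(url))
        comments.extend(page)
        if not token:
            break
    return comments
```

Injecting `fetch` keeps the paging logic separate from the network call, so parsing can be checked against saved responses without spending API quota.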

Evaluation Metric

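Our evaluation will be based on the number of likes and dislikes of each video: the observed proportion of likes, likes / (likes + dislikes), serves as the ground truth against which our predictions are compared.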
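A minimal sketch of the ground-truth computation and one possible scoring rule follows. The proposal does not fix an error measure, so mean absolute error is our assumption here, as is skipping videos that have no votes (their like proportion is undefined). The function names are hypothetical.

```python
def like_proportion(likes, dislikes):
    """Fraction of voters who liked the video; None when nobody voted."""
    total = likes + dislikes
    if total == 0:
        return None
    return likes / total


def mean_absolute_error(predictions, vote_counts):
    """Average |predicted - observed| like proportion over videos that have votes.

    predictions: dict video_id -> predicted proportion in [0, 1]
    vote_counts: dict video_id -> (likes, dislikes)
    """
    errors = []
    for video_id, predicted in predictions.items():
        likes, dislikes = vote_counts[video_id]
        observed = like_proportion(likes, dislikes)
        if observed is None:
            continue  # no votes: the target is undefined, so skip this video
        errors.append(abs(predicted - observed))
    if not errors:
        raise ValueError("no videos with votes to evaluate")
    return sum(errors) / len(errors)
```

Because the target is itself a proportion, the resulting error is directly readable as "off by this fraction of voters" on average.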

Filtering junk comments

An important part of our approach will be preprocessing the comments to filter out those that are not relevant to the topic. Many users also post spam, such as links to their own websites, so we plan to incorporate a model that classifies comments as spam and rejects them.
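The proposal does not name a particular spam model. As one hedged illustration, the sketch below uses a multinomial Naive Bayes classifier with Laplace smoothing, written in plain Python, together with a hypothetical `__has_url__` feature that reflects the link-posting spam described above. The tokenizer, that feature and the class name `NaiveBayesSpamFilter` are our own assumptions, and training would require a set of hand-labeled comments.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")
URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)
URL_TOKEN = "__has_url__"  # hypothetical feature marking comments that contain a link


def tokenize(text):
    """Lowercase word tokens, plus URL_TOKEN if the comment contains a link."""
    tokens = TOKEN_RE.findall(text.lower())
    if URL_RE.search(text):
        tokens.append(URL_TOKEN)
    return tokens


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over comment tokens with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {True: Counter(), False: Counter()}  # label -> token counts
        self.doc_counts = {True: 0, False: 0}                   # label -> number of comments
        self.vocab = set()

    def fit(self, comments, labels):
        """Count tokens per class; labels are True for spam, False for legitimate."""
        for text, is_spam in zip(comments, labels):
            tokens = tokenize(text)
            self.word_counts[is_spam].update(tokens)
            self.doc_counts[is_spam] += 1
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        """log P(label) plus the sum of log P(token | label) over tokens seen in training."""
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + self.alpha) / (total_docs + 2 * self.alpha))
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def is_spam(self, text):
        """Return True when the spam class has the higher posterior score."""
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)
```

Naive Bayes is a common baseline for this kind of filtering because it trains from simple counts in a single pass; a stronger classifier could replace it without changing how the filter plugs into preprocessing.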



References

[1] Hu M., Sun A., Lim E., “Comments-Oriented Blog Summarization by Sentence Extraction”, 16th ACM Conference on Information and Knowledge Management, 2007

[2] Mishne G., Glance N., “Leave a Reply: An Analysis of Weblog Comments”, Third Annual Workshop on the Weblogging Ecosystem, 2006

[3] Schuth A., Marx M., de Rijke M., “Extracting the discussion structure in comments on news-articles”, Proceedings of the 9th Annual ACM Workshop on Web Information and Data Management, 2007