What VS What? Detect Controversial Topics in Online Community

From Cohen Courses
Revision as of 01:03, 9 October 2012 by Yuchent

Team members

Teammate wanted! Feel free to contact me!

Motivation

In online communities, some topics are more controversial than others and attract far more user attention and enthusiasm. For example, in geek news communities such as Slashdot, news articles on the Apple VS Android topic usually draw a much higher volume of comments. The same happens with other controversial pairings such as Windows VS Linux or Open Source VS Commercial Software.

The goal of this project is to automatically discover the topics in an online community that, when put together, raise the level of controversy.

Project idea

We are given a series of documents d, together with the number of comments associated with each document.

By running a topic model such as LDA on the document space D, we obtain k topics.

In LDA, each document has a representation in the topic space: a distribution over the k topics.

From this representation we can compute the number of comments that a particular topic generates.
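One way to write this quantity down, under assumed notation that is not taken from the original page: let c_i be the number of comments on document d_i, and let θ_ij be the LDA weight of topic t_j in d_i. A natural candidate is then to let every document contribute its comments to each topic in proportion to that topic's weight:

```latex
C(t_j) = \sum_{i=1}^{|D|} \theta_{ij}\, c_i ,
\qquad \text{where } \sum_{j=1}^{k} \theta_{ij} = 1 \text{ for every document } d_i .
```

Because each document's topic weights sum to one, summing C(t_j) over all k topics gives back the total number of comments, so under this assumption the comments are split across topics rather than counted twice.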

Using sentiment analysis techniques, we hope to detect the sentiment expressed toward a topic in a given document. Specifically, for a given topic, we want to find the documents that hold a positive sentiment toward it. We can then calculate the number of comments the topic generates when the sentiment in such documents is positive.
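As a rough sketch of the two quantities described so far, the snippet below computes a per-topic comment count and its restriction to positive-sentiment documents. Everything named here is an assumption for illustration: `theta` stands in for the document-topic weights an LDA run would produce, `positive_docs` stands in for the output of a sentiment step, and the weighting by `theta[i][j]` is one plausible choice rather than a confirmed definition. The toy numbers are made up.

```python
from typing import List, Set


def topic_comment_volume(theta: List[List[float]], comments: List[int]) -> List[float]:
    """Comments attributed to each topic: sum over documents of theta[i][j] * comments[i]."""
    k = len(theta[0])
    return [sum(row[j] * c for row, c in zip(theta, comments)) for j in range(k)]


def positive_topic_comment_volume(
    theta: List[List[float]],
    comments: List[int],
    positive_docs: List[Set[int]],
) -> List[float]:
    """Same sum, but for topic j only over the documents listed in positive_docs[j]."""
    k = len(theta[0])
    return [
        sum(theta[i][j] * comments[i] for i in positive_docs[j])
        for j in range(k)
    ]


# Toy data (hypothetical): 3 documents, 2 topics.
theta = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]  # assumed LDA document-topic weights
comments = [100, 40, 10]                       # comment count per document
positive_docs = [{0}, {1, 2}]                  # assumed: docs positive toward topic 0 / topic 1

volume = topic_comment_volume(theta, comments)
positive_volume = positive_topic_comment_volume(theta, comments, positive_docs)
print(volume, positive_volume)
```

In practice `theta` would come from whichever LDA implementation is used, and `positive_docs` from whichever sentiment classifier is chosen; neither is fixed by this proposal.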

With these quantities, we can then define the degree of controversy between two topics.
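To make the idea of a pairwise score concrete, here is a purely illustrative candidate. It is a hypothetical stand-in, not the definition this proposal has in mind, and it ignores sentiment. For each topic pair it compares the average comment count of documents that mix both topics (weighted by the product of their topic weights) against the higher of the two single-topic averages. A value above 1 suggests that the pair draws more comments together than either topic does alone. The function names, the `theta` input, and the toy data are all assumptions.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def weighted_mean_comments(weights: List[float], comments: List[int]) -> float:
    """Weighted average of comment counts; returns 0.0 if all weights are zero."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    return sum(w * c for w, c in zip(weights, comments)) / total


def pairwise_lift(theta: List[List[float]], comments: List[int]) -> Dict[Tuple[int, int], float]:
    """Hypothetical score: joint average comments divided by the best single-topic average."""
    k = len(theta[0])
    solo = [weighted_mean_comments([row[j] for row in theta], comments) for j in range(k)]
    scores: Dict[Tuple[int, int], float] = {}
    for a, b in combinations(range(k), 2):
        # A document counts toward the pair only to the extent it contains both topics.
        joint = weighted_mean_comments([row[a] * row[b] for row in theta], comments)
        baseline = max(solo[a], solo[b])
        scores[(a, b)] = joint / baseline if baseline > 0 else 0.0
    return scores


# Toy data (hypothetical): document 0 mixes topics 0 and 1 and is heavily commented.
theta = [[0.5, 0.5, 0.0], [0.9, 0.0, 0.1], [0.0, 0.9, 0.1]]
comments = [200, 20, 30]

scores = pairwise_lift(theta, comments)
most_controversial = max(scores, key=scores.get)
print(most_controversial, scores[most_controversial])
```

A sentiment-aware variant could restrict the sums to documents that are positive toward one of the two topics, along the lines of the positive-sentiment count described above.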

Dataset

We plan to crawl data from online tech news communities such as Slashdot, The Verge, and Engadget. For each article, we collect its content and the comments associated with it.

There are also existing datasets we can use, such as the political blogs dataset, which contains blog content and comments and is described in one of the papers in the Reference section.

Reference