Difference between revisions of "What VS What? Detect Controversial Topics in Online Community"

Revision as of 00:39, 9 October 2012

Team members

User:Yuchen Tian

Teammates wanted! Feel free to contact me.

Motivation

In online communities, some topics are more controversial than others and attract much more user enthusiasm and attention. For example, in geek news communities such as Slashdot, news articles about Apple vs. Android usually receive a much higher volume of comments. The same happens with other controversial pairings such as Windows vs. Linux and open-source vs. commercial software.

The goal of this project is to automatically discover pairs of topics in an online community whose level of controversy grows when the topics are put together.

Project idea

Given a series of documents $d\in D$ and the number of comments associated with each document, denoted $N(d)$, the input is the set of pairs

$\{(d_{1},N(d_{1})),\ldots,(d_{n},N(d_{n}))\},\quad d_{i}\in D,\ n=|D|$

By running a topic model such as LDA on the document space $D$, we obtain $k$ topics, denoted $(t_{1},t_{2},\ldots,t_{k})$.

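In LDA, each document $d$ has a representation in the topic space, $w_{d}=(w_{1d},w_{2d},\ldots,w_{kd})$, where $w_{id}$ is the weight (proportion) of topic $t_{i}$ in $d$, so that $d$ is viewed as the mixture $w_{1d}t_{1}+w_{2d}t_{2}+\cdots+w_{kd}t_{k}$.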
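As a rough illustration of where the weights $w_{id}$ could come from, here is a minimal sketch of LDA via collapsed Gibbs sampling, using only the Python standard library and assuming the documents are already tokenized. The function name `lda_gibbs`, the hyperparameter defaults (`alpha`, `beta`, `iters`) and the toy documents are hypothetical; a run on real crawled data would more likely use an established LDA package. Topic indices are 0-based in the code.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of token lists. Returns w, where w[d][i] is the smoothed
    proportion of topic i in document d (each row sums to 1).
    """
    rng = random.Random(seed)
    v = len({tok for doc in docs for tok in doc})  # vocabulary size
    ndk = [[0] * k for _ in docs]                  # topic counts per document
    nkw = [defaultdict(int) for _ in range(k)]     # word counts per topic
    nk = [0] * k                                   # token counts per topic
    z = []                                         # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for tok in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][tok] += 1
            nk[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, tok in enumerate(doc):
                t = z[d][n]
                # Remove the token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][tok] -= 1
                nk[t] -= 1
                # Unnormalized full conditional p(z = j | all other assignments).
                probs = [
                    (ndk[d][j] + alpha) * (nkw[j][tok] + beta) / (nk[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=probs)[0]
                z[d][n] = t
                ndk[d][t] += 1
                nkw[t][tok] += 1
                nk[t] += 1

    # w_{id} = (n_{d,i} + alpha) / (|d| + k * alpha)
    return [
        [(ndk[d][i] + alpha) / (len(doc) + k * alpha) for i in range(k)]
        for d, doc in enumerate(docs)
    ]


# Hypothetical toy corpus.
docs = [
    ["apple", "iphone", "ios", "apple", "iphone"],
    ["android", "google", "phone", "android", "google"],
    ["apple", "ios", "android", "google", "phone"],
]
w = lda_gibbs(docs, k=2)
```

Each row `w[d]` plays the role of $w_{d}$; the smoothing by `alpha` keeps every weight strictly positive, which avoids dividing by zero in the weighted averages that follow.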

The number of comments that a particular topic $t_{i}$ can generate is then the average of $N(d)$ over all documents, weighted by the topic weights $w_{id}$:

$N(t_{i})={\frac {\sum _{d\in D}w_{id}*N(d)}{\sum _{d\in D}w_{id}}}$
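A direct translation of the formula for $N(t_{i})$, as a small sketch. The function name and the toy weights and comment counts are hypothetical, and topics are indexed from 0 rather than 1.

```python
def topic_comment_volume(weights, comments, i):
    """N(t_i): average comment count N(d), weighted by the topic weights w_{id}."""
    num = sum(w[i] * n for w, n in zip(weights, comments))
    den = sum(w[i] for w in weights)
    return num / den if den > 0 else 0.0


# Hypothetical toy data: three documents, two topics.
weights = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # weights[d][i] = w_{id}
comments = [120, 30, 60]                         # comments[d] = N(d)
volumes = [topic_comment_volume(weights, comments, i) for i in range(2)]
```

Here topic 0 gets a weighted average of about 90 comments (144 / 1.6), while topic 1 gets about 47 (66 / 1.4), because the heavily commented first document is mostly about topic 0. Normalizing by $\sum_{d}w_{id}$ keeps topics that are rare in the corpus from being penalized simply for appearing in fewer documents.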

Using sentiment analysis techniques, we hope to detect the sentiment a document expresses toward a topic. Specifically, given a topic $t_{i}$, we want to find the documents that hold a positive sentiment toward it, denoted $D_{t_{i}+}$. We can then calculate the number of comments a topic generates when the sentiment in such documents is positive:

$N(t_{i+})={\frac {\sum _{d\in D_{t_{i}+}}w_{id}*N(d)}{\sum _{d\in D_{t_{i}+}}w_{id}}}$
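A minimal sketch of $N(t_{i+})$, assuming some sentiment classifier has already produced a score in $[-1,1]$ for each (document, topic) pair. The `sentiment` matrix, the zero `threshold` and both function names are hypothetical choices for illustration; the proposal does not fix a particular sentiment technique. Topics are 0-based in the code.

```python
def positive_docs(sentiment, i, threshold=0.0):
    """D_{t_i+}: indices of documents whose sentiment score toward topic i is above threshold."""
    return [d for d, scores in enumerate(sentiment) if scores[i] > threshold]


def conditional_volume(weights, comments, i, docs):
    """Weighted average of N(d) for topic i, restricted to the document indices in docs."""
    num = sum(weights[d][i] * comments[d] for d in docs)
    den = sum(weights[d][i] for d in docs)
    return num / den if den > 0 else 0.0


# Hypothetical toy data: three documents, two topics.
weights = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]       # w_{id}
comments = [120, 30, 60]                              # N(d)
sentiment = [[0.7, -0.4], [-0.2, 0.6], [0.3, -0.1]]   # assumed classifier output

d_pos = positive_docs(sentiment, 0)                   # D_{t_0+}
n_pos = conditional_volume(weights, comments, 0, d_pos)
```

Only documents 0 and 2 are positive toward topic 0, so the lightly commented document 1 drops out and $N(t_{0+})$ comes to about 98.6 (138 / 1.4).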

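We can then define the degree of controversy between two topics as follows:

$con(t_{1},t_{2})={\frac {N(t_{1+},t_{2-})}{\sqrt {N(t_{1+})N(t_{2-})}}}$

where $N(t_{2-})$ is defined analogously to $N(t_{i+})$ over the documents with negative sentiment toward $t_{2}$, and $N(t_{1+},t_{2-})$ is the corresponding average over the documents that are positive toward $t_{1}$ and negative toward $t_{2}$.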
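The controversy score can be sketched end to end as follows. Several details here are assumptions rather than part of the proposal: a document counts as negative toward a topic when its (hypothetical, externally computed) sentiment score is below zero; $N(t_{j-})$ mirrors $N(t_{i+})$ over those documents; and the joint term averages over documents positive toward $t_{i}$ and negative toward $t_{j}$, weighting each document by $w_{id}w_{jd}$, which is one plausible choice among several. Topics are 0-based in the code.

```python
import math


def weighted_volume(comments, weight_of, docs):
    """Average of N(d) over the documents in docs, weighted by weight_of(d)."""
    num = sum(weight_of(d) * comments[d] for d in docs)
    den = sum(weight_of(d) for d in docs)
    return num / den if den > 0 else 0.0


def controversy(weights, comments, sentiment, i, j):
    """con(t_i, t_j) = N(t_i+, t_j-) / sqrt(N(t_i+) * N(t_j-))."""
    all_docs = range(len(comments))
    pos_i = [d for d in all_docs if sentiment[d][i] > 0]   # D_{t_i+}
    neg_j = [d for d in all_docs if sentiment[d][j] < 0]   # D_{t_j-}
    both = [d for d in pos_i if sentiment[d][j] < 0]       # positive to t_i, negative to t_j

    n_pos_i = weighted_volume(comments, lambda d: weights[d][i], pos_i)
    n_neg_j = weighted_volume(comments, lambda d: weights[d][j], neg_j)
    # ASSUMPTION: the joint term weights each document by w_{id} * w_{jd}.
    n_joint = weighted_volume(comments, lambda d: weights[d][i] * weights[d][j], both)

    denom = math.sqrt(n_pos_i * n_neg_j)
    return n_joint / denom if denom > 0 else 0.0


# Hypothetical toy data: three documents, two topics.
weights = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
comments = [120, 30, 60]
sentiment = [[0.7, -0.4], [-0.2, 0.6], [0.3, -0.1]]
score = controversy(weights, comments, sentiment, 0, 1)
```

With this toy data the score is roughly 0.91. If no document is negative toward $t_{j}$, the denominator is zero and the sketch returns 0.0 rather than raising an error.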

Data Set

We plan to crawl data from online tech news communities such as Slashdot, The Verge and Engadget. For each article, we collect its content and the comments associated with it.

Existing datasets can also be used, such as the political blogs dataset, which contains both blog content and comments and is described in the referenced paper [1].

Reference

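[1] Ramnath Balasubramanyan et al., ICWSM 2012. http://malt.ml.cmu.edu/mw/index.php/Ramnath_Balasubramanyan_et._al._ICWSM_2012#Detect_comments_sentiment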
