What VS What? Detect Controversial Topics in Online Community

Team members

User:Yuchen Tian

Teammate Wanted! Feel Free to contact me!

Motivation

In online communities, some topics are more controversial than others and attract far more attention and participation from users. For example, in geek news communities such as Slashdot, news articles on the Apple VS Android topic usually receive a much higher volume of comments. The same thing happens with other controversial topics such as Windows VS Linux or Open Source VS Commercial Software.

The goal of this project is to automatically discover those topics in an online community that, when put together, raise the level of controversy.

Project idea

Suppose we are given a series of documents $d$, each with an associated number of comments, denoted $N(d)$:

$\{(d_{1},N(d_{1})),\ldots,(d_{n},N(d_{n}))\}$, where each $d_{j}\in D$

By running a topic model such as LDA on the document space $D$, we can get $k$ topics, denoted $(t_{1},t_{2},\ldots,t_{k})$

Given a particular document $d$, LDA gives it a representation in the topic space, $w_{d}=(w_{1d}*t_{1},w_{2d}*t_{2},\ldots,w_{kd}*t_{k})$, where $w_{id}$ is the weight of topic $t_{i}$ in document $d$.

We can then estimate the number of comments that a particular topic generates as the topic-weighted average of the comment counts over all documents:

$N(t_{i})={\frac {\sum _{d\in D}w_{id}*N(d)}{\sum _{d\in D}w_{id}}}$
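The weighted average above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `topic_comment_volume`, the nested-list layout of the weights (one row per document, one column per topic), and the toy numbers are all hypothetical and not part of the proposal.

```python
from typing import List, Sequence


def topic_comment_volume(
    weights: Sequence[Sequence[float]],
    comments: Sequence[float],
) -> List[float]:
    """Return N(t_i) for every topic i.

    weights[d][i] is w_{id}, the weight of topic i in document d
    (assumed to come from a topic model such as LDA).
    comments[d] is N(d), the number of comments on document d.
    """
    if len(weights) != len(comments):
        raise ValueError("need one comment count per document")
    num_topics = len(weights[0]) if weights else 0
    volumes = []
    for i in range(num_topics):
        # Denominator: total weight of topic i over all documents.
        total_weight = sum(row[i] for row in weights)
        # Numerator: comment counts weighted by the topic's share of each document.
        weighted_comments = sum(row[i] * n for row, n in zip(weights, comments))
        # Guard against a topic that never appears (assumption: report 0.0).
        volumes.append(weighted_comments / total_weight if total_weight > 0 else 0.0)
    return volumes


# Toy data (made up): three documents, two topics.
weights = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
comments = [100, 10, 40]
volumes = topic_comment_volume(weights, comments)
print(volumes)
```

In this toy data the first topic dominates the heavily commented document, so it ends up with the larger estimated comment volume.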

Using sentiment analysis techniques, we hope to detect the sentiment that a document expresses towards a topic. Specifically, given a topic $t_{i}$, we want to find the documents that hold a positive sentiment towards this topic, defined as $D_{t_{i}+}$ (and, analogously, $D_{t_{i}-}$ for negative sentiment). We can then calculate the number of comments a topic generates when the sentiment in those documents is positive:

$N(t_{i+})={\frac {\sum _{d\in D_{t_{i}+}}w_{id}*N(d)}{\sum _{d\in D_{t_{i}+}}w_{id}}}$
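The same weighted average, restricted to documents with a given sentiment towards the topic, might look like the sketch below. It assumes that some sentiment classifier (not specified in this proposal) has already produced a label per document and topic, encoded here as `+1`, `-1` or `0`; that encoding, the function name `sentiment_conditioned_volume`, and the toy data are illustrative assumptions.

```python
from typing import Sequence


def sentiment_conditioned_volume(
    weights: Sequence[Sequence[float]],
    comments: Sequence[float],
    sentiment: Sequence[Sequence[int]],
    topic: int,
    polarity: int,
) -> float:
    """Return N(t_{i+}) (polarity=+1) or N(t_{i-}) (polarity=-1) for one topic.

    weights[d][i]   : w_{id}, weight of topic i in document d.
    comments[d]     : N(d), number of comments on document d.
    sentiment[d][i] : hypothetical label, +1 / -1 / 0, for the sentiment
                      of document d towards topic i.
    """
    numerator = 0.0
    denominator = 0.0
    for w_d, n_d, s_d in zip(weights, comments, sentiment):
        # Keep only documents in D_{t_i +} (or D_{t_i -}).
        if s_d[topic] == polarity:
            numerator += w_d[topic] * n_d
            denominator += w_d[topic]
    # Assumption: return 0.0 when no document has the requested sentiment.
    return numerator / denominator if denominator > 0 else 0.0


# Toy data (made up): three documents, two topics.
weights = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
comments = [100, 10, 40]
sentiment = [[+1, -1], [-1, +1], [+1, 0]]

positive_volume = sentiment_conditioned_volume(weights, comments, sentiment, topic=0, polarity=+1)
print(positive_volume)
```

Here only the first and third documents are positive towards topic 0, so only they contribute to the sums.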

Then we can define the degree of controversy between two topics as follows:

$con(t_{1},t_{2})={\frac {N(t_{1+},t_{2-})}{\sqrt {N(t_{1+})N(t_{2-})}}}$
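A rough sketch of how this score could be computed follows. The joint term $N(t_{1+},t_{2-})$ is not defined on this page, so the code makes an explicit assumption: it is taken as the weighted average comment count over documents that are positive towards $t_{1}$ and negative towards $t_{2}$, with each document weighted by the product $w_{1d}*w_{2d}$. Other definitions are possible. The function names, the `+1`/`-1`/`0` sentiment encoding and the toy data are likewise hypothetical.

```python
import math
from typing import Dict, Sequence


def conditioned_volume(
    weights: Sequence[Sequence[float]],
    comments: Sequence[float],
    sentiment: Sequence[Sequence[int]],
    conditions: Dict[int, int],
) -> float:
    """Weighted mean comment count over documents matching all conditions.

    conditions maps a topic index to the required sentiment label (+1 or -1).
    With one condition this is N(t_{i+}) or N(t_{i-}).
    With two conditions the document weight is the PRODUCT of the two topic
    weights; this joint definition is an assumption, not taken from the page.
    """
    numerator = 0.0
    denominator = 0.0
    for w_d, n_d, s_d in zip(weights, comments, sentiment):
        if all(s_d[t] == label for t, label in conditions.items()):
            w = math.prod(w_d[t] for t in conditions)
            numerator += w * n_d
            denominator += w
    return numerator / denominator if denominator > 0 else 0.0


def controversy(weights, comments, sentiment, t1: int, t2: int) -> float:
    """con(t1, t2) = N(t1+, t2-) / sqrt(N(t1+) * N(t2-))."""
    joint = conditioned_volume(weights, comments, sentiment, {t1: +1, t2: -1})
    pos = conditioned_volume(weights, comments, sentiment, {t1: +1})
    neg = conditioned_volume(weights, comments, sentiment, {t2: -1})
    scale = math.sqrt(pos * neg)
    # Assumption: score is 0.0 when either single-topic volume is empty.
    return joint / scale if scale > 0 else 0.0


# Toy data (made up): four documents, two topics.
weights = [[0.6, 0.4], [0.5, 0.5], [0.7, 0.3], [0.2, 0.8]]
comments = [200, 20, 30, 10]
sentiment = [[+1, -1], [+1, +1], [+1, 0], [0, -1]]

score = controversy(weights, comments, sentiment, t1=0, t2=1)
print(score)
```

Under this reading, a score above 1 means that documents pitting the two topics against each other draw more comments than the two topics' individual volumes would suggest. Note that the score is not symmetric: `controversy(..., t1=0, t2=1)` and `controversy(..., t1=1, t2=0)` look at different sets of documents.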

Dataset

We plan to crawl data from online tech news communities such as Slashdot, The Verge and Engadget. For each article, we collect its content and the comments associated with it.

There are also existing datasets we can use, such as the political blogs dataset, which contains blog content and comments and is described in one of the papers in the reference list.

Reference

Ramnath Balasubramanyan et al. ICWSM 2012. (This one has the political blog datasets.)
Roja Bandari et al. The Pulse of News in Social Media: Forecasting Popularity. ICWSM 2012.
Shmueli et al. Care to Comment? Recommendations for Commenting on News Stories. WWW 2012.
