Difference between revisions of "What VS What? Detect Controversial Topics in Online Community"

From Cohen Courses

Revision as of 23:09, 8 October 2012

Team members

Teammate wanted! Feel free to contact me!

Motivation

In online communities, some topics are more controversial than others and attract far more user attention and enthusiasm. For example, in geek news communities such as Slashdot, news articles about Apple VS Android usually draw a much higher volume of comments. The same happens with other controversial pairings such as Windows VS Linux or Open Source VS Commercial Software.

The goal of this project is to automatically discover those topics in an online community that, when put together, raise the level of controversy.

Project idea

Given a collection of documents <math>D</math>, we denote the number of comments associated with a document <math>d \in D</math> as <math>N(d)</math>.

By running a topic model such as LDA on the document space <math>D</math>, we obtain <math>k</math> topics, denoted as <math>t_{1}, t_{2}, \ldots, t_{k}</math>.

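Given a particular document <math>d</math>, LDA gives it a representation in the topic space as a vector of topic weights <math>(w_{1d}, w_{2d}, \ldots, w_{kd})</math>, where <math>w_{id}</math> is the weight of topic <math>t_{i}</math> in document <math>d</math>.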

Then we get the number of comments that a particular topic can generate:

<math>N(t_{i}) = \frac{\sum_{d \in D}w_{id}*N(d)}{\sum_{d \in D}w_{id}}</math>
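
As a rough illustration of this weighted average, here is a minimal Python sketch. It assumes the topic weights of each document are already available (for example from an LDA run) as a list of lists, and that the comment counts are a parallel list. The function name `comments_per_topic` and the toy numbers are hypothetical and only serve the example.

```python
def comments_per_topic(weights, comments):
    """Weighted average number of comments for each topic.

    weights[d][i] plays the role of w_id (weight of topic i in document d),
    comments[d] plays the role of N(d). Both layouts are assumptions made
    for this sketch.
    """
    num_topics = len(weights[0])
    result = []
    for i in range(num_topics):
        # Denominator: total weight of topic i over all documents.
        total_weight = sum(doc_weights[i] for doc_weights in weights)
        # Numerator: comment counts weighted by the topic's share of each document.
        weighted_comments = sum(
            doc_weights[i] * n for doc_weights, n in zip(weights, comments)
        )
        # A topic with zero total weight gets 0.0 to avoid division by zero.
        result.append(weighted_comments / total_weight if total_weight > 0 else 0.0)
    return result


# Hypothetical toy data: 3 documents, 2 topics.
weights = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
comments = [100, 40, 10]
topic_comments = comments_per_topic(weights, comments)
```

In this toy data the first topic dominates the heavily commented document, so it ends up with the larger weighted average.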

By using sentiment analysis techniques, we hope to detect the sentiment that a document expresses towards a topic. Specifically, given a topic <math>t_{i}</math>, we hope to find the documents that hold a positive sentiment towards this topic, denoted as <math>D_{t_{i}+}</math>. Thus we can calculate the number of comments a topic can generate when the sentiment in such documents is positive:

<math>N(t_{i})_{+} = \frac{\sum_{d \in D_{t_{i}+}}w_{id}*N(d)}{\sum_{d \in D_{t_{i}+}}w_{id}}</math>
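
The positive-sentiment variant can be sketched the same way, by restricting both sums to the documents judged positive towards the topic. The sketch below assumes some sentiment classifier has already produced the set of positive document indices; that set, the function name `positive_comments_for_topic`, and the toy numbers are all hypothetical.

```python
def positive_comments_for_topic(topic, weights, comments, positive_docs):
    """Weighted average comment count for one topic over positive documents only.

    weights[d][topic] plays the role of w_id, comments[d] the role of N(d),
    and positive_docs is the set of document indices assumed to be positive
    towards the topic (the output of a sentiment step not shown here).
    """
    numerator = sum(weights[d][topic] * comments[d] for d in positive_docs)
    denominator = sum(weights[d][topic] for d in positive_docs)
    # No positive document carries this topic: return 0.0 instead of dividing by zero.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical toy data: 3 documents, 2 topics.
weights = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
comments = [100, 40, 10]
# Assumed classifier output: documents 0 and 2 are positive towards topic 0.
positive_docs = {0, 2}
positive_value = positive_comments_for_topic(0, weights, comments, positive_docs)
```

Comparing this value with the unrestricted average for the same topic gives a first hint of whether positive coverage of the topic draws more or fewer comments.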