Difference between revisions of "Popescu and Pennacchiotti, CIKM 10"

From Cohen Courses
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This paper aims at detecting and classifying social events using Tree kernels.
 
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] The authors develop a general approach to change-point detection that generalizes across a wide range of applications.
 
== Study plan ==
 
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]
 
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]
 

Latest revision as of 23:06, 30 September 2012

This paper is relevant to our project on detecting controversial events in Twitter.

Detecting controversial events from Twitter

Citation

Ana-Maria Popescu and Marco Pennacchiotti. Detecting controversial events from Twitter. In Proceedings of the 19th ACM international conference on Information and knowledge management, CIKM ’10, pages 1873–1876, New York, NY, USA, 2010. ACM.

Online version

Detecting controversial events from Twitter

Summary

This paper addresses the task of identifying controversial events using Twitter as a starting point.

An event is defined in relation to a target entity: it is an activity with a clear, finite duration in which the target entity plays a key role. The event is considered controversial if it provokes a public discussion in which audience members express opposing opinions or disbelief. This is in contrast to events that draw little reaction or whose reactions are overwhelmingly positive or negative (i.e., low entropy in opinions).
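The entropy remark can be made concrete with a small sketch. This is only an illustration, not the paper's feature definition: the function name `opinion_entropy` and the label values are assumptions. Opinions that split evenly give high entropy, while a one-sided reaction gives low entropy.

```python
import math
from collections import Counter


def opinion_entropy(labels):
    """Shannon entropy (in bits) of a list of opinion labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical snapshots: one evenly split, one overwhelmingly positive.
split = ["pos"] * 50 + ["neg"] * 50
one_sided = ["pos"] * 95 + ["neg"] * 5

print(opinion_entropy(split))      # 1.0 (maximally mixed for two labels)
print(opinion_entropy(one_sided))  # roughly 0.29 (close to one-sided)
```

Under this reading, a controversial snapshot would sit near the top of the entropy range, and a snapshot with few or uniform reactions near the bottom.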

The authors seek to extract snapshots of controversial events by modeling the task as a supervised learning problem. Each snapshot is represented by a feature vector constructed from Twitter and other sources (such as news). They compare several models: a direct model using regression; a 2-step pipeline model, which first detects events and then measures their controversy; and a 2-step blended model, in which the output of the event detection step is used as an input to the controversy detection regression model.
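The difference between the three model structures can be sketched as follows. This is a minimal illustration that assumes plain linear scorers with made-up weights; the function names, the threshold, and the weights are hypothetical and are not taken from the paper, which trains its models on labeled data.

```python
def linear_score(weights, features):
    """Dot product of a weight vector and a feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def direct_model(features, controversy_weights):
    """Score controversy directly from the snapshot features."""
    return linear_score(controversy_weights, features)


def pipeline_model(features, event_weights, controversy_weights,
                   event_threshold=0.5):
    """Step 1 filters out non-events; step 2 scores the survivors."""
    if linear_score(event_weights, features) < event_threshold:
        return 0.0
    return linear_score(controversy_weights, features)


def blended_model(features, event_weights, blended_weights):
    """The event score becomes one extra input to the controversy scorer."""
    event_score = linear_score(event_weights, features)
    return linear_score(blended_weights, list(features) + [event_score])


# Hypothetical snapshot features and weights, for illustration only.
features = [1.0, 2.0]
event_weights = [0.5, 0.25]
controversy_weights = [1.0, 0.5]
blended_weights = [1.0, 0.5, 2.0]

print(direct_model(features, controversy_weights))                   # 2.0
print(pipeline_model(features, event_weights, controversy_weights))  # 2.0
print(blended_model(features, event_weights, blended_weights))       # 4.0
```

The pipeline makes a hard decision at the event step, so an error there cannot be recovered later; the blended variant keeps the event score as a soft signal that the controversy model is free to weigh.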

They make extensive use of lexicons such as

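  1. Controversy lexicon derived from Wikipedia's controversial topic list (http://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues)
  2. Bad words lexicon (http://urbanoalvarez.es/blog/2008/04/04/bad-words-list/)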

They use features that capture an event snapshot's linguistic properties, structural information (graph), the intensity of discussion about an entity, and the distribution of sentiment words in the event. They also align news articles with snapshot tweets and see how many news articles mention the target entity significantly.
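A rough sketch of how lexicon-based and hashtag features might be computed for one snapshot is given here. The feature names, the tokenizer, and the tiny lexicons are assumptions made for illustration; the paper's actual feature set is larger and is not reproduced.

```python
import re

TOKEN_RE = re.compile(r"#?\w+")


def tokenize(text):
    """Lowercase a tweet and split it into word and hashtag tokens."""
    return TOKEN_RE.findall(text.lower())


def snapshot_features(tweets, controversy_terms, bad_words):
    """Fraction of tweets that hit each lexicon or contain a hashtag."""
    names = ("controversy_frac", "bad_word_frac", "hashtag_frac")
    n = len(tweets)
    if n == 0:
        return dict.fromkeys(names, 0.0)
    controversy_terms = set(controversy_terms)
    bad_words = set(bad_words)
    hits = [0, 0, 0]
    for tweet in tweets:
        tokens = tokenize(tweet)
        words = {t.lstrip("#") for t in tokens}
        hits[0] += bool(words & controversy_terms)
        hits[1] += bool(words & bad_words)
        hits[2] += any(t.startswith("#") for t in tokens)
    return {name: h / n for name, h in zip(names, hits)}


# Hypothetical tweets and lexicons.
tweets = [
    "The senator's abortion vote is a scandal #politics",
    "Great game tonight",
    "This damn ruling is wrong",
    "Lovely weather #sunny",
]
features = snapshot_features(tweets, {"abortion", "scandal"}, {"damn"})
print(features)
```

Fractions rather than raw counts keep the features comparable across snapshots with different numbers of tweets.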

Evaluation

They manually labeled 800 tweets for events. Their data has not been released, although they report a high kappa score (inter-annotator agreement) for their labeled data. They compared their different models by ranking quality and average precision. The blended model seemed to perform best on their dataset.
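For reference, the two evaluation quantities mentioned here can be computed as in the following sketch. It uses the standard definitions of average precision and Cohen's kappa for two annotators; the summary does not say which kappa variant the authors used, so that choice is an assumption, and the example labels are invented.

```python
from collections import Counter


def average_precision(ranked_labels):
    """Average precision for a ranked list of 0/1 labels (best first)."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, chance-corrected."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Invented examples: a ranking with hits at ranks 1 and 3,
# and two annotators who disagree on one item out of four.
print(average_precision([1, 0, 1, 0]))
print(cohen_kappa([1, 1, 0, 0], [1, 1, 0, 1]))  # 0.5
```

Average precision rewards rankings that place controversial snapshots near the top, which fits a setting where models output a score rather than a hard label.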

Hashtags were one of the most discriminating features for event detection. Coupling tweets with news and external sources was also useful, as it helps to validate and explain social media reactions.

Discussion

They present a very simple regression model with an extensive set of features. The idea of incorporating news media articles is great, as tweets are generally short and limited in the amount of information they convey. They also present a method for computing "controversy" scores, which may be useful in such tasks.

Related papers

There has been a lot of work on event detection.