Difference between revisions of "Controversial events detection"

From Cohen Courses
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] Develops a general approach to change-point detection that generalizes across a wide range of applications.
 
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] Detects new events using a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.
  
 
== Related materials ==
 

Revision as of 23:46, 8 October 2012

Team members

Project idea

In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media. For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive attention once every four years. In terms of controversy, Christmas Day is relatively one-sided: most of the text mentioning it is fairly homogeneous. In contrast, the Presidential debates have obvious sides (supporters of the different candidates).

Our goal is not only to detect controversial events, but also to discover what the different sides are: both grouping the individuals associated with each faction and describing how each faction talks about the event differently.

We propose to use a probabilistic graphical model to learn these latent structures from the data without labeled training data.

Data

Our main data source will be Twitter. As a start, we intend to use tweets from a three-month period in 2012 (the exact date range is to be decided). Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and the weekly football games during the NFL season. In addition to the textual content, the timestamp, the location (only partially observed), and the identity of the user posting each tweet could be useful features for our model.
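
As a rough illustration of the per-tweet features listed above, one record could be represented along the lines of the Python sketch below. The Tweet class, its field names, and the two helper functions are hypothetical; they are not taken from the project or from the Twitter API. The date window is passed in as a parameter because the exact range is still to be decided.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Tweet:
    """One observation; field names are illustrative, not a Twitter API schema."""
    user_id: str                                     # identity of the posting user
    text: str                                        # textual content
    timestamp: datetime                              # when the tweet was posted
    location: Optional[Tuple[float, float]] = None   # (lat, lon); None if unobserved


def in_window(tweets: Iterable[Tweet], start: date, end: date) -> List[Tweet]:
    """Keep tweets posted on or after `start` and strictly before `end`."""
    return [t for t in tweets if start <= t.timestamp.date() < end]


def daily_mentions(tweets: Iterable[Tweet], keyword: str) -> Counter:
    """Count tweets per calendar day whose text contains `keyword` (case-insensitive)."""
    counts: Counter = Counter()
    needle = keyword.lower()
    for t in tweets:
        if needle in t.text.lower():
            counts[t.timestamp.date()] += 1
    return counts
```

Keeping location optional reflects that it is only partially observed. A daily count such as the one returned by daily_mentions is only one simple way to look at how attention to a keyword varies over time, and it is not the proposed model.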

Related work

Related materials