Difference between revisions of "Controversial events detection"

From Cohen Courses

Revision as of 00:16, 16 October 2012


This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.

One suggestion would be to look at a topic-modeling approach, e.g. Topics over Time (TOT), to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities, e.g. using something like my MCR-LDA model. So one way to flesh out this idea would be to start with two topic models:

  • MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using Twitter data exclusively, btw.
  • TOT, to detect short-lived 'events' vs long-term topics.

Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not choose to do for the project). It's not obvious how you'd evaluate all this, however. Maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."

These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --Wcohen 14:33, 10 October 2012 (UTC)

PS. There is also a one-person team working on a similar topic, and you all should talk - it's User:Yuchen Tian --Wcohen 18:40, 10 October 2012 (UTC)

Team members

Project idea

In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media. For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive attention once every four years. Controversy-wise, Christmas Day is relatively one-sided: most of the text mentioning it is relatively homogeneous. In contrast, the Presidential debates have obvious sides (supporting the different candidates).

Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.

We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.

Formalizing the task

Event - In the context of social media, an event is a period of time in which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments, etc.) surrounding an occurrence.
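
One way to make the "surge" notion concrete is sketched below. This is only an illustration, not part of the proposed model: the function name `find_surges`, the trailing-window length, and the threshold are all assumptions, and the daily counts are made-up numbers. A time bin is flagged when its document count exceeds the trailing mean by several standard deviations.

```python
from statistics import mean, stdev

def find_surges(counts, window=7, threshold=3.0):
    """Return indices of time bins whose count is unusually high
    compared with the preceding `window` bins (hypothetical rule)."""
    surges = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mu = mean(history)
        sigma = stdev(history)
        # Floor the spread at 1.0 so a perfectly flat history
        # does not turn tiny fluctuations into surges.
        if counts[t] > mu + threshold * max(sigma, 1.0):
            surges.append(t)
    return surges

# Made-up daily post counts with a spike starting at index 7.
daily_counts = [10, 12, 11, 9, 10, 13, 11, 95, 80, 12, 10]
print(find_surges(daily_counts))
```

Note that under this rule only the onset of a spike is flagged, because the spike itself inflates the trailing statistics for the following bins.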

We call an event controversial if, given the text surrounding the event, the nature of the discussion is highly non-homogeneous (i.e. exhibits high entropy). The sides of such an event can be grouped into a small number of distinct factions.
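
As a rough illustration of the entropy criterion, the sketch below computes the Shannon entropy of the faction proportions for one event. It assumes each document has already been assigned a faction label, which in the proposed model would be latent; the function name `faction_entropy` and the example labels are hypothetical.

```python
import math
from collections import Counter

def faction_entropy(faction_labels):
    """Shannon entropy (in bits) of the empirical faction distribution."""
    counts = Counter(faction_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A one-sided event: every document falls in the same faction.
one_sided = ["a", "a", "a", "a"]
# A two-sided event: documents split evenly between two factions.
two_sided = ["a", "b", "a", "b"]

print(faction_entropy(one_sided), faction_entropy(two_sided))
```

Under this measure a one-sided event scores zero and an evenly split two-faction event scores one bit, so a threshold on the score could separate the two cases.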

Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.

Possible model

Here's a sketch of the model that we are considering for our task. It is a variant of a topic model in which each word is assumed to be jointly generated by an event and a faction. It is also similar to the Topics over Time model, in that we generate the time stamp of each document.

A generative story is as follows:

  1. Draw <math>E</math> multinomials <math>\phi_e</math> from a prior (Dirichlet or logistic normal), one for each event <math>e</math>.
  2. haha
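
The first step of the generative story can be sketched as follows. This is a minimal illustration that assumes a symmetric Dirichlet prior (the logistic normal option is not shown); the function names, the concentration value, and the vocabulary size are hypothetical choices, not part of the proposal. Each Dirichlet draw is obtained by normalizing independent Gamma samples.

```python
import random

def draw_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def draw_event_multinomials(num_events, vocab_size, concentration=0.5, seed=0):
    """Draw one word distribution per event from a symmetric Dirichlet prior."""
    rng = random.Random(seed)
    alpha = [concentration] * vocab_size
    return [draw_dirichlet(alpha, rng) for _ in range(num_events)]

# Three events over a toy five-word vocabulary (assumed sizes).
phi = draw_event_multinomials(num_events=3, vocab_size=5)
for e, phi_e in enumerate(phi):
    print(e, [round(p, 3) for p in phi_e])
```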


Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range to be decided). Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, weekly football games during the NFL season, etc. In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.

Related work

Related materials