Comparison Das et al WSDM 2011 and Zhao et al AAAI 2007

From Cohen Courses
Revision as of 22:40, 5 November 2012 by Ysim (talk | contribs)

This is a comparison of two related papers in event detection and temporal information extraction.


The papers are

Comparative analysis of both papers

At a high level, both papers are interested in discovering events from large amounts of temporal information. Both leverage user-generated content: Das et al use Wikipedia as their dataset, while Zhao et al use the Enron email corpus and Dailykos blogs.

In Das et al, the task is first to discover pairs of entities that co-burst within the same time period (a week). Co-bursting means that both entities are mentioned significantly more often than during other time periods. The next step is to discover the relationships between such entities. This forms the foundation for an event: an n-ary relationship between entities that are bursty in the same time period. Likewise, Zhao et al's task is to discover events by exploiting the temporal burstiness of entities and text, together with the "social aspect", where an event is talked about more than usual by "social actors".
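
The notion of co-bursting can be illustrated with a minimal Python sketch. It is not the method of either paper: the z-score threshold used to decide what counts as "significantly more", the function names, and the weekly counts are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def bursty_weeks(counts, z_threshold=2.0):
    """Return the week indices whose mention count is unusually high.

    Assumption: "significantly more" is modelled as a z-score above
    z_threshold. The papers' actual burst criteria may differ.
    """
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return set()  # a flat series has no bursts
    return {week for week, c in enumerate(counts) if (c - mu) / sigma > z_threshold}


def co_bursting_pairs(mentions, z_threshold=2.0):
    """Map each entity pair to the weeks in which both entities are bursty."""
    bursts = {e: bursty_weeks(c, z_threshold) for e, c in mentions.items()}
    pairs = {}
    for a, b in combinations(sorted(bursts), 2):
        shared = bursts[a] & bursts[b]
        if shared:
            pairs[(a, b)] = shared
    return pairs


# Hypothetical weekly mention counts (8 weeks, 0-indexed) for three entities.
mentions = {
    "entity_a": [3, 2, 4, 3, 40, 3, 2, 3],
    "entity_b": [5, 6, 5, 4, 50, 5, 6, 5],
    "entity_c": [4, 30, 4, 5, 4, 5, 4, 4],
}
pairs = co_bursting_pairs(mentions)
print(pairs)
```

Here entity_a and entity_b both spike in week 4 and are reported as a co-bursting pair, while entity_c spikes in a different week and is paired with neither.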

Method-wise, both papers frame the problem of identifying relationships in terms of graphs. In Das et al, vertices are entities, and edges describe how much two entities overlap in the time periods during which they are bursty, so two entities that were mentioned more at the same time have a stronger edge between them. In Zhao et al, vertices are social actors. Unlike the entities in Das et al, social actors are not directly involved in an event; they are simply actors who converse (through text) about the event that is taking place. Edges between social actors are thus weighted by how intensely pairs of social actors communicate during the time period.
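
The two graph constructions can be contrasted with a small Python sketch. The specific edge weights are assumptions, since this page does not give the papers' formulas: Jaccard overlap of bursty periods stands in for the entity graph, and a simple message count within a time window stands in for communication intensity. All names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def entity_graph(bursts):
    """Entity graph in the spirit of Das et al.

    Vertices are entities; an edge is weighted by how much the two
    entities' bursty periods overlap (assumed here: Jaccard overlap).
    """
    edges = {}
    for a, b in combinations(sorted(bursts), 2):
        shared = bursts[a] & bursts[b]
        if shared:
            edges[(a, b)] = len(shared) / len(bursts[a] | bursts[b])
    return edges


def actor_graph(messages, start, end):
    """Social-actor graph in the spirit of Zhao et al.

    Vertices are social actors; an edge is weighted by the number of
    messages the two actors exchange in the window [start, end)
    (assumed proxy for communication intensity).
    """
    weights = Counter()
    for sender, recipient, t in messages:
        if start <= t < end and sender != recipient:
            weights[tuple(sorted((sender, recipient)))] += 1
    return dict(weights)


# Hypothetical bursty weeks per entity.
bursts = {"x": {1, 2, 3}, "y": {2, 3, 4}, "z": {7}}
entity_edges = entity_graph(bursts)

# Hypothetical (sender, recipient, week) message records.
messages = [
    ("alice", "bob", 1),
    ("bob", "alice", 2),
    ("alice", "carol", 2),
    ("bob", "carol", 9),
]
actor_edges = actor_graph(messages, start=0, end=5)

print(entity_edges)
print(actor_edges)
```

In the first graph the vertices are the participants of a candidate event, whereas in the second they are only the people discussing it, which is the distinction drawn above.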

In Das et al's approach, events are thus assumed to be associated with two or more public entities, while Zhao et al's events are associated more with the topical nature of the ongoing discussions. The advantage of Das et al's approach is that events are easily interpretable, especially in the context of public news (entertainment news, political news, etc.), which is often about specific public figures or organizations. However, it would not be able to capture abstract events that have no specific associated entities, such as a natural disaster. Zhao et al's approach, on the other hand, would be able to identify such abstract events, but the resulting event topics may not be easily identifiable.

Both papers made use of algorithms from time series models and graph clustering to solve their respective problems.

Related papers