Difference between revisions of "Zhao et al, AAAI 07"

From Cohen Courses

Revision as of 23:14, 30 September 2012

This paper is relevant to our project on detecting controversial events on Twitter.

Citation

Qiankun Zhao, Prasenjit Mitra, and Bi Chen. Temporal and information flow based event detection from social text streams. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007.

Online version

Temporal and information flow based event detection from social text streams

Summary

The authors propose a method for detecting events in social text streams that exploits not only the textual content but also the temporal and social dimensions of the data. Social text streams are represented as multigraphs in which each node denotes an "actor" and each edge represents the information flow between two actors. First, the authors perform content-based clustering using a vector space model (tf-idf weights, cosine similarity) and a graph-cut-based clustering algorithm. This clustering segments the data into topics. For a given topic, they measure the "intensity" over time using a sliding time window and segment it into intervals using an adaptive time series model. With this temporal segmentation, each topic is represented as a sequence of social network graphs over time. The weight of an edge between two actors in such a graph denotes their communication intensity, so one can measure the "information flow" between actors for a given topic over time.
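As a rough illustration of two ingredients mentioned above, the sketch below computes tf-idf vectors with cosine similarity and counts messages in a sliding time window. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the raw-count tf with log idf weighting, the toy documents, and the integer timestamps are all hypothetical, and neither the graph-cut clustering nor the adaptive time series model from the paper is reproduced.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (a dict) per tokenized document.

    Assumption: raw term counts for tf and log(n / df) for idf; the paper
    may use a different weighting variant.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def window_intensity(timestamps, start, end, width, step):
    """Count messages falling in each sliding window [t, t + width)."""
    counts = []
    t = start
    while t + width <= end:
        counts.append(sum(1 for ts in timestamps if t <= ts < t + width))
        t += step
    return counts


# Toy data (hypothetical): two related messages and one unrelated message.
docs = [
    "energy trading meeting schedule".split(),
    "energy trading contract review".split(),
    "football game this weekend".split(),
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])

# Message timestamps for one topic, in arbitrary integer time units.
intensity = window_intensity([0, 1, 1, 2, 5, 6, 6, 6], start=0, end=8, width=2, step=2)
```

A pairwise similarity matrix built with `cosine` could serve as the edge weights of a document graph for a clustering step, and a sequence like `intensity` is the kind of signal a segmentation model would then split into intervals.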

Using the content, temporal, and information flow data described above, they extract events by selecting text segments subject to constraints on this information. For instance, an event should fall within a single time interval, concern the same topic, and take place mainly within a certain subgroup of social actors.
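The constraints above can be pictured with a small filter. This is only a sketch under assumptions, not the extraction procedure from the paper: the message tuple layout, the function name `candidate_events`, and the `group_size` and `min_share` parameters are hypothetical. Messages are grouped by topic and time interval, and a group is kept when most of its messages are exchanged within its most active actors.

```python
from collections import Counter, defaultdict


def candidate_events(messages, group_size=2, min_share=0.6):
    """Keep (topic, interval) groups dominated by a small set of actors.

    messages: iterable of (topic, interval, sender, receiver) tuples.
    group_size and min_share are illustrative knobs, not values from the paper.
    """
    groups = defaultdict(list)
    for topic, interval, sender, receiver in messages:
        groups[(topic, interval)].append((sender, receiver))

    events = []
    for key, pairs in sorted(groups.items()):
        # Rank actors by how many messages they take part in.
        activity = Counter(actor for pair in pairs for actor in pair)
        core = {actor for actor, _ in activity.most_common(group_size)}
        # Share of messages exchanged entirely inside the core subgroup.
        inside = sum(1 for s, r in pairs if s in core and r in core)
        if inside / len(pairs) >= min_share:
            events.append((key, sorted(core)))
    return events


# Toy data (hypothetical): one concentrated conversation, one scattered one.
messages = [
    ("trading", 0, "alice", "bob"),
    ("trading", 0, "bob", "alice"),
    ("trading", 0, "alice", "bob"),
    ("trading", 0, "carol", "alice"),
    ("sports", 1, "dave", "erin"),
    ("sports", 1, "frank", "gina"),
    ("sports", 1, "hank", "ivy"),
]
events = candidate_events(messages)
```

In this toy input only the "trading" group in interval 0 passes, because three of its four messages stay between the two most active actors.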

They used the Enron Email dataset and Dailykos blogs. Thirty events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.

Evaluation

Discussion

Related papers

There has been a lot of work on event detection.

Study plan

  • Article: Group average agglomerative clustering [1]
  • Article: Single pass clustering [2]