Difference between revisions of "Comparison Das et al WSDM 2011 and Zhao et al AAAI 2007"

From Cohen Courses
 
This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].
The papers are
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [Zhao_et_al,_AAAI_07 Temporal and information flow based event detection from social text streams]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]
 
== Citation ==

== Online version ==

== Summary ==
 
The authors present a method for detecting events in social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.
Social text streams are represented as multigraphs in which each node denotes an "actor" and each edge represents the information flow between two actors.
First, the authors perform content-based [[UsesMethod::clustering]] using a vector space model (tf-idf weights and cosine similarity) and a graph-cut-based clustering algorithm.
This clustering segments the data into topics.
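The content-based step can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the toy documents, the function names, and the similarity threshold are all assumptions, and the graph-cut clustering is replaced by a much cruder stand-in (connected components of a thresholded similarity graph).

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_by_similarity(vectors, threshold=0.1):
    """Stand-in for graph-cut clustering (assumption): label the connected
    components of the graph whose edges join documents with cosine > threshold."""
    n = len(vectors)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and cosine(vectors[i], vectors[j]) > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy corpus (hypothetical): two messages about trading, two about an election.
docs = [
    "energy trading contract price".split(),
    "energy contract price meeting".split(),
    "election campaign senate vote".split(),
    "senate vote election poll".split(),
]
vectors = tfidf_vectors(docs)
labels = cluster_by_similarity(vectors)
```

On this toy corpus the two trading messages end up in one topic and the two election messages in another. A real graph-cut method such as normalized cut would partition the same similarity graph by optimizing a cut criterion rather than by thresholding.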
 
For a given topic, they measure the "intensity" over time using a sliding time window and segment the resulting series into intervals using an adaptive time series model.
With this temporal segmentation, each topic is represented as a sequence of social network graphs over time.
The edge weights between actors in these graphs denote their communication intensity, so one can measure the "information flow" between actors for a given topic over time.
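A minimal sketch of the temporal step, under stated assumptions: timestamps are integers, the adaptive time series model is replaced by a simple fixed threshold on the windowed counts, and the function names and sample messages are hypothetical.

```python
from collections import Counter

def sliding_intensity(times, window, step):
    """Count messages falling in each window [t, t + window), moving t by `step`."""
    start, end = min(times), max(times)
    counts = []
    t = start
    while t <= end:
        counts.append((t, sum(1 for x in times if t <= x < t + window)))
        t += step
    return counts

def segment_by_level(intensity, threshold):
    """Split the series into runs that stay on one side of `threshold`.

    Returns a list of (first_window_start, last_window_start, is_burst).
    This is a stand-in for the adaptive segmentation (assumption).
    """
    segments = []
    for t, count in intensity:
        burst = count >= threshold
        if segments and segments[-1][2] == burst:
            segments[-1] = (segments[-1][0], t, burst)
        else:
            segments.append((t, t, burst))
    return segments

def edge_weights(messages, lo, hi):
    """Weight of edge (sender, receiver) = number of messages sent in [lo, hi)."""
    return Counter((s, r) for s, r, t in messages if lo <= t < hi)

# Hypothetical messages for one topic: (sender, receiver, timestamp).
messages = [
    ("ann", "bob", 1), ("bob", "ann", 2), ("ann", "cat", 9),
    ("ann", "bob", 10), ("bob", "cat", 10), ("cat", "ann", 11), ("ann", "bob", 11),
]
WINDOW = 3
times = [t for _, _, t in messages]
intensity = sliding_intensity(times, window=WINDOW, step=WINDOW)
segments = segment_by_level(intensity, threshold=3)
# One weighted graph per segment, covering all windows in that segment.
graphs = [edge_weights(messages, lo, hi + WINDOW) for lo, hi, _ in segments]
```

Each entry of `graphs` plays the role of one social network graph in the sequence for the topic, with message counts as a simple proxy for communication intensity.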
 
With the content, temporal, and information flow data above, they extract events by selecting text segments that satisfy constraints on this information. For instance, the segments of an event should come from the same time interval, concern the same topic, and be exchanged mainly within a certain subgroup of social actors.
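The three constraints can be illustrated with a small sketch. It assumes each message already carries a topic label and a time interval index from the earlier steps, and it treats "a certain subgroup of actors" as a connected group of correspondents; the dictionary keys, the `min_messages` parameter, and the grouping rule are assumptions made for illustration, not the constraints used in the paper.

```python
from collections import defaultdict

def extract_events(messages, min_messages=2):
    """Group messages into candidate events.

    Each message is a dict with hypothetical keys 'topic', 'interval',
    'sender', 'receiver'. A candidate event is a set of messages sharing
    a topic and a time interval whose actors form one connected group.
    """
    buckets = defaultdict(list)
    for m in messages:
        buckets[(m["topic"], m["interval"])].append(m)

    events = []
    for key in sorted(buckets):
        # Union-find over actors: actors who exchange a message are merged.
        parent = {}

        def find(a):
            parent.setdefault(a, a)
            while parent[a] != a:
                parent[a] = parent[parent[a]]  # path halving
                a = parent[a]
            return a

        for m in buckets[key]:
            parent[find(m["sender"])] = find(m["receiver"])

        groups = defaultdict(list)
        for m in buckets[key]:
            groups[find(m["sender"])].append(m)

        for msgs in groups.values():
            if len(msgs) >= min_messages:
                actors = sorted({a for m in msgs for a in (m["sender"], m["receiver"])})
                events.append({"topic": key[0], "interval": key[1],
                               "actors": actors, "size": len(msgs)})
    return events

# Hypothetical labeled messages.
messages = [
    {"topic": "trading", "interval": 0, "sender": "ann", "receiver": "bob"},
    {"topic": "trading", "interval": 0, "sender": "bob", "receiver": "cat"},
    {"topic": "trading", "interval": 0, "sender": "dan", "receiver": "eve"},
    {"topic": "trading", "interval": 1, "sender": "ann", "receiver": "bob"},
    {"topic": "election", "interval": 0, "sender": "ann", "receiver": "bob"},
]
events = extract_events(messages)
```

Here only the two messages among ann, bob, and cat on the "trading" topic in interval 0 form an event; the other groups are too small to pass the size filter.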
 
== Evaluation ==
 
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. Thirty events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.
 
Performance is measured by the precision, recall, and F-score of the events recovered by their model.
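For reference, these metrics can be computed as in the sketch below. It assumes detected and ground-truth events can be matched by a shared identifier, which is a simplification; the event identifiers are made up, and the criterion the authors used to match a detected event to a labeled one is not described here.

```python
def precision_recall_f1(detected, ground_truth):
    """Precision, recall, and F1 for sets of event identifiers."""
    detected, ground_truth = set(detected), set(ground_truth)
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: 4 detected events, 3 labeled events, 2 in common.
p, r, f = precision_recall_f1({"e1", "e2", "e3", "e4"}, {"e1", "e2", "e5"})
```

With two of four detected events correct and two of three labeled events found, precision is 1/2, recall is 2/3, and F1 is 4/7.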
 
== Discussion ==
They found that taking the temporal and social dimensions into account increases the F-score significantly. Integrating these diverse features in a step-wise manner was also found to perform better than simply including all of them as features in a standard machine learning framework.
 
== Related papers ==
There has been a lot of work on event detection.
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track popular events or topics that evolve over time in communities.
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problem of detecting events in news stories.
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using tree kernels.
 
== Study plan ==
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]

Revision as of 22:12, 5 November 2012
