Topic Detection and Tracking

From Cohen Courses

Latest revision as of 20:20, 26 September 2012

This dataset is used for the Topic Detection and Tracking task hosted by NIST [1].

Annotation guidelines are available at http://projects.ldc.upenn.edu/TDT5/Annotation/TDT2004V1.2.pdf.

This dataset contains 407,505 news articles in Arabic, Mandarin and English. The news articles are annotated for topics, events and activities.

From the annotation guidelines:

A TDT event is defined as a particular thing that happens at a specific time and place, along with all necessary preconditions and unavoidable consequences. A TDT event might be a particular plane crash, or a single meeting, or a particular court hearing. An activity is a connected set of events that have a common focus or purpose, happening at a specific place and time; for instance, a campaign, or an investigation, or a disaster relief effort. For the purposes of TDT, a topic is defined as an event or activity, along with all directly related events and activities.

Relevant Papers