Topic Detection and Tracking

From Cohen Courses
Revision as of 19:20, 26 September 2012 by Ysim (talk | contribs)

This dataset is used for the Topic Detection and Tracking task hosted by NIST [1].

Annotation guidelines are available here.

This dataset contains 407,505 news articles in Arabic, Mandarin and English. The news articles are annotated for topics, events and activities.

From the annotation guidelines:

A TDT event is defined as a particular thing that happens at a specific time and place, along with all necessary preconditions and unavoidable consequences. A TDT event might be a particular plane crash, or a single meeting, or a particular court hearing. An activity is a connected set of events that have a common focus or purpose, happening at a specific place and time; for instance, a campaign, or an investigation, or a disaster relief effort. For the purposes of TDT, a topic is defined as an event or activity, along with all directly related events and activities.
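The nesting of these three definitions can be illustrated with a small data model. This is only a sketch of the relationships described in the guidelines, not the dataset's actual annotation format; the class names, field names and example descriptions below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Event:
    """A particular thing that happens at a specific time and place."""
    description: str


@dataclass
class Activity:
    """A connected set of events sharing a common focus or purpose."""
    description: str
    events: List[Event] = field(default_factory=list)


@dataclass
class Topic:
    """A seed event or activity plus all directly related events and activities."""
    seed: Union[Event, Activity]
    related: List[Union[Event, Activity]] = field(default_factory=list)

    def members(self) -> List[Union[Event, Activity]]:
        # The seed itself always belongs to the topic.
        return [self.seed] + self.related


# Hypothetical example: a plane crash seeds a topic that also covers
# the investigation that follows it.
crash = Event("a particular plane crash")
hearing = Event("a court hearing about the crash")
investigation = Activity("the crash investigation", events=[hearing])
topic = Topic(seed=crash, related=[investigation])
print(len(topic.members()))  # 2: the crash and the investigation
```

Under this reading, a topic is seeded by either an event or an activity and then grows to include whatever is directly related to it, which is why `seed` accepts both types.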

Relevant Papers