KBP at TAC

From Cohen Courses
Revision as of 00:54, 30 September 2011 by Wpang (talk | contribs)

Knowledge Base Population (KBP) is an evaluation track at the Text Analysis Conference (TAC). Its goal is to explore how well automated systems can discover information about named entities with reference to an external knowledge source.

The evaluation provides an initial (or reference) knowledge base, along with a document collection from which systems are to learn. The reference knowledge base, which serves as the evaluation dataset, is built from attributes (a.k.a. “slots”) derived from Wikipedia infoboxes.

Task Description

Task Description for Knowledge-Base Population at TAC 2011

Entity Linking

In the Entity Linking task, each query consists of a name string and a background document ID, and the system is required to provide the ID of the Knowledge Base entry to which the name refers. For example, your system is given a query like:

<query id="EL000304">
 <name>Barnhill</name>
 <docid>eng-NG-31-100578-11879229</docid>
</query>

Your system should then link this query to the correct named entity for Barnhill in the knowledge base.
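As a rough illustration of how such a query might be consumed, the sketch below parses the example above with Python's standard library and looks the name up in a toy index. The functions `parse_el_query` and `link_by_exact_name`, the `toy_kb` dictionary, and the node ID in it are hypothetical and are not part of the TAC tooling. Exact name matching is only a naive baseline, since a real system would also use the background document to disambiguate.

```python
import xml.etree.ElementTree as ET

# The example Entity Linking query from the text above.
EL_QUERY_XML = """<query id="EL000304">
 <name>Barnhill</name>
 <docid>eng-NG-31-100578-11879229</docid>
</query>"""


def parse_el_query(xml_text):
    """Return (query_id, name, docid) for a single <query> element."""
    root = ET.fromstring(xml_text)
    return root.get("id"), root.findtext("name"), root.findtext("docid")


def link_by_exact_name(name, kb_index):
    """Hypothetical baseline: case-insensitive exact match on the name.

    kb_index maps lower-cased entry names to KB node IDs.
    Returns None when no entry matches.
    """
    return kb_index.get(name.lower())


query_id, name, docid = parse_el_query(EL_QUERY_XML)

# Made-up index with a placeholder node ID, for illustration only.
toy_kb = {"barnhill": "E_PLACEHOLDER"}
print(query_id, name, docid, link_by_exact_name(name, toy_kb))
```

The background document ID is parsed but unused here. It is the hook a stronger linker would use to fetch context for choosing among several entries that share the same name.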

Slot Filling

The goal of Slot Filling is to collect from the corpus information regarding certain attributes of an entity, which may be a person or some type of organization. It can be viewed as more traditional Information Extraction (IE), or alternatively, as a Question Answering (QA) task, where the questions are static but the targets change. Required slots can be single-valued (e.g., per:date_of_birth) or list-valued (e.g., per:employee_of, per:children).

Example query:

<query id="SF114">
 <name>Masi Oka</name>
 <docid>eng-WL-11-174592-12943233</docid>
 <enttype>PER</enttype>
 <nodeid>E0300113</nodeid>
 <ignore>per:date_of_birth per:age per:country_of_birth per:city_of_birth</ignore>
</query>
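The sketch below shows one way to read this query and decide which slots are still worth filling. It assumes that the `<ignore>` element lists, separated by whitespace, the slots the system need not return for this entity. The names `parse_sf_query`, `slots_to_fill`, and `SLOTS_FROM_TEXT` are hypothetical, and the slot list contains only the slots named on this page, not the full TAC inventory.

```python
import xml.etree.ElementTree as ET

# The example Slot Filling query from the text above.
SF_QUERY_XML = """<query id="SF114">
 <name>Masi Oka</name>
 <docid>eng-WL-11-174592-12943233</docid>
 <enttype>PER</enttype>
 <nodeid>E0300113</nodeid>
 <ignore>per:date_of_birth per:age per:country_of_birth per:city_of_birth</ignore>
</query>"""

# Only the person slots mentioned on this page; the real inventory is larger.
SLOTS_FROM_TEXT = [
    "per:date_of_birth",
    "per:age",
    "per:country_of_birth",
    "per:city_of_birth",
    "per:employee_of",
    "per:children",
]


def parse_sf_query(xml_text):
    """Parse one <query> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "docid": root.findtext("docid"),
        "enttype": root.findtext("enttype"),
        "nodeid": root.findtext("nodeid"),
        # Assumption: <ignore> is a whitespace-separated list of slot names.
        "ignore": (root.findtext("ignore") or "").split(),
    }


def slots_to_fill(query, candidate_slots):
    """Return the candidate slots not listed in the query's ignore list."""
    ignored = set(query["ignore"])
    return [slot for slot in candidate_slots if slot not in ignored]


query = parse_sf_query(SF_QUERY_XML)
print(slots_to_fill(query, SLOTS_FROM_TEXT))
```

With the slot list above, only `per:employee_of` and `per:children` remain for this query, both of which are list-valued.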

Data

The reference knowledge base is built from articles in an October 2008 dump of English Wikipedia and contains 818,741 nodes. Each node is assigned an entity type of PER (person), ORG (organization), GPE (geo-political entity), or UKN (unknown).

The source document collection includes 17 Broadcast Conversation, 665 Broadcast News, 1 Conversation Telephone Speech, 1,286,609 Newswire, and 490,596 Web Text documents (1,777,888 documents in total).

Relevant Papers