Difference between revisions of "KBP at TAC"


Latest revision as of 00:54, 30 September 2011

Knowledge Base Population (KBP) is an evaluation track at the Text Analysis Conference (TAC, http://www.nist.gov/tac/). Its goal is to explore how well automated systems can discover information about named entities with reference to an external knowledge source.

The evaluation provides an initial (or reference) knowledge base, along with a document collection that systems learn from. Attributes (a.k.a. "slots") derived from Wikipedia infoboxes are used to create the reference knowledge base, which serves as the evaluation dataset.

Task Description

Task Description for Knowledge-Base Population at TAC 2011: http://nlp.cs.qc.cuny.edu/kbp/2011/KBP2011_TaskDefinition.pdf

Entity Linking

In the Entity Linking task, the system is given a query that consists of a name string and a background document ID, and it must return the ID of the knowledge base entry to which the name refers. For example, a system may be given a query like:

<query id="EL000304">
 <name>Barnhill</name>
 <docid>eng-NG-31-100578-11879229</docid>
</query>

The system should then link the name to the correct Barnhill entity in the knowledge base.
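
To make the query format concrete, here is a minimal Python sketch that parses the query above and looks the name up in a toy knowledge base. The `TOY_KB` dictionary, its node ID, and the exact-match rule in `link_by_exact_name` are hypothetical and chosen only for illustration. A real system would use the background document to disambiguate among candidates, and this sketch simply returns `None` when there is no unique match.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, Optional

EL_QUERY_XML = """<query id="EL000304">
 <name>Barnhill</name>
 <docid>eng-NG-31-100578-11879229</docid>
</query>"""


@dataclass
class EntityLinkingQuery:
    query_id: str
    name: str
    doc_id: str


def parse_el_query(xml_text: str) -> EntityLinkingQuery:
    """Read the id attribute and the <name>/<docid> children of one query."""
    root = ET.fromstring(xml_text)
    return EntityLinkingQuery(
        query_id=root.attrib["id"],
        name=(root.findtext("name") or "").strip(),
        doc_id=(root.findtext("docid") or "").strip(),
    )


# Hypothetical knowledge base: node ID -> entry title (not real KBP data).
TOY_KB: Dict[str, str] = {
    "E_HYPOTHETICAL_1": "Barnhill",
    "E_HYPOTHETICAL_2": "Masi Oka",
}


def link_by_exact_name(query: EntityLinkingQuery, kb: Dict[str, str]) -> Optional[str]:
    """Naive baseline: return a node ID only if exactly one title matches the name."""
    matches = [node_id for node_id, title in kb.items()
               if title.lower() == query.name.lower()]
    return matches[0] if len(matches) == 1 else None


query = parse_el_query(EL_QUERY_XML)
print(query.query_id, "->", link_by_exact_name(query, TOY_KB))
```

Exact string matching ignores the background document entirely, which is why it is only a baseline: an ambiguous name such as "Barnhill" can match several entries, and the document context is what lets a system choose among them.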

Slot Filling

The goal of Slot Filling is to collect from the corpus information about certain attributes of an entity, which may be a person or some type of organization. It can be viewed as a more traditional Information Extraction (IE) task or, alternatively, as a Question Answering (QA) task in which the questions are static but the targets change. Required slots can be single-valued (e.g., per:date_of_birth) or list-valued (e.g., per:employee_of, per:children).

Example query:

<query id="SF114">
 <name>Masi Oka</name>
 <docid>eng-WL-11-174592-12943233</docid>
 <enttype>PER</enttype>
 <nodeid>E0300113</nodeid>
 <ignore>per:date_of_birth per:age per:country_of_birth per:city_of_birth</ignore>
</query>
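
As a rough illustration, the following Python sketch parses the query above and works out which slots remain to be filled. It assumes that `<ignore>` lists slots the system may skip for this entity, which the text here does not state explicitly. `EXAMPLE_SLOTS` is a hypothetical three-slot subset built from the slot names mentioned above, not the full KBP slot inventory.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SF_QUERY_XML = """<query id="SF114">
 <name>Masi Oka</name>
 <docid>eng-WL-11-174592-12943233</docid>
 <enttype>PER</enttype>
 <nodeid>E0300113</nodeid>
 <ignore>per:date_of_birth per:age per:country_of_birth per:city_of_birth</ignore>
</query>"""


def parse_sf_query(xml_text: str) -> Dict[str, object]:
    """Turn one slot filling query into a plain dictionary."""
    root = ET.fromstring(xml_text)
    ignore_text = root.findtext("ignore", default="") or ""
    return {
        "id": root.attrib["id"],
        "name": root.findtext("name"),
        "docid": root.findtext("docid"),
        "enttype": root.findtext("enttype"),
        "nodeid": root.findtext("nodeid"),
        # Slot names are whitespace-separated inside <ignore>.
        "ignore": set(ignore_text.split()),
    }


# Hypothetical subset of person slots and their arity, for illustration only.
EXAMPLE_SLOTS: Dict[str, str] = {
    "per:date_of_birth": "single",
    "per:employee_of": "list",
    "per:children": "list",
}


def slots_to_fill(query: Dict[str, object], slots: Dict[str, str]) -> List[str]:
    """Assumed semantics: every known slot not named in <ignore> must be filled."""
    ignored = query["ignore"]
    return sorted(slot for slot in slots if slot not in ignored)


query = parse_sf_query(SF_QUERY_XML)
for slot in slots_to_fill(query, EXAMPLE_SLOTS):
    print(slot, "(" + EXAMPLE_SLOTS[slot] + "-valued)")
```

Under these assumptions, per:date_of_birth is skipped for Masi Oka because it appears in the ignore list, while the two list-valued slots remain open.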

Data

The reference knowledge base is built from articles in an October 2008 dump of English Wikipedia and contains 818,741 nodes. Each node is assigned an entity type of PER, ORG, GPE, or UKN (unknown).

The source document collection includes 17 Broadcast Conversation, 665 Broadcast News, 1 Conversation Telephone Speech, 1,286,609 Newswire, and 490,596 Web Text documents.
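
The genre counts above can be tallied with a few lines of Python to see how the collection is distributed. The dictionary simply restates the figures given in the text, and the variable names are illustrative.

```python
# Document counts per genre, copied from the description above.
GENRE_COUNTS = {
    "Broadcast Conversation": 17,
    "Broadcast News": 665,
    "Conversation Telephone Speech": 1,
    "Newswire": 1_286_609,
    "Web Text": 490_596,
}

total_documents = sum(GENRE_COUNTS.values())
genre_shares = {genre: count / total_documents
                for genre, count in GENRE_COUNTS.items()}

print("total documents:", total_documents)
for genre, share in sorted(genre_shares.items(), key=lambda item: -item[1]):
    print(f"{genre}: {share:.4%}")
```

The total comes to 1,777,888 documents, almost all of them Newswire and Web Text, so the spoken-language genres contribute only a negligible share of the evidence.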

Relevant Papers