Difference between revisions of "Attribute Extraction"

From Cohen Courses
 
(15 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
== Summary ==
 
== Summary ==
  
Attribute Extraction is a [[category::problem]] in the field of information extraction that focuses on identifying properties/features that describe a named entity.
+
Attribute Extraction is a [[category::problem]] in the field of information extraction that focuses on identifying properties/features that describe a named entity. Attribute extraction is often used to disambiguate person names, extract encyclopedic knowledge, and improve question answering systems.
  
 
== Common Approaches ==
 
== Common Approaches ==
  
 
Some approaches to Attribute Extraction include:
 
Some approaches to Attribute Extraction include:
* Template/Pattern-Learning: Learn template contextual patterns using seed-based bootstrapping
+
* '''Template/Pattern-Learning'''
* Position Based: Basing predictions on absolute and relative ordering of where the attribute values typically appear in documents.
+
** Learn contextual template patterns using seed-based bootstrapping, and assign a probability to each attribute based on the surrounding context. Variations of this method appear to be the predominant approach in the literature.
* Transitivity-Based: Using transitivity of attributes across co-occurring entities. Co-occurring entities, such as people mentioned in a given person's biography page, tend to have similar attributes.
+
* '''Position-Based'''
* Latent-Based: Detect attributes that may not directly be mentioned in an article based on a topic-model.
+
** Base predictions on the absolute and relative positions at which attribute values typically appear in documents.
*
+
* '''Transitivity-Based'''
 +
** Use transitivity of attributes across co-occurring entities. Co-occurring entities, such as people mentioned in a given person's biography page, tend to have similar attributes.
 +
* '''Latent-Based'''
 +
** Use a topic model to detect attributes that may not be mentioned directly in an article.
 +
* '''Two-step: Extract then Verify'''
 +
** First, the system uses rules, NER, gazetteer-based matching, and patterns (manually created or learned) to extract all attribute candidates.
 +
** Then, it verifies the candidates using a classifier (with features typically based on the context, pattern values, and dependency path) trained to determine whether an attribute value is correct for the given individual or should be discarded.
 +
** Sometimes (depending on the application), researchers have opted to filter attribute candidates based on lexical patterns instead of performing classification.
  
== Challenges / Issues ==
+
== Evaluation ==
Some challenges in Attribute Extraction include ...
+
One evaluation venue for the attribute extraction task has been the Web People Search workshop ([http://nlp.uned.es/weps/index.php WePS: Searching information about entities in the web]), which included an attribute extraction challenge in its past two workshops: [http://nlp.uned.es/weps/weps2/WePS2_Attribute_Extraction.pdf WePS-2 Attribute Extraction Subtask Guidelines], [http://nlp.uned.es/weps/weps-3/guidelines/42-guidelines-for-the-weps-3-attribute-extraction-subtask WePS-3 Attribute Extraction Subtask Guidelines].
 
 
== References / Links ==
 
* Nikesh Garera and David Yarowsky. '''Structural, Transitive and Latent Models for Biographic Fact Extraction'''. - [http://www.aclweb.org/anthology-new/E/E09/E09-1035.pdf]
 
  
 
== Relevant Papers ==
 
== Relevant Papers ==

Latest revision as of 19:29, 30 November 2010

Summary

Attribute Extraction is a problem in the field of information extraction that focuses on identifying properties/features that describe a named entity. Attribute extraction is often used to disambiguate person names, extract encyclopedic knowledge, and improve question answering systems.

Common Approaches

Some approaches to Attribute Extraction include:

  • Template/Pattern-Learning
    • Learn contextual template patterns using seed-based bootstrapping, and assign a probability to each attribute based on the surrounding context. Variations of this method appear to be the predominant approach in the literature.
  • Position-Based
    • Base predictions on the absolute and relative positions at which attribute values typically appear in documents.
  • Transitivity-Based
    • Use transitivity of attributes across co-occurring entities. Co-occurring entities, such as people mentioned in a given person's biography page, tend to have similar attributes.
  • Latent-Based
    • Use a topic model to detect attributes that may not be mentioned directly in an article.
  • Two-step: Extract then Verify
    • First, the system uses rules, NER, gazetteer-based matching, and patterns (manually created or learned) to extract all attribute candidates.
    • Then, it verifies the candidates using a classifier (with features typically based on the context, pattern values, and dependency path) trained to determine whether an attribute value is correct for the given individual or should be discarded.
    • Sometimes (depending on the application), researchers have opted to filter attribute candidates based on lexical patterns instead of performing classification.
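
The two-step "extract then verify" approach above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of any particular system: the `PATTERNS` and `FILTERS` tables, the function names `extract_candidates` and `verify`, and the sample sentence are all hypothetical, and the verification step uses a simple lexical filter (the last variant in the list) in place of a trained classifier.

```python
import re

# Hypothetical extraction patterns: attribute name -> regex with one capture
# group. These are deliberately loose so that step 1 over-generates.
PATTERNS = {
    "birth_year": re.compile(r"\bborn in (\w+)"),
    "occupation": re.compile(r"\bworks as an? ([a-z]+)"),
}

# Hypothetical lexical filters used in place of a trained classifier:
# a candidate value is kept only if it fully matches its attribute's filter.
FILTERS = {
    "birth_year": re.compile(r"(1[5-9]|20)\d{2}"),
    "occupation": re.compile(r"[a-z]+"),
}


def extract_candidates(text):
    """Step 1: collect every (attribute, value) pair the patterns match."""
    candidates = []
    for attribute, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            candidates.append((attribute, match.group(1)))
    return candidates


def verify(candidates):
    """Step 2: discard candidates whose value fails the lexical filter."""
    return [
        (attribute, value)
        for attribute, value in candidates
        if FILTERS[attribute].fullmatch(value)
    ]


if __name__ == "__main__":
    text = "Jane Doe was born in 1950. She was born in Ohio and works as a chemist."
    candidates = extract_candidates(text)
    print(candidates)          # includes the spurious ("birth_year", "Ohio")
    print(verify(candidates))  # the filter drops the non-year value
```

Keeping extraction loose and pushing precision into the second step is the design choice the list describes; a classifier over context and dependency-path features would replace `verify` in a fuller system.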

Evaluation

One evaluation venue for the attribute extraction task has been the Web People Search workshop (WePS: Searching information about entities in the web), which included an attribute extraction challenge in its past two workshops: WePS-2 Attribute Extraction Subtask Guidelines, WePS-3 Attribute Extraction Subtask Guidelines.
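
As a rough illustration of how extracted attributes might be scored against a gold standard, the sketch below computes precision, recall, and F1 over sets of (entity, attribute, value) triples. The text above does not say which metrics the WePS subtasks use, so these metrics are an assumption (they are the conventional choice for extraction tasks), and the function name `precision_recall_f1` and the triple representation are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted (entity, attribute, value) triples against gold ones.

    Exact-match scoring is assumed here; a real evaluation may allow
    partial or normalized matches.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {
        ("Jane Doe", "birth_year", "1950"),
        ("Jane Doe", "occupation", "chemist"),
        ("Jane Doe", "birthplace", "Ohio"),
    }
    predicted = {
        ("Jane Doe", "birth_year", "1950"),
        ("Jane Doe", "occupation", "chemist"),
        ("Jane Doe", "birthplace", "Iowa"),  # wrong value, counts as an error
    }
    print(precision_recall_f1(predicted, gold))
```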

Relevant Papers