Attribute Extraction
From Cohen Courses
Latest revision as of 19:29, 30 November 2010
Summary
Attribute Extraction is a problem in the field of information extraction that focuses on identifying the properties or features that describe a named entity. Attribute extraction is often used to disambiguate person names, extract encyclopedic knowledge, and improve question answering systems.
Common Approaches
Some approaches to Attribute Extraction include:
- Template/Pattern-Learning
  - Learn contextual template patterns using seed-based bootstrapping, and assign a probability to each attribute based on its surrounding context. Variations of this method seem to be the predominant approach in the literature.
- Position-Based
  - Base predictions on the absolute and relative positions where attribute values typically appear in documents.
- Transitivity-Based
  - Use the transitivity of attributes across co-occurring entities. Co-occurring entities, such as people mentioned in a given person's biography page, tend to have similar attributes.
- Latent-Based
  - Detect attributes that may not be directly mentioned in an article, based on a topic model.
- Two-step: Extract then Verify
  - First, the system uses rules, NER, gazetteer-based matching, and patterns (manually created or learned) to extract all attribute candidates.
  - Then, it verifies the candidates using a classifier (with features typically based on the context, pattern values, and dependency path) trained to determine whether an attribute value is correct for the given individual or should be discarded.
  - Sometimes, depending on the application, researchers have opted to filter attribute candidates based on lexical patterns instead of performing classification.
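The two-step approach can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than a system described on this page: the tiny `OCCUPATIONS` gazetteer, the `CUES` patterns, the `Candidate` type, the helper names, and the made-up example sentence are all hypothetical. The verification step uses the lexical-pattern filtering variant instead of a trained classifier, since a classifier would need labeled training data.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer of occupation words (illustrative only).
OCCUPATIONS = {"physicist", "painter", "senator"}

# Hypothetical lexical cues that must appear to the left of a candidate
# (within the same sentence) for the candidate to be accepted.
CUES = {
    "birth_year": re.compile(r"\bborn\b"),
    "occupation": re.compile(r"\b(?:is|was) an? $"),
}


@dataclass(frozen=True)
class Candidate:
    attribute: str
    value: str
    start: int  # character offset of the value in the text


def extract_candidates(text):
    """Step 1: over-generate candidates with a rule and a gazetteer."""
    candidates = []
    # Rule: every four-digit number is a possible birth year.
    for m in re.finditer(r"\b\d{4}\b", text):
        candidates.append(Candidate("birth_year", m.group(), m.start()))
    # Gazetteer: every known occupation word is a possible occupation.
    for m in re.finditer(r"\b\w+\b", text):
        if m.group().lower() in OCCUPATIONS:
            candidates.append(Candidate("occupation", m.group().lower(), m.start()))
    return candidates


def left_context(text, start, width=40):
    """Return up to `width` characters before `start`, cut at the sentence boundary."""
    window = text[max(0, start - width):start]
    return window.rsplit(".", 1)[-1]


def verify(text, candidate):
    """Step 2: keep a candidate only if its left context matches the cue."""
    cue = CUES[candidate.attribute]
    return cue.search(left_context(text, candidate.start)) is not None


def extract_attributes(text):
    """Run both steps and return the surviving (attribute, value) pairs."""
    return [(c.attribute, c.value) for c in extract_candidates(text) if verify(text, c)]


if __name__ == "__main__":
    # Made-up sentence about a fictional person.
    sample = "Jane Doe was a physicist born in 1950. She met a painter in 1975."
    print(extract_attributes(sample))
```

On the sample sentence, the first step proposes four candidates (two years and two occupation words), and the filter keeps only the two whose left context contains a cue. Swapping `verify` for a trained classifier over context, pattern, and dependency-path features would give the classifier-based variant described above.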
Evaluation
One venue for evaluating the attribute extraction task has been the Web People Search workshop (WePS: Searching information about entities in the web), which has included an attribute extraction challenge in its past two workshops: WePS-2 Attribute Extraction Subtask Guidelines and WePS-3 Attribute Extraction Subtask Guidelines.