Attribute Extraction is a problem in the field of information extraction that focuses on identifying the properties or features that describe a named entity. Attribute extraction is often used to disambiguate person names, extract encyclopedic knowledge, and improve question answering systems.
Some approaches to Attribute Extraction include:
- Learn template contextual patterns using seed-based bootstrapping, and assign a probability to each attribute based on its surrounding context. Variations of this method seem to be the predominant approach in the literature.
- Position Based
  - Basing predictions on absolute and relative ordering of where the attribute values typically appear in documents.
- Using transitivity of attributes across co-occurring entities. Co-occurring entities, such as people mentioned in a given person's biography page, tend to have similar attributes.
- Detect attributes that may not be directly mentioned in an article, based on a topic model.
- Two-step: Extract then Verify
  - First, the system uses rules, NER, gazetteer-based matching, and patterns (manually created or learned) to extract all attribute candidates.
  - Then, it verifies the candidates using a classifier (with features typically based on the context, pattern values, and dependency path) trained to determine whether an attribute value is correct for the given individual or should be discarded.
  - Sometimes (depending on the application), researchers have opted to filter attribute candidates based on lexical patterns instead of performing classification.
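
The two-step approach can be illustrated with a minimal Python sketch. It is not taken from any particular system: the patterns, the `OCCUPATIONS` gazetteer, the function names, and the sample text are all hypothetical, and the second step uses the simpler lexical filtering variant rather than a trained classifier.

```python
import re

# Hypothetical hand-written patterns; real systems often learn these from seeds.
PATTERNS = {
    "occupation": re.compile(r"\b(?:is|was) an? (\w+)\b"),
    "birth_year": re.compile(r"\bborn in (\d{4})\b"),
}

# Hypothetical gazetteer used only by the lexical filter in step two.
OCCUPATIONS = {"mathematician", "writer", "engineer"}


def extract_candidates(text):
    """Step one: over-generate (attribute, value) candidates with patterns."""
    candidates = []
    for attribute, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            candidates.append((attribute, match.group(1)))
    return candidates


def verify(candidates):
    """Step two: keep only candidates that pass a simple lexical check.

    A classifier over context features could replace these checks.
    """
    verified = []
    for attribute, value in candidates:
        if attribute == "occupation" and value.lower() in OCCUPATIONS:
            verified.append((attribute, value))
        elif attribute == "birth_year" and 1000 <= int(value) <= 2100:
            verified.append((attribute, value))
    return verified


TEXT = (
    "Ada Lovelace was a mathematician born in 1815. "
    "London was a city she visited often."
)

candidates = extract_candidates(TEXT)
verified = verify(candidates)
```

Here the extraction step deliberately over-generates (it proposes "city" as an occupation), and the verification step discards that candidate. In this sketch the filter only checks the value itself; deciding whether a value belongs to the target individual would need context features such as those listed above.
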
One evaluation venue for the attribute extraction task has been the Web People Search workshop (WePS: Searching information about entities in the web), which has had an attribute extraction challenge in its past two workshops: WePS-2 Attribute Extraction Subtask Guidelines and WePS-3 Attribute Extraction Subtask Guidelines.