Talukdar and Pereira ACL 2010
Citation
Partha Pratim Talukdar and Fernando Pereira. 2010. Experiments in graph-based semi-supervised learning methods for class-instance acquisition. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10). Association for Computational Linguistics, Morristown, NJ, USA, 1473-1481.
Online version
Summary
This paper conducts an empirical comparison of three graph-based semi-supervised learning methods on the class-instance acquisition task.
Motivation
Traditional NER has focused on a small number of coarse classes such as person and location. These classes are too broad to be useful in practice for applications like word sense disambiguation and textual inference, and only limited training data is available for supervised fine-grained classification. Seed-based information extraction systems have therefore been developed to extract new instances of a class from unstructured text using only a few seed instances of that class.
Methods Compared
Graph-based semi-supervised learning works as follows: given a connectivity graph containing both labeled and unlabeled nodes, the labels of the labeled nodes are propagated to the unlabeled nodes through the graph, subject to certain constraints.
LP-ZGL (Zhu et al., ICML 2003) is one of the earliest graph-based semi-supervised learning methods. It propagates labels from the training data by enforcing smoothness of the label assignment over the graph while keeping the labels of the training nodes fixed. Smoothness (the manifold assumption) means that two strongly connected nodes in the graph should receive the same or similar labels.
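To make the propagation idea concrete, here is a minimal Python sketch of LP-ZGL-style label propagation on a toy graph. The graph, edge weights, and seed labels are invented for illustration and are not taken from the paper.

# Illustrative sketch of LP-ZGL-style label propagation (Zhu et al., ICML 2003):
# unlabeled nodes repeatedly take the degree-normalized weighted average of
# their neighbors' label scores, while seed nodes stay clamped to their labels.

import numpy as np

def label_propagation(W, seed_labels, n_labels, iters=50):
    """W: (n, n) symmetric edge-weight matrix.
    seed_labels: dict {node_index: label_index} for labeled nodes."""
    n = W.shape[0]
    Y = np.zeros((n, n_labels))
    for v, l in seed_labels.items():
        Y[v, l] = 1.0
    deg = np.maximum(W.sum(axis=1), 1e-12)
    for _ in range(iters):
        # Each node takes the weighted average of its neighbors' scores.
        Y = (W @ Y) / deg[:, None]
        # Clamp seed nodes back to their given labels.
        for v, l in seed_labels.items():
            Y[v, :] = 0.0
            Y[v, l] = 1.0
    return Y

# Toy chain graph: nodes 0 and 3 are seeds, nodes 1 and 2 are unlabeled.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(label_propagation(W, {0: 0, 3: 1}, n_labels=2))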
Adsorption (Baluja et al., WWW 2008) is an iterative method. Its main idea is to limit the amount of information that passes through each node, and it relaxes the constraint that the labels of the training data be preserved exactly.
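The following is a hedged sketch of one Adsorption-style update, following the random-walk view in which each node mixes its own seed labels, its neighbors' current labels, and a dummy "abandon" label with per-node probabilities. How those probabilities are computed is omitted here; the function arguments are placeholders, not the paper's values.

# One synchronous Adsorption-style update: each node combines injected seed
# labels (p_inj), neighbor labels (p_cont), and a dummy abandon label (p_abnd).

import numpy as np

def adsorption_step(W, Y_hat, Y_seed, p_inj, p_cont, p_abnd):
    """W: (n, n) edge weights; Y_hat, Y_seed: (n, n_labels) score matrices;
    p_inj, p_cont, p_abnd: length-n probability vectors per node.
    The last label column is treated as the dummy 'abandon' label."""
    deg = np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    neighbor_avg = (W @ Y_hat) / deg          # weighted average of neighbor labels
    dummy = np.zeros_like(Y_hat)
    dummy[:, -1] = 1.0                        # all mass on the dummy label
    return (p_inj[:, None] * Y_seed
            + p_cont[:, None] * neighbor_avg
            + p_abnd[:, None] * dummy)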
MAD (Talukdar and Crammer, ECML 2009) re-expresses the Adsorption method as an optimization problem with an explicit objective.
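For reference, the MAD objective has roughly the following form; this is a hedged paraphrase of Talukdar and Crammer (ECML 2009), and the exact weighting matrices and notation may differ from the original paper:

\min_{\hat{Y}} \; \sum_{\ell} \Big[ \mu_1 \, (Y_\ell - \hat{Y}_\ell)^{\top} S \, (Y_\ell - \hat{Y}_\ell) \;+\; \mu_2 \, \hat{Y}_\ell^{\top} L \, \hat{Y}_\ell \;+\; \mu_3 \, \lVert \hat{Y}_\ell - R_\ell \rVert^2 \Big]

Here Y_\ell are the seed scores for label \ell, \hat{Y}_\ell the estimated scores, S a diagonal matrix of per-node injection probabilities, L a graph Laplacian built from the modified edge weights, and R_\ell a prior that pushes mass toward the dummy label; \mu_1, \mu_2, \mu_3 trade off seed fidelity, graph smoothness, and regularization.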
Evaluation
The comparison was done on subsets of Freebase spanning 18 domains and on TextRunner (Banko et al., IJCAI 2007) extraction results. The evaluation metric used is Mean Reciprocal Rank (MRR).
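Mean Reciprocal Rank averages, over all queries, the reciprocal of the rank of the first correct answer in each ranked list. A minimal computation, with made-up rankings and gold answers, looks like this:

# Mean Reciprocal Rank: for each query, take 1 / (rank of the first correct
# item in its ranked list), then average over all queries.

def mean_reciprocal_rank(rankings, gold):
    """rankings: list of ranked candidate lists, one per query.
    gold: list of sets of correct answers, one per query."""
    total = 0.0
    for ranked, correct in zip(rankings, gold):
        rr = 0.0
        for i, item in enumerate(ranked, start=1):
            if item in correct:
                rr = 1.0 / i
                break
        total += rr
    return total / len(rankings)

print(mean_reciprocal_rank(
    [["paris", "lyon", "rome"], ["tokyo", "kyoto"]],
    [{"paris"}, {"kyoto"}]))   # (1/1 + 1/2) / 2 = 0.75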
Results
MAD performed significantly better than the other two methods on the Freebase datasets, using classes from (Pantel et al., EMNLP 2009) and WordNet as the gold standard. The three methods performed comparably on the TextRunner data using classes from WordNet. Overall, the evaluation suggests that MAD performs best when the average node degree of the graph is high.
The authors also investigated the impact of semantic constraints (additional edges between instances and attributes) derived from the YAGO KB (Suchanek et al., WWW 2008). The evaluation shows that these semantic constraints are very beneficial for all of the graph-based semi-supervised learning methods.
Related papers
The three graph-based SSL methods compared here are described in Zhu et al., ICML 2003; Baluja et al., WWW 2008; and Talukdar and Crammer, ECML 2009.