Huang et al, ACL 2009: Profile Based Cross-Document Coreference Using Kernelized Fuzzy Relational Clustering
Revision as of 22:59, 30 November 2010
Citation
Jian Huang, Sarah M. Taylor, Jonathan L. Smith, Konstantinos A. Fotiadis and C. Lee Giles. 2009. Profile Based Cross-Document Coreference Using Kernelized Fuzzy Relational Clustering. In Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 414–422.
Online version
An online version of this paper is available [1].
Summary
This paper introduces Conditional Random Fields as a sequential classification model.
Drawbacks with HMMs and MeMMs
The paper points out some of the main drawbacks in generative models such as HMMs and in discriminative ones such as Maximum Entropy Markov Models:
- HMMs don't allow the addition of overlapping arbitrary features of the observations.
- MeMMs are normalized locally over each observation, and hence suffer from the Label Bias problem, where the transitions going out from a state compete only against each other, as opposed to all the other transitions in the model.
Mathematical Definition of CRFs
The paper then introduces Conditional Random Fields as a model that addresses both of these problems, and defines them formally:
The probability P(Y|X) of a state sequence Y given an observation sequence X is:

p_\theta(\mathbf{y} \mid \mathbf{x}) \propto \exp\left( \sum_{e \in E,\,k} \lambda_k f_k(e, \mathbf{y}|_e, \mathbf{x}) + \sum_{v \in V,\,k} \mu_k g_k(v, \mathbf{y}|_v, \mathbf{x}) \right)

where y|_S is the set of components of y associated with the vertices in subgraph S, and the features f_k and g_k are given and fixed. Graphically, CRFs can be conceptualized as an undirected chain of states Y, globally conditioned on the observation sequence X.
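The globally normalized distribution above can be made concrete with a tiny brute-force sketch. All weights, states and observations below are hypothetical toy values (not from the paper); the point is that each sequence's score sums edge (transition) and vertex (state-observation) feature weights, and normalization runs over all candidate state sequences, not per-observation as in an MeMM.

```python
import itertools
import math

# Toy linear-chain CRF (hypothetical weights, for illustration only).
STATES = ["A", "B"]
x = ["w1", "w2", "w3"]  # observation sequence

# Transition weights play the role of the lambda_k f_k edge terms;
# state-observation weights play the role of the mu_k g_k vertex terms.
lam = {("A", "A"): 0.5, ("A", "B"): -0.2, ("B", "A"): 0.1, ("B", "B"): 0.8}
mu = {("A", "w1"): 1.0, ("B", "w1"): 0.0,
      ("A", "w2"): 0.3, ("B", "w2"): 0.7,
      ("A", "w3"): -0.5, ("B", "w3"): 0.9}

def score(y, x):
    """Unnormalized log-score: vertex terms plus edge terms."""
    s = sum(mu[(y[t], x[t])] for t in range(len(x)))
    s += sum(lam[(y[t - 1], y[t])] for t in range(1, len(y)))
    return s

def prob(y, x):
    """P(y | x), normalized globally over every state sequence."""
    Z = sum(math.exp(score(yy, x))
            for yy in itertools.product(STATES, repeat=len(x)))
    return math.exp(score(y, x)) / Z

# The probabilities of all 2^3 state sequences sum to 1.
total = sum(prob(yy, x) for yy in itertools.product(STATES, repeat=len(x)))
```

Because the normalizer Z is computed over whole sequences, no state's outgoing transitions are forced to sum to 1 on their own, which is exactly what avoids the label bias problem described above.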
Parameter Estimation for CRFs
The paper provides two methods to perform parameter estimation (training) for CRFs, both based on improved iterative scaling. One iteration of either of these methods has roughly the same time and space complexity as the Baum-Welch training algorithm for HMMs. Both are guaranteed to converge.
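The complexity claim rests on the fact that the partition function and expected feature counts can be computed with a forward (alpha) recursion, just as in Baum-Welch. The sketch below uses hypothetical toy log-potentials (not the paper's iterative-scaling updates themselves) and checks the dynamic-programming normalizer against brute-force enumeration.

```python
import itertools
import math

# Toy chain: 2 states, length-4 sequence (hypothetical log-potentials).
STATES = [0, 1]
T = 4
trans = [[0.2, -0.1], [0.4, 0.3]]   # trans[i][j]: log-potential i -> j
emit = [[0.5, 0.1], [0.0, 0.6],     # emit[t][j]: log-potential for
        [0.3, 0.3], [-0.2, 0.8]]    # state j at position t

def log_z_forward():
    """Forward recursion: O(T * |S|^2), same shape as Baum-Welch."""
    alpha = list(emit[0])
    for t in range(1, T):
        alpha = [emit[t][j] + math.log(sum(math.exp(alpha[i] + trans[i][j])
                                           for i in STATES))
                 for j in STATES]
    return math.log(sum(math.exp(a) for a in alpha))

def log_z_brute():
    """Brute force over all |S|^T sequences, for verification only."""
    total = 0.0
    for y in itertools.product(STATES, repeat=T):
        s = sum(emit[t][y[t]] for t in range(T))
        s += sum(trans[y[t - 1]][y[t]] for t in range(1, T))
        total += math.exp(s)
    return math.log(total)
```

Either training method needs these same quantities per iteration, which is why one iteration costs roughly the same as a Baum-Welch pass; convergence is guaranteed because the log-likelihood is convex in the parameters.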
Experiments
The authors perform three different types of experiments:
- They generate synthetic data from an HMM, and use this data to train a CRF and an MeMM with the same structures. They find that the CRF error is 4.6% and the MeMM error is 42%, thus proving that the CRF solves the label bias problem encountered by the MeMM.
- They generate synthetic data using a mix of first-order and second-order HMMs, and train first-order HMM, MeMM and CRF models on this data, without using any overlapping features. They find that CRFs perform the best, thus showing their robustness to incorrect modeling assumptions. The addition of overlapping features substantially improves the performance of CRFs and MeMMs.
- Finally, they perform POS tagging on a subset of the Penn Treebank, using an HMM, MeMM and a CRF. They repeat this both without and with orthographic features. Without orthographic features, the HMM outperforms the MeMM and the CRF outperforms the HMM, while with them, the MeMM and CRF both significantly outperform HMMs, and the CRF still remains the best.
Conclusion
The authors conclude that CRFs offer the following significant advantages: discriminative training; combination of arbitrary, overlapping features from both the past and future; efficient training and decoding based on dynamic programming; and parameter estimation guaranteed to find the global optimum. Their main disadvantage is the slow convergence of the training algorithm compared to MeMMs and HMMs.