Siddharth's writeup of the two-stage CRF

This paper proposes an alternative to skip-chain CRFs. Instead of adding edges between distant nodes and performing inference over the resulting graph, the authors train a two-stage CRF. The first stage is a standard CRF with a predefined set of features. The output of the first CRF is then used to build input features for the second CRF. The authors describe a few ways to generate these features, essentially aggregating knowledge from the first CRF's output labeling, such as the number of times a word has been assigned a particular label. The second CRF then learns weights for these features and outputs the final prediction.
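To make the feature-generation step concrete, here is a minimal sketch of one such aggregate feature: for each token, count how often the first-stage CRF assigned each label to *other* occurrences of the same word in the document, and mark the majority label. The function name `label_count_features`, the feature-name strings, excluding the current occurrence, and the tie-breaking rule are all assumptions for illustration, not the paper's exact feature definitions.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def label_count_features(tokens: List[str], stage1_labels: List[str]) -> List[Dict[str, float]]:
    """Hypothetical second-stage features built from first-stage CRF output.

    For each position, counts how often the same word type received each
    label elsewhere in the document (the current occurrence is excluded, an
    assumption of this sketch) and adds an indicator for the majority label.
    """
    if len(tokens) != len(stage1_labels):
        raise ValueError("tokens and stage1_labels must have the same length")

    # Document-level label counts per word type, from the first-stage labeling.
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tok, lab in zip(tokens, stage1_labels):
        counts[tok][lab] += 1

    features: List[Dict[str, float]] = []
    for tok, lab in zip(tokens, stage1_labels):
        others = counts[tok].copy()
        others[lab] -= 1  # drop the current occurrence from its own statistics
        positive = {name: n for name, n in others.items() if n > 0}

        feats: Dict[str, float] = {}
        for name, n in positive.items():
            feats[f"word_label_count={name}"] = float(n)
        if positive:
            # Majority label among other occurrences; ties broken by label name.
            majority = max(positive, key=lambda name: (positive[name], name))
            feats[f"word_majority_label={majority}"] = 1.0
        features.append(feats)
    return features


if __name__ == "__main__":
    toks = ["Clinton", "said", "Clinton", "won", "Clinton"]
    stage1 = ["PER", "O", "LOC", "O", "PER"]
    for t, f in zip(toks, label_count_features(toks, stage1)):
        print(t, f)
```

In this toy input the first stage mislabels the second "Clinton" as LOC; its features report that the other two occurrences were labeled PER, which is exactly the kind of document-level evidence the second CRF could learn to weight.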

Criticism:
- The method seems somewhat less principled than the skip-chain approach.
- What happens if the first CRF is not a good model? In that case, I'm not sure the second CRF would learn anything better than the first, since its extra features are computed from the first stage's (possibly wrong) labels.
- I don't see the connection between the statistics of the training data and the sufficient statistics of the exponential-family model in this two-stage CRF.
