Nschneid writeup of Sutton 2004

This is Nschneid's review of Sutton_2004_collective_segmentation_and_labeling_of_distant_entities_in_information_extraction

Skip-chain CRF model: a linear-chain CRF augmented with edges between certain pairs of non-adjacent hidden states. These pairs are determined by the data (e.g., positions of identical words, or of words with the same lemma). Thus there are three types of factors in the model: transition factors, emission factors, and skip factors.
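To make the skip edges concrete, here is a minimal sketch of how the non-adjacent pairs might be collected. This is not the authors' code; the tokenization and the capitalized-word heuristic are my assumptions, one plausible reading of the "same words / words with same lemmas" criterion above.

```python
from collections import defaultdict

def skip_edges(tokens):
    """Collect (u, v) index pairs, u < v, that should share a skip factor.

    Heuristic: link successive occurrences of the same capitalized word.
    """
    positions = defaultdict(list)
    for t, word in enumerate(tokens):
        if word[:1].isupper():
            positions[word.lower()].append(t)
    edges = []
    for occs in positions.values():
        for u, v in zip(occs, occs[1:]):  # chain consecutive mentions
            if v > u + 1:  # adjacent states already share a transition factor
                edges.append((u, v))
    return edges

tokens = "Speaker : Prof. Smith . Smith will give the talk .".split()
print(skip_edges(tokens))  # -> [(3, 5)]
```

Each returned pair (u, v) then receives a skip factor over (y_u, y_v), alongside the linear chain's transition factors over (y_{t-1}, y_t) and emission factors over (y_t, x_t).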

From the paper: "Because our model contains many overlapping loops, exact inference is intractable. For some of our documents, a single probability table in the junction tree would require over 4 GB of memory. Instead, we perform approximate inference using a schedule for loopy belief propagation called tree reparameterization (TRP) (Wainwright et al., 2001)."

"We evaluate our model on a standard information extraction data set of e-mail seminar announcements. In the field type on which a linear-chain CRF had the most repetition errors, which is speaker name, we show that learning these dependencies with a skip-chain CRF leads to a 13.7% reduction in error. Failure analysis confirms that the reduction in error is due to increased recall on the repeated field mentions."

Evaluated on NER for e-mail seminar announcements, the skip edges help mainly on the speaker field (a 3% improvement over the linear-chain CRF) but hurt a bit on the other fields. Overall, Sutton & McCallum achieve 90.6 F1 vs. 90.2 for the linear-chain model. I wonder whether this is a statistically significant improvement.

  • What is TRP?
    • A fancy schedule for approximate BP that exploits the fact that BP is fast and exact on trees. The basic idea is to iteratively run BP over different trees that span the graph (as far as I recall); see the sketch below. Wcohen
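A minimal sketch of that tree-at-a-time schedule, assuming a discrete pairwise MRF with NumPy arrays as potentials. This is a toy illustration of the idea, not the exact TRP updates of Wainwright et al. (2001); all names here (trp_style_bp, the triangle example) are made up.

```python
import numpy as np
from collections import defaultdict

def trp_style_bp(node_pot, edge_pot, spanning_trees, n_sweeps=20):
    """Sum-product BP on a pairwise MRF with a tree-at-a-time schedule.

    node_pot: {i: (k_i,) array}           unary potentials
    edge_pot: {(i, j): (k_i, k_j) array}  pairwise potentials, one key per edge
    spanning_trees: list of edge lists, each a spanning tree of the graph
    """
    msg, nbrs = {}, defaultdict(set)
    for (i, j) in edge_pot:
        msg[(i, j)] = np.ones_like(node_pot[j])  # message i -> j, over x_j
        msg[(j, i)] = np.ones_like(node_pot[i])  # message j -> i, over x_i
        nbrs[i].add(j)
        nbrs[j].add(i)

    def pot(i, j):
        # pairwise potential oriented as (x_i, x_j)
        return edge_pot[(i, j)] if (i, j) in edge_pot else edge_pot[(j, i)].T

    def update(i, j):
        # standard sum-product message i -> j
        b = node_pot[i] * np.prod([msg[(k, i)] for k in nbrs[i] if k != j], axis=0)
        m = pot(i, j).T @ b
        msg[(i, j)] = m / m.sum()

    for sweep in range(n_sweeps):
        # update only the edges of one spanning tree per sweep; a full TRP
        # pass would order the updates leaves-to-root and back, making each
        # pass exact on that tree
        tree = spanning_trees[sweep % len(spanning_trees)]
        for (i, j) in tree:
            update(i, j)
        for (i, j) in reversed(tree):
            update(j, i)

    beliefs = {}
    for i in node_pot:
        b = node_pot[i] * np.prod([msg[(k, i)] for k in nbrs[i]], axis=0)
        beliefs[i] = b / b.sum()
    return beliefs

# toy triangle (one loop), covered by two of its three spanning trees
phi = {0: np.array([1.0, 2.0]), 1: np.array([2.0, 1.0]), 2: np.array([1.0, 1.0])}
psi = {e: np.array([[2.0, 1.0], [1.0, 2.0]]) for e in [(0, 1), (1, 2), (0, 2)]}
print(trp_style_bp(phi, psi, [[(0, 1), (1, 2)], [(0, 2), (2, 1)]]))
```

Because each sweep touches only the edges of one spanning tree, every pass is cheap, and cycling through trees that jointly cover all edges (including the skip edges that create the loops) propagates information around the whole graph.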