Liuy writeup of Sutton 2004
This is a review of Sutton_2004_collective_segmentation_and_labeling_of_distant_entities_in_information_extraction by user:Liuy
This paper uses a conditional model that connects identical words in order to represent nonlocal dependencies between their labels. The authors assume that a word that appears multiple times tends to receive the same label each time. The model segments a text document into mentions and classifies them by entity type, taking into account nonlocal dependencies between distant mentions.
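As a rough sketch of what such a model looks like, a linear-chain conditional model with extra factors on selected word pairs can be written as below. The notation is my own assumption for illustration, not copied from the paper: \Psi_t is a factor on adjacent labels, \Psi_{uv} is a factor on a connected pair of distant positions, \mathcal{I} is the set of connected pairs, and Z(\mathbf{x}) is the normalizer.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \prod_{t=1}^{T} \Psi_t(y_t, y_{t-1}, \mathbf{x})
    \prod_{(u,v) \in \mathcal{I}} \Psi_{uv}(y_u, y_v, \mathbf{x})
```

Because \mathcal{I} depends only on the observed input \mathbf{x}, the extra factors add loops to the graph but do not require modeling the input itself.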
Representing nonlocal dependencies in a generative model is difficult because it requires too many parameters. The conditional model sidesteps this by looking at the input string and selecting skip edges accordingly.
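The edge selection step can be illustrated with a minimal sketch, assuming skip edges link every pair of positions that hold the same token. The function name `skip_edges` and the optional `keep` filter are hypothetical names introduced here, and the capitalization filter in the usage lines is only one plausible choice, not necessarily the rule the paper uses.

```python
from collections import defaultdict
from itertools import combinations


def skip_edges(tokens, keep=lambda tok: True):
    """Return sorted index pairs (u, v), u < v, whose tokens are identical.

    `keep` is a hypothetical filter deciding which tokens may be linked.
    """
    # Group the positions of each token that passes the filter.
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if keep(tok):
            positions[tok].append(i)

    # Link every pair of positions that share a token.
    edges = []
    for idxs in positions.values():
        edges.extend(combinations(idxs, 2))
    return sorted(edges)


tokens = ["Dr", "Green", "will", "speak", ".", "Green", "joined", "in", "2001", "."]
print(skip_edges(tokens))                    # all identical tokens
print(skip_edges(tokens, keep=str.istitle))  # assumed filter: title-case tokens only
```

Without a filter, punctuation and other frequent tokens also get linked, which is why some restriction on which words to connect seems necessary in practice: each added edge makes inference more expensive.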
I like the work because it is a meaningful attempt at a joint probabilistic model for extraction. The paper leverages long-distance dependencies for better accuracy by bridging pairs of identical words that are assumed to share a label, so it uses information from throughout the text rather than only local information. It also shows an empirical advantage, in terms of error reduction, on standard IE datasets. I am a bit concerned about inference efficiency. The paper uses approximate inference by loopy belief propagation. Basically, its inference method is similar to that of Relational Markov Networks. It is not efficient and is not even guaranteed to converge. The MAP parameter estimation is quite standard.