Mnduong writeup of Poon & Domingos

This paper introduces a method that performs segmentation and entity resolution jointly, so that each task's results can help the other. The motivating example is citation matching, where an easily segmented citation (one with full punctuation) can help segment a similar citation that lacks punctuation.
The method uses a Markov Logic Network, where knowledge engineering takes the form of writing rules for the network. It uses existing learning and inference algorithms designed for MLNs.
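As background (this is the standard MLN definition, not something specific to this paper): each rule carries a weight, and a possible world x has probability proportional to exp(sum_i w_i * n_i(x)), where n_i(x) is the number of true groundings of rule i in x. The minimal Python sketch below illustrates this with hypothetical atoms (SimilarTitle, SameCitation) and a single hypothetical rule with an arbitrary weight of 1.5; it is not one of the paper's rules, and it uses brute-force enumeration, whereas practical MLN systems rely on approximate learning and inference.

```python
import itertools
import math

# Hypothetical ground atoms for a single pair of citations (c1, c2).
ATOMS = ["SimilarTitle", "SameCitation"]

# Hypothetical weighted rule: SimilarTitle(c1, c2) => SameCitation(c1, c2).
# Each entry is (weight, function mapping a world to its count of true groundings).
FORMULAS = [
    (1.5, lambda w: int((not w["SimilarTitle"]) or w["SameCitation"])),
]


def log_score(world):
    """Unnormalized log-probability of a world: sum_i w_i * n_i(world)."""
    return sum(weight * count(world) for weight, count in FORMULAS)


def conditional(query, evidence):
    """P(query atom is True | evidence), by enumerating all worlds consistent with the evidence."""
    free = [a for a in ATOMS if a not in evidence]
    num = den = 0.0
    for values in itertools.product([False, True], repeat=len(free)):
        world = dict(evidence, **dict(zip(free, values)))
        p = math.exp(log_score(world))
        den += p
        if world[query]:
            num += p
    return num / den


print(conditional("SameCitation", {"SimilarTitle": True}))
```

With SimilarTitle observed true, the result is about 0.82, i.e. the logistic function of the rule's weight; with SimilarTitle false, the rule is satisfied either way and the probability is 0.5.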
The method outperformed previously published methods in entity resolution recall. It also improved significantly (at the 1% level) on the segmentation F1 scores of a baseline that performs segmentation in isolation.
The paper provides clear explanations and motivations for the system's rules. In particular, I like the paragraph explaining how a naive combination of the segmentation MLN and the entity resolution MLN would hurt accuracy.
I found the conditions for the SimilarTitle rule quite generous. Two title strings are considered similar if they start with the same trigram and end with the same token. This would classify many unrelated pairs as similar, for example any two titles of the form "Using Markov Logic Networks for ... extraction". Furthermore, it is unclear to me how the indices in this rule are instantiated.
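To make the concern concrete, here is a minimal Python sketch of the similarity test as described above. It assumes word-level trigrams, lowercasing, whitespace tokenization, and that titles shorter than three tokens are never similar; these choices and the function names are illustrative assumptions, not the paper's definition. It also compares whole title strings and ignores the rule's index arguments entirely.

```python
def title_tokens(title):
    # Assumption: lowercase and split on whitespace.
    return title.lower().split()


def similar_title(a, b):
    """Hypothetical SimilarTitle test: same leading word trigram and same final token."""
    ta, tb = title_tokens(a), title_tokens(b)
    # Assumption: titles with fewer than three tokens are never considered similar.
    if len(ta) < 3 or len(tb) < 3:
        return False
    return ta[:3] == tb[:3] and ta[-1] == tb[-1]


t1 = "Using Markov Logic Networks for information extraction"
t2 = "Using Markov Logic Networks for relation extraction"
print(similar_title(t1, t2))  # True, although the two titles describe different work
```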