Liuy writeup of Poon and Domingos
This is a review of the paper poon_2007_joint_inference_in_information_extraction by user:Liuy.
The paper attempts to alleviate the complexity and cost of joint inference in a citation matching domain, using Markov logic. Because segmenting one record can sometimes help segment similar ones, they segment all records together. With Markov logic, the problem further reduces to writing the proper logical formulas.
The main evidence predicate in the MLN is Token(t,i,c), which is true when token t appears in the ith position of the cth citation. They first build an "isolated segmentation" model using an HMM. I am particularly interested in the technique they develop to pinpoint field boundaries that are marked by punctuation symbols. For "Entity Resolution", they define predicates and rules for passing information between stages. I think it is a cool idea to explore Joint Segmentation, based on the intuition that the segmentation of one citation can help the segmentation of similar ones. The rules they define for recognizing when one title segmentation can help another are discussed for the sparse case and the dense case.
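To make the Token(t,i,c) evidence predicate concrete, here is a minimal Python sketch of how its true ground atoms could be enumerated from raw citation strings. This is only an illustration and not the authors' code: the whitespace tokenization, the 0-based positions and citation ids, and the helper names token_atoms and token_holds are all my own assumptions.

```python
from typing import List, Set, Tuple

# A ground atom Token(t, i, c): token t at position i of citation c.
Atom = Tuple[str, int, int]


def token_atoms(citations: List[str]) -> Set[Atom]:
    """Collect the true Token(t, i, c) atoms for a list of citation strings.

    Assumption: tokens are obtained by whitespace splitting, and both
    positions and citation ids are counted from 0.
    """
    atoms: Set[Atom] = set()
    for c, text in enumerate(citations):
        for i, t in enumerate(text.split()):
            atoms.add((t, i, c))
    return atoms


def token_holds(atoms: Set[Atom], t: str, i: int, c: int) -> bool:
    """Closed-world lookup: Token(t, i, c) is true only if it was observed."""
    return (t, i, c) in atoms


# Two hypothetical citation fragments, used only for illustration.
citations = ["Poon H. Joint inference", "Joint inference in IE"]
atoms = token_atoms(citations)
```

With these toy inputs, token_holds(atoms, "Joint", 2, 0) and token_holds(atoms, "Joint", 0, 1) are both true, which is the kind of shared evidence that joint segmentation can exploit across similar citations.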
I also like their modification to MC-SAT to speed up learning.
I do, however, have several questions. First, they compare the performance of the proposed algorithm with non-joint and semi-joint inference approaches. But I think a fair comparison would be with other possible joint inference schemes that are not defined on top of Markov logic. Second, the principle underlying the proposed method is not clear to me, although they showed empirically that it works on the CiteSeer and Cora citation datasets.