Bbd writeup of joint inference in information extraction
This is a review of the paper poon_2007_joint_inference_in_information_extraction by user:Bbd.
This paper targets the problems that arise when segmentation and entity resolution are performed one after the other. The authors propose a single integrated inference process that segments all records and resolves all entities together, using Markov Logic Networks. Weights are learned discriminatively with the voted perceptron algorithm, and inference is done with a "slice sampling" Markov chain Monte Carlo algorithm. To get reliable results and speed up the convergence of gradient descent, they propose some interesting optimizations, such as using a different learning rate for each weight.
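To make the per-weight learning rate idea concrete, here is a minimal sketch of an averaged (voted) perceptron update for formula weights. It is an illustration, not the authors' implementation. The function name `voted_perceptron`, the `expected_counts` callback (standing in for counts estimated by MCMC inference), and the choice to scale each rate by the formula's true-grounding count are all assumptions made for this example.

```python
from typing import Callable, List, Sequence


def voted_perceptron(
    true_counts: Sequence[float],
    expected_counts: Callable[[List[float]], Sequence[float]],
    global_rate: float = 1.0,
    iterations: int = 100,
) -> List[float]:
    """Return averaged weights after a fixed number of perceptron updates.

    true_counts[i] is the number of true groundings of formula i in the
    training data. expected_counts(weights) is a hypothetical callback
    returning the expected counts under the current weights.
    """
    n = len(true_counts)
    weights = [0.0] * n
    totals = [0.0] * n

    # Assumption: each weight gets its own rate, here the global rate divided
    # by the formula's true-grounding count, so that formulas with very many
    # groundings do not dominate the updates.
    rates = [global_rate / max(c, 1.0) for c in true_counts]

    for _ in range(iterations):
        expected = expected_counts(weights)
        for i in range(n):
            # Gradient of the conditional log-likelihood for weight i:
            # observed count minus expected count.
            weights[i] += rates[i] * (true_counts[i] - expected[i])
            totals[i] += weights[i]

    # "Voting" is approximated by averaging the weights over all iterations.
    return [t / iterations for t in totals]
```

With a single global rate, a formula with thousands of groundings would take far larger steps than one with a handful; scaling the rate per weight keeps the step sizes comparable.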
They leverage the fact that if the title in one citation is clearly delimited by punctuation, this information can be used to extract the title from a similar citation. They report better entity resolution results on the CiteSeer dataset than existing algorithms.