Selen writeup of joint inference in information extraction

From Cohen Courses

This is a review of the paper poon_2007_joint_inference_in_information_extraction by user:Selen.

In this paper the authors apply a Markov logic network approach, with the MC-SAT algorithm for inference, to the joint inference problem. The key idea is that in information extraction, segmenting one record can actually help segment another. For instance, in citations collected from academic papers, the same entity can be written slightly differently but still refer to the same instance. They test their method on the citation matching task using the CiteSeer and Cora datasets.

The main problem with the citation matching task is that there are too many ambiguous instances. For example, in this paper they say that if two citations have a similar title and a similar venue, then they refer to the same paper. Now consider this:

Boyd, Kim : Advances in foo matching, NIPS 2006.
Boyd, Kim : Recent Advances in foo matching, NIPS

Now let's say that the second paper appeared a year later but the date was left out (which happens), and the authors and titles are very similar. Is it the same entity? No. I don't really understand how they can get around this problem. Another issue that comes to my mind is that punctuation cannot always separate fields: maybe the author forgot to put a period and there is only a space, etc. There should be a better way to distinguish fields.
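To make both concerns concrete, here is a small Python sketch. It is not the paper's Markov logic model: `naive_same_citation`, `split_on_punctuation`, and the 0.8 threshold are hypothetical names and values I made up for illustration, and the title similarity comes from the standard library's `difflib.SequenceMatcher`. It only shows how a hard "similar title plus same venue" rule would merge the two citations above, and how a purely punctuation-based segmenter loses field boundaries when the punctuation is missing.

```python
import re
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] between two lowercased titles."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def naive_same_citation(c1: dict, c2: dict, threshold: float = 0.8) -> bool:
    """Hypothetical hard rule: same venue and similar title imply the same paper."""
    same_venue = c1["venue"].lower() == c2["venue"].lower()
    return same_venue and title_similarity(c1["title"], c2["title"]) >= threshold


def split_on_punctuation(citation: str) -> list:
    """Hypothetical segmenter: treat ':', ',' and '.' as the only field boundaries."""
    return [part.strip() for part in re.split(r"[:,.]", citation) if part.strip()]


# The two citations from the example; the second one has no year.
first = {"title": "Advances in foo matching", "venue": "NIPS", "year": "2006"}
second = {"title": "Recent Advances in foo matching", "venue": "NIPS", "year": None}

# The rule merges them, although they may be different papers.
merged = naive_same_citation(first, second)

# With punctuation the segmenter finds four pieces; without it, only two.
with_punct = split_on_punctuation("Boyd, Kim : Advances in foo matching, NIPS 2006.")
without_punct = split_on_punctuation("Boyd, Kim Advances in foo matching NIPS 2006")
```

Here `merged` is `True`, because the shorter title is contained entirely in the longer one, and `without_punct` lumps the title, venue and year into one piece. Note that even with punctuation present, the comma inside "Boyd, Kim" splits the author field, which is another sign that punctuation alone is a weak signal.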