Liuliu project abstract

From Cohen Courses
  • What you plan to do with what data

I plan to research statistical relational learning for information extraction. In particular, I am interested in Named Entity Recognition (NER) models that exploit long-distance dependencies or relationships. For this project, I will follow the plan below:

(1) Carry out a thorough comparison of Bunescu and Mooney 2004 and Sutton and McCallum 2004. These are two closely related works in this field, but to my knowledge no comparison between them has been done.

(2) Extend Bunescu and Mooney’s model with more global relations and new entity features. For example, we could parse each sentence into a dependency tree, then use the dependency relations in the tree as global relations and use the word-dependency information in the tree as entity features [Bunescu 2007].
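A minimal sketch of step (2), assuming the parser output is available as a list of head indices plus arc labels. This input format, the function names, the feature names, and the `min_distance` filter are my own illustrative assumptions, not taken from Bunescu 2007. Arcs between non-adjacent tokens become candidate global edges (adjacent pairs are already linked by the local chain), and a token's head word and arc label become entity features.

```python
from typing import Dict, List, Tuple

def dependency_edges(heads: List[int], labels: List[str],
                     min_distance: int = 2) -> List[Tuple[int, int, str]]:
    """Return (head, dependent, label) arcs whose endpoints are at least
    min_distance tokens apart; these are candidate global (non-chain) edges."""
    edges = []
    for dep, head in enumerate(heads):
        if head >= 0 and abs(head - dep) >= min_distance:
            edges.append((head, dep, labels[dep]))
    return edges

def dependency_features(tokens: List[str], heads: List[int],
                        labels: List[str], i: int) -> Dict[str, float]:
    """Features for token i from its dependency context (hypothetical feature names)."""
    if heads[i] < 0:
        return {"is_root": 1.0}
    return {"head_word=" + tokens[heads[i]].lower(): 1.0,
            "dep_label=" + labels[i]: 1.0}

# Hand-written parse of "Smith , who founded Acme , resigned ." (0-based heads, -1 = root).
tokens = ["Smith", ",", "who", "founded", "Acme", ",", "resigned", "."]
heads = [6, 0, 3, 0, 3, 0, -1, 6]
labels = ["nsubj", "punct", "nsubj", "relcl", "dobj", "punct", "root", "punct"]
global_edges = dependency_edges(heads, labels)
```

Here the subject arc from *resigned* to *Smith* and the relative-clause arc from *Smith* to *founded* survive the distance filter, linking *Smith* to words several positions away. Filtering by arc label as well (for example, dropping `punct` arcs) would be a natural refinement.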

I will use the same data as Bunescu and Mooney 2004.

  • Why you think it’s interesting

Statistical relational learning models dependencies between related instances and uses information about one object to help draw conclusions about related objects. A document contains both local and global relations. Traditional Named Entity Recognition methods use only local relations, i.e., relations between adjacent words, and ignore global ones. However, global relations are very informative about entity labels, and they better reflect the fact that the sentences in a document are not an i.i.d. sequence. For example, repeated mentions of the same word in a document tend to have the same entity label, even when they are far apart. Two related works in this field (Bunescu and Mooney 2004; Sutton and McCallum 2004) show the benefits of using global relations in named entity recognition.
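To make the notion of a global relation concrete, here is a sketch of the skip-edge construction that, as I understand Sutton and McCallum 2004, links identical capitalized words anywhere in a document. Treating capitalization as the only trigger and the name `skip_edges` are simplifying assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def skip_edges(tokens: List[str]) -> List[Tuple[int, int]]:
    """Link every pair (i, j), i < j, of identical capitalized tokens in a document."""
    positions: Dict[str, List[int]] = {}
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():  # simplifying assumption: capitalization marks candidates
            positions.setdefault(tok, []).append(i)
    edges: List[Tuple[int, int]] = []
    for idxs in positions.values():
        # idxs is increasing, so each pair from combinations already has i < j
        edges.extend(combinations(idxs, 2))
    return sorted(edges)

doc = ["Professor", "Green", "will", "speak", ".", "Green", "works", "at", "CMU", "."]
edges = skip_edges(doc)
```

The two mentions of *Green* (positions 1 and 5) get a skip edge even though they sit in different sentences, so evidence about one mention can influence the label of the other.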

  • Any relevant superpowers you might have

Graphical models, e.g., LDA

  • How you plan to evaluate your work

Compare the extended model with the RMN (Bunescu and Mooney 2004) and Skip-CRF (Sutton and McCallum 2004) baselines

Use precision, recall, and F-measure
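Assuming entities are compared as exact (start, end, type) spans, which is one common convention but my own assumption about how the comparison will be run, the three metrics can be computed as follows. The `Span` representation and the `prf` name are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start, end-exclusive, entity_type); hypothetical representation

def prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1."""
    tp = len(gold & pred)  # a prediction counts only if boundaries AND type match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {(0, 1, "PER"), (5, 6, "PER"), (8, 9, "ORG")}
pred = {(0, 1, "PER"), (8, 9, "LOC")}  # second prediction has the right span but wrong type
p, r, f = prf(gold, pred)
```

In this example one of two predictions is correct and one of three gold entities is found, giving precision 0.5, recall 1/3, and F-measure 0.4.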

  • What techniques you plan to use

RMN, Skip-CRF, dependency parsing
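For reference, the skip-chain CRF of Sutton and McCallum 2004 can be written roughly as below, where $\mathcal{I}$ is the set of skip edges (e.g., pairs of identical capitalized words), $y_0$ is a fixed start state, and $Z(\mathbf{x})$ sums the same product over all label sequences. The notation is mine rather than the paper's.

```latex
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \prod_{t=1}^{T} \Psi_t(y_t, y_{t-1}, \mathbf{x})
  \prod_{(u,v) \in \mathcal{I}} \Psi_{uv}(y_u, y_v, \mathbf{x}),
\qquad
\Psi_t = \exp\Big(\sum_k \lambda_{1k}\, f_{1k}(y_t, y_{t-1}, \mathbf{x}, t)\Big),
\qquad
\Psi_{uv} = \exp\Big(\sum_k \lambda_{2k}\, f_{2k}(y_u, y_v, \mathbf{x}, u, v)\Big).
```

The weights $\lambda_{2k}$ are shared by every skip edge, which is the same parameter-tying idea behind clique templates in an RMN.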

  • What question you want to answer

What kinds of global relations should we add to the model (e.g., dependency relations in a dependency parse tree)? Should we capture only relations between entities of the same type, or also relations between entities of different types? How should we define the clique templates and learn their potentials?
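One way to make the last question concrete: a clique template picks out a set of cliques (e.g., all adjacent-word pairs, or all skip pairs), and every clique it produces shares the template's potential. The toy scorer below assumes two templates with hand-set log-potential tables; all labels, weights, and node scores are hypothetical rather than learned, and the brute-force MAP search is only feasible for a few tokens. In practice the weights would be learned and inference would be approximate, e.g., loopy belief propagation.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

LABELS = ("O", "PER")

# Hypothetical template weights (log-potentials). Each template owns ONE table,
# reused (tied) for every clique the template instantiates in a document.
CHAIN_W = {("O", "O"): 0.0, ("O", "PER"): 0.0, ("PER", "O"): 0.0, ("PER", "PER"): -0.5}
SKIP_W = {("O", "O"): 1.0, ("O", "PER"): 0.0, ("PER", "O"): 0.0, ("PER", "PER"): 1.0}

def score(y: Sequence[str], node: List[Dict[str, float]],
          skips: List[Tuple[int, int]]) -> float:
    """Unnormalized log-score: node terms + chain-template cliques + skip-template cliques."""
    s = sum(node[i][label] for i, label in enumerate(y))
    s += sum(CHAIN_W[(y[t], y[t + 1])] for t in range(len(y) - 1))
    s += sum(SKIP_W[(y[u], y[v])] for u, v in skips)
    return s

def map_labels(node: List[Dict[str, float]],
               skips: List[Tuple[int, int]]) -> Tuple[str, ...]:
    """Brute-force MAP over all label sequences; exponential, toy examples only."""
    return max(product(LABELS, repeat=len(node)), key=lambda y: score(y, node, skips))

# Hypothetical node scores for "Green spoke . Green left".
node = [
    {"O": 0.0, "PER": 2.0},  # first "Green": clearly a person
    {"O": 1.0, "PER": 0.0},
    {"O": 1.0, "PER": 0.0},
    {"O": 0.5, "PER": 0.0},  # second "Green": ambiguous, slightly favors O
    {"O": 1.0, "PER": 0.0},
]
without_skip = map_labels(node, [])
with_skip = map_labels(node, [(0, 3)])
```

With no skip clique the second mention of *Green* is labeled O; adding the skip clique (0, 3) flips it to PER, which is the kind of long-distance effect this project targets.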

  • Who you might work with

Ni Lao

  • References

Bunescu and Mooney 2004: Relational Markov Networks for Collective Information Extraction

Sutton and McCallum 2004: Collective Segmentation and Labeling of Distant Entities in Information Extraction

Getoor and Taskar 2007: Introduction to Statistical Relational Learning

Bunescu 2007: Learning for Information Extraction: From Named Entity Recognition and Disambiguation to Relation Extraction (PhD thesis)