Lin and Wu. 2009. Phrase Clustering for Discriminative Learning.

Under construction

Phrase clustering for discriminative learning, by D. Lin, X. Wu. In Proceedings of the Annual Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2009.

This paper is available online [1].
 
== Summary ==

This paper uses phrase clustering to improve on the state of the art for the [[AddressesProblem::Named Entity Recognition]] problem. The authors obtained a 1-point F-score improvement over the best NER systems on the CoNLL benchmark (as of 2009). In their paper, phrases are essentially queries that occur more than 100 times in a 700-billion-token web corpus ([[RelatedPaper::Lin et al., 2008]]).
  
 
== Brief description of the method ==

Due to the large number of possible phrases, the authors used Bloom filters to decide whether a sequence of tokens should be considered a phrase.
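The paper does not spell out the filter's implementation, so the following is only a minimal Python sketch of the idea of a Bloom filter membership test; the bit-array size, hash count, and double-hashing scheme are illustrative assumptions.

<pre>
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over a fixed-size bit array.
    Sizes below are illustrative, not from the paper."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from two halves of a SHA-1 digest
        # (standard double-hashing trick).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

phrases = BloomFilter()
phrases.add("new york city")
print("new york city" in phrases)      # True
print("purple banana cat" in phrases)  # False (with high probability)
</pre>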
 
=== Phrases as feature vectors ===

Each phrase is represented as a vector of its contexts. The frequency counts of words appearing within a fixed-size window around the phrase are aggregated and converted into [[UsesMethod::pointwise mutual information]] (PMI) values.
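As a rough illustration (names and data layout are assumptions, not the authors' code), the Python sketch below turns raw co-occurrence counts into PMI-weighted context vectors, using PMI(phrase, word) = log P(phrase, word) - log P(phrase) - log P(word):

<pre>
import math
from collections import Counter

def pmi_vectors(contexts, min_count=1):
    """contexts: dict mapping phrase -> Counter of words observed within
    the fixed-size window around occurrences of that phrase.
    Returns phrase -> {word: PMI} feature vectors."""
    phrase_totals = {p: sum(c.values()) for p, c in contexts.items()}
    word_totals = Counter()
    for c in contexts.values():
        word_totals.update(c)
    total = sum(phrase_totals.values())

    vectors = {}
    for phrase, c in contexts.items():
        vec = {}
        for word, n in c.items():
            if n < min_count:
                continue
            p_joint = n / total
            p_phrase = phrase_totals[phrase] / total
            p_word = word_totals[word] / total
            vec[word] = math.log(p_joint / (p_phrase * p_word))
        vectors[phrase] = vec
    return vectors

# Toy usage with made-up counts
ctx = {
    "new york": Counter({"visited": 3, "in": 5, "flew": 1}),
    "apple pie": Counter({"ate": 4, "baked": 2}),
}
print(pmi_vectors(ctx)["new york"]["visited"])
</pre>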
  
=== Parallel K-Means using MapReduce ===
  
The phrase vectors are then clustered using the K-means algorithm, which can be easily parallelized.
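One K-means iteration decomposes naturally into a map phase (assign each vector to its nearest centroid) and a reduce phase (average the vectors assigned to each centroid). The toy Python sketch below runs both phases in a single process with dense vectors and Euclidean distance, just to show the decomposition; the paper's actual distributed MapReduce implementation over sparse PMI vectors is not reproduced here.

<pre>
import numpy as np

def kmeans_mapreduce_step(vectors, centroids):
    """One K-means iteration written as an explicit map and reduce phase."""
    # Map: emit (nearest-centroid-id, vector) for each phrase vector.
    assignments = {}
    for v in vectors:
        cid = int(np.argmin(np.linalg.norm(centroids - v, axis=1)))
        assignments.setdefault(cid, []).append(v)
    # Reduce: each "reducer" averages the vectors sent to one centroid.
    new_centroids = centroids.copy()
    for cid, members in assignments.items():
        new_centroids[cid] = np.mean(members, axis=0)
    return new_centroids

# Toy usage: 2-D points in two clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
C = X[rng.choice(len(X), 2, replace=False)]
for _ in range(10):
    C = kmeans_mapreduce_step(X, C)
print(C)
</pre>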
  
Soft clustering can be done by assigning a phrase to every cluster centroid within a threshold distance of it. Soft clustering may be better able to model the fact that a phrase can have several "senses".
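A possible sketch of such a threshold-based soft assignment is given below; the distance measure and the fallback to the single nearest centroid are assumptions for illustration, not details from the paper.

<pre>
import numpy as np

def soft_assign(vector, centroids, threshold):
    """Assign a phrase vector to every centroid within `threshold`,
    so an ambiguous phrase can belong to more than one cluster."""
    dists = np.linalg.norm(centroids - vector, axis=1)
    ids = [cid for cid, d in enumerate(dists) if d <= threshold]
    # Assumed design choice: fall back to the nearest centroid
    # when no centroid is close enough.
    return ids if ids else [int(np.argmin(dists))]
</pre>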
  
== Experimental Result ==
 
 
The effectiveness of phrase clustering is evaluated on the [[NER]] problem. For NER, the authors used a 1-word context window and hard clustering, with a linear-chain [[UsesMethod::CRF]] over standard NER features. The baseline feature set contains a total of 48 feature templates.
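The paper's 48 feature templates are not reproduced here. The hypothetical Python sketch below only illustrates the general idea of turning the cluster membership of phrases around a token into token-level features that a linear-chain CRF could consume; all names are invented for illustration.

<pre>
def cluster_features(tokens, position, phrase_to_cluster, max_len=3):
    """For each phrase of up to `max_len` tokens that touches `position`,
    emit a feature naming the phrase's cluster id and the token's offset
    within the phrase. `phrase_to_cluster` maps phrase -> cluster id."""
    feats = []
    for length in range(1, max_len + 1):
        for start in range(position - length + 1, position + 1):
            if start < 0 or start + length > len(tokens):
                continue
            phrase = " ".join(tokens[start:start + length])
            cid = phrase_to_cluster.get(phrase)
            if cid is not None:
                feats.append(f"cluster={cid}|offset={position - start}")
    return feats

# Toy usage with a made-up cluster map
clusters = {"new york": 17, "york city": 17, "visited": 4}
print(cluster_features("i visited new york city".split(), 3, clusters))
# ['cluster=17|offset=1', 'cluster=17|offset=0']
</pre>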
  
The results on the [[UsesDataset::CoNLL]] test set are as follows:
  
[[Image:conll_results.png]]
  
 
== Related Papers ==
