Difference between revisions of "Ritter et al, EMNLP 2011. Named Entity Recognition in Tweets: An Experimental Study"

Revision as of 18:07, 24 September 2011

Named Entity Recognition in Tweets: An Experimental Study, by A. Ritter, S. Clark, Mausam, O. Etzioni. In Empirical Methods in Natural Language Processing, 2011.

This paper is available online [1].

Under Construction

Summary

This paper seeks to design an NLP pipeline from the ground up (POS tagging, through chunking, to Named Entity Recognition) for Twitter tweets. Off-the-shelf NER systems do not perform well on tweets because tweets are noisy (misspellings, abbreviations, slang) and terse (140-character limit). Tweets also contain a large number of distinctive named entity types.

The authors experimentally evaluate the performance of off-the-shelf, news-trained NLP tools on Twitter data. POS tagging accuracy is reported to drop from 0.97 to 0.80.

In addition, the authors introduce a new approach to distant supervision (Mintz et al 2009) using a topic model.

Brief description of the method

Part-of-Speech Tagging

The authors manually annotated 800 tweets using the Penn Treebank tagset. They added new tags for Twitter-specific phenomena such as retweets, @usernames, #hashtags, and URLs.
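The Twitter-specific tokens listed above are largely recognizable from their surface form. The following is a minimal sketch of such a pattern-based check. It is not the authors' implementation. The tag names (URL, USR, HT, RT), the regular expressions, and the function names are assumptions made for illustration, since this summary only states that new tags were added.

```python
import re

# Hypothetical tag names and patterns, chosen for illustration only.
TWITTER_PATTERNS = [
    ("URL", re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)),
    ("USR", re.compile(r"^@\w+$")),
    ("HT", re.compile(r"^#\w+$")),
    ("RT", re.compile(r"^RT$")),
]


def twitter_tag(token):
    """Return a Twitter-specific tag for the token, or None if no pattern matches."""
    for tag, pattern in TWITTER_PATTERNS:
        if pattern.match(token):
            return tag
    return None


def tag_tweet(text):
    """Split on whitespace and pair every token with its Twitter-specific tag (or None)."""
    return [(token, twitter_tag(token)) for token in text.split()]
```

Tokens that receive None would still need an ordinary Penn Treebank tag from the statistical tagger. Whitespace tokenization is a simplification here.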

To help with out-of-vocabulary (OOV) words, they clustered words so that distributionally similar words are grouped together. They performed hierarchical clustering using JCluster (Goodman, 2001) on 52 million tweets.

The POS tagging system, T-POS, uses a conditional random field (CRF) to perform sequence labeling. The features include lexical features (prefixes, suffixes) and cluster features, in addition to standard features such as a POS dictionary, spelling, and contextual features.
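To make the feature families above concrete, here is a minimal sketch of a per-token feature extractor of the kind that could feed a CRF. It is an illustration, not T-POS itself. The function name, the feature names, the affix lengths (1 to 3), and the cluster prefix lengths (4, 8, 12) are all assumptions. The `clusters` argument is a hypothetical map from a lowercased word to the bit string of its path in the hierarchical clustering, and `pos_dict` is a hypothetical map from a lowercased word to its possible tags.

```python
def token_features(tokens, i, clusters, pos_dict):
    """Build a sparse feature dict for tokens[i] (all feature names are hypothetical)."""
    word = tokens[i]
    lower = word.lower()
    feats = {"word=" + lower: 1.0}

    # Lexical features: short prefixes and suffixes of the word.
    for n in (1, 2, 3):
        if len(lower) > n:
            feats["prefix%d=%s" % (n, lower[:n])] = 1.0
            feats["suffix%d=%s" % (n, lower[-n:])] = 1.0

    # Spelling features.
    if word[:1].isupper():
        feats["init_cap"] = 1.0
    if any(ch.isdigit() for ch in word):
        feats["has_digit"] = 1.0

    # Cluster features: prefixes of the hierarchical cluster bit string,
    # so that an unseen word can share features with similar known words.
    path = clusters.get(lower)
    if path is not None:
        for n in (4, 8, 12):
            feats["cluster%d=%s" % (n, path[:n])] = 1.0

    # POS dictionary features: tags the word is known to take.
    for tag in sorted(pos_dict.get(lower, ())):
        feats["dict_tag=" + tag] = 1.0

    # Contextual features: neighbouring words, with boundary markers.
    feats["prev=" + (tokens[i - 1].lower() if i > 0 else "<s>")] = 1.0
    feats["next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>")] = 1.0
    return feats
```

The cluster prefixes are what help with OOV words: a spelling variant such as "2morrow" may be absent from the POS dictionary yet fall in the same cluster as "tomorrow".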

Shallow parsing

The authors annotated the same 800 tweets above with tags from the CoNLL'00 shared task for shallow parsing (BIO labeling scheme). They used shallow parsing features described in Sha & Pereira (2003), in addition to clustering information which they had used for POS tagging.

Instead of using only 16K tokens of in-domain tweets, they also trained on 210K tokens of CoNLL newswire data.
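The BIO labeling scheme mentioned above turns chunking into per-token sequence labeling: the first token of a chunk is tagged B-, the remaining tokens of that chunk I-, and tokens outside any chunk O. A minimal sketch follows. The function name and the span representation (start, exclusive end, label) are assumptions made for illustration.

```python
def chunks_to_bio(n_tokens, chunks):
    """Convert non-overlapping chunk spans to BIO tags.

    chunks: list of (start, end, label) tuples, with end exclusive.
    """
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = "B-" + label          # first token of the chunk
        for j in range(start + 1, end):
            tags[j] = "I-" + label          # tokens inside the chunk
    return tags


tokens = ["the", "new", "iphone", "is", "out", "!"]
spans = [(0, 3, "NP"), (3, 4, "VP"), (4, 5, "ADVP")]
print(list(zip(tokens, chunks_to_bio(len(tokens), spans))))
```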

EG Algorithm

Given a set of distributions $\alpha \in \Delta ^{n}$ , the update equations are

$\alpha _{i,y}^{'}={\frac {1}{Z_{i}}}\alpha _{i,y}\exp(-\eta \nabla _{i,y})$

where

$Z_{i}=\sum _{\hat {y}}\alpha _{i,{\hat {y}}}\exp(-\eta \nabla _{i,{\hat {y}}})$

and

$\nabla _{i,y}={\frac {\partial Q(\alpha )}{\partial \alpha _{i,y}}}=1+\log \alpha _{i,y}+{\frac {1}{C}}\mathbf {w} (\alpha )\cdot \left(\phi (x_{i},y_{i})-\phi (x_{i},y)\right)$
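The update equations above amount to a multiplicative reweighting followed by renormalization. A minimal sketch of one EG step for a single distribution $\alpha _{i}$ is given below, assuming the gradient values $\nabla _{i,y}$ have already been computed. The function name and the dict-based representation are assumptions made for illustration.

```python
import math


def eg_update(alpha_i, grad_i, eta):
    """One exponentiated-gradient step for a single distribution.

    alpha_i: dict mapping label y -> alpha_{i,y} (non-negative, sums to 1)
    grad_i:  dict mapping label y -> gradient value nabla_{i,y}
    eta:     learning rate
    """
    # Multiplicative update: alpha_{i,y} * exp(-eta * nabla_{i,y})
    unnormalized = {y: a * math.exp(-eta * grad_i[y]) for y, a in alpha_i.items()}
    # Z_i renormalizes so that the result is again a distribution.
    z_i = sum(unnormalized.values())
    return {y: u / z_i for y, u in unnormalized.items()}
```

Because the update is multiplicative and renormalized, the new values stay on the probability simplex without any explicit projection step. Labels with a larger gradient lose probability mass.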

Batch learning

At each iteration, $\alpha '$ is updated simultaneously using all (or a subset of) the available training instances.

Online learning

At each iteration, we choose a single training instance $i$ and update only the corresponding distribution $\alpha _{i}^{'}$.
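The difference between the two regimes can be sketched as follows. This is an illustration under stated assumptions: `grad_fn(alpha, i)` is a hypothetical callback returning the gradient values for instance `i` at the current `alpha`, distributions are plain lists, and the online variant picks the instance uniformly at random, a choice the text above does not specify.

```python
import math
import random


def _eg_step(alpha_i, grad_i, eta):
    """EG update for one distribution, given its gradient values."""
    unnormalized = [a * math.exp(-eta * g) for a, g in zip(alpha_i, grad_i)]
    z_i = sum(unnormalized)
    return [u / z_i for u in unnormalized]


def batch_iteration(alpha, grad_fn, eta):
    """Update every distribution, with all gradients taken at the current alpha."""
    grads = [grad_fn(alpha, i) for i in range(len(alpha))]
    return [_eg_step(a_i, g_i, eta) for a_i, g_i in zip(alpha, grads)]


def online_iteration(alpha, grad_fn, eta, rng):
    """Update the distribution of a single instance (chosen at random here)."""
    i = rng.randrange(len(alpha))
    updated = list(alpha)
    updated[i] = _eg_step(alpha[i], grad_fn(alpha, i), eta)
    return updated
```

In the batch version all gradients are computed before any distribution changes, so the update is simultaneous. In the online version the next gradient already sees the effect of the previous single-instance update.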

Convergence rate of batch algorithm

To get within $\epsilon$ of the optimum parameters, we need $O({\frac {1}{\eta \epsilon }})$ iterations.

Experimental Result

The authors compared the performance of the EG algorithm to conjugate-gradient and L-BFGS methods.

Multiclass classification

The authors used a subset of the MNIST handwritten digit classification dataset.

It can be seen that the EG algorithm converges considerably faster than the other methods.

Structured learning (dependency parsing)

The authors used the Slovene data from the CoNLL-X Shared Task on multilingual dependency parsing.

It can be seen that the EG algorithm converges faster in terms of objective function and accuracy measures.

Related Papers

The approach here is also similar to the use of EG algorithms for large margin structured classification in Bartlett et al NIPS 2004.

@@ Line 25: / Line 25: @@
 === Shallow parsing ===
-The authors annotated the same 800 tweets above with tags from the [[UsesDataset::CoNLL '00]] shared task for shallow parsing (BIO labeling scheme). They used shallow parsing features described in [[RelatedPaper::Sha_2003_Shallow_Parsing_with_Conditional_Random_Fields | Sha & Pereira (2003)]], in addition to clustering information which they had used for POS tagging.
+The authors annotated the same 800 tweets above with tags from the [[UsesDataset::CoNLL'00]] shared task for shallow parsing (BIO labeling scheme). They used shallow parsing features described in [[RelatedPaper::Sha_2003_Shallow_Parsing_with_Conditional_Random_Fields | Sha & Pereira (2003)]], in addition to clustering information which they had used for POS tagging.
 Instead of using only 16k tokens of in-domain tweets, they trained on 210K tokens of [[CoNLL%2700 | CoNLL]] newswire data as well.
