Brill, CL 1995

== Citation ==
Brill, E. 1995. Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Computational Linguistics. 21. 4. p543-565
  
 
== Online version ==

[http://www.ldc.upenn.edu/acl/J/J95/J95-4004.pdf Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging]
  
 
== Summary ==

This [[Category::paper]] introduces a learning technique called "Transformation-based error-driven learning", a.k.a. [[UsesMethod::Transformation Based Learning]] (TBL).  
  
Transformation-Based Learning is an algorithm that learns an ordered sequence of transformations to improve the output of some baseline tagger. Each transformation is broken down into two components: a ''triggering event'' (such as "the previous word is a determiner") and a ''re-write rule'' (such as "change the tag from modal to noun").

Advantages of Transformation-Based Learning include:
* It is conceptually simple, and can be adapted to different learning problems.
* Rich triggers and re-write rules can make use of specific information and context.
* It appears resistant to over-fitting (observed empirically, though not entirely understood).

When using Transformation-Based Learning, a number of design questions should be considered:
* When constructing the set of possible transformations, should rules be created manually or generated from templates? The search space can be huge, so linguistic intuition may be needed to limit it.
* There are no probabilities or confidence scores associated with the results.
* Applying a transformation in one environment can affect whether transformations trigger in another: should transformations be applied immediately, or only after the entire corpus has been examined for triggering conditions? And in which order should the corpus be processed (left-to-right or right-to-left)?
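
To make the two-component rule anatomy concrete, here is a minimal Python sketch of one transformation as a triggering event plus a re-write rule. The class and helper names are illustrative, not from the paper:

<pre>
from dataclasses import dataclass
from typing import Callable

@dataclass
class Transformation:
    from_tag: str      # tag the re-write rule replaces
    to_tag: str        # tag the re-write rule assigns
    trigger: Callable  # triggering event: (tags, words, i) -> bool

    def applies_at(self, tags, words, i):
        """True if position i has the source tag and the trigger fires."""
        return tags[i] == self.from_tag and self.trigger(tags, words, i)

# "Change MD (modal) to NN (noun) if the previous tag is DT (determiner)"
prev_is_dt = lambda tags, words, i: i > 0 and tags[i - 1] == "DT"
rule = Transformation("MD", "NN", prev_is_dt)

words, tags = ["the", "can"], ["DT", "MD"]
print(rule.applies_at(tags, words, 1))  # True: "can" after a determiner
</pre>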
  
The author describes the use of Transformation-Based Learning for [[AddressesProblem::POS Tagging]]. Compared to a Markov-model-based POS tagger, the TBL tagger achieved comparable tagging accuracy with far fewer rules than the number of contextual probabilities estimated by the stochastic tagger, and did so with a much smaller training corpus. Moreover, the first rules learned (say, the first 100 or 200) contribute most of the tagging accuracy; the rest improve performance only marginally.
  
== Transformation-Based Learning ==

'''The learning algorithm is summarized as follows''' (see Figure 1 from the paper; a minimal code sketch follows the list):
* Pass the un-annotated training corpus through the initial-state annotator
* Compare the result against the truth to get the current score (based on the number of classification errors)
* Loop until no transformation can be found that improves the score (greedy search):
** Consider all transformation rules applied to the training data, and select the best one
** Apply that transformation to the data and recompute the current score
** Add the transformation to the ordered transformation list
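
Here is a minimal Python sketch of this greedy loop, reusing the illustrative <code>Transformation</code> class above. The helper names are assumptions, and it applies each rule immediately, left-to-right (one of the design choices noted in the Summary):

<pre>
def score(tags, gold_tags):
    """Current score: number of correctly tagged tokens (higher is better)."""
    return sum(t == g for t, g in zip(tags, gold_tags))

def apply_rule(rule, tags, words):
    """Apply one transformation left-to-right and immediately, so earlier
    changes are visible at later positions; returns a new tag sequence."""
    out = list(tags)
    for i in range(len(out)):
        if rule.applies_at(out, words, i):
            out[i] = rule.to_tag
    return out

def learn(words, gold_tags, initial_tags, candidates):
    """Greedy TBL learning: repeatedly commit the best-scoring candidate."""
    tags = list(initial_tags)  # output of the initial-state annotator
    learned = []               # ordered transformation list
    best = score(tags, gold_tags)
    while True:
        # Consider every candidate transformation applied to the current corpus
        scored = [(score(apply_rule(r, tags, words), gold_tags), r)
                  for r in candidates]
        new_score, best_rule = max(scored, key=lambda pair: pair[0])
        if new_score <= best:  # no transformation improves the score: stop
            return learned
        tags = apply_rule(best_rule, tags, words)  # commit the best rule
        learned.append(best_rule)
        best = new_score
</pre>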
[[File:brill95_fig1.png]]
'''Transformations are applied as follows''' (a matching sketch follows the list):
* Run the initial-state annotator on the unseen data
* Loop through the ordered list of transformations, applying each in turn
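
A corresponding sketch of the application phase, assuming <code>apply_rule</code> from above and some initial-state annotator function (both illustrative names):

<pre>
def tag_unseen(words, initial_state_annotate, learned_rules):
    """Tag unseen text: annotate, then apply each learned rule in order."""
    tags = initial_state_annotate(words)      # e.g. most-likely tag per word
    for rule in learned_rules:                # order matters: each rule was
        tags = apply_rule(rule, tags, words)  # learned against earlier output
    return tags
</pre>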
== Example usage: POS Tagging ==

* Initial state:
** Each word is assigned its most likely POS tag, based on the training corpus
* Rules (see the template sketch after this list):
** Non-lexical: based mainly on the tags of words near the target position
** Lexical: based mainly on the words in the surrounding context
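
As an illustration of the templates-versus-manual-rules question raised in the Summary, here is a sketch of instantiating one non-lexical template ("change tag A to B when the previous tag is Z") over a toy tagset, reusing the hypothetical <code>Transformation</code> class above. Even this single template over four tags yields dozens of candidates, which is why the search space grows so quickly:

<pre>
TAGSET = ["DT", "MD", "NN", "VB"]

def prev_tag_rules(tagset):
    """Yield every instantiation of the 'previous tag is Z' template."""
    for a in tagset:
        for b in tagset:
            if a == b:
                continue
            for z in tagset:
                # bind z per iteration via a default argument
                trigger = lambda tags, words, i, z=z: i > 0 and tags[i - 1] == z
                yield Transformation(a, b, trigger)

candidates = list(prev_tag_rules(TAGSET))
print(len(candidates))  # 4 tags * 3 other tags * 4 trigger tags = 48 candidates
</pre>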
  
 
== Related papers ==

* '''Text Chunking using Transformation-Based Learning''': Transformation-based learning applied to Noun-Phrase Chunking - [[RelatedPaper::Ramshaw_&_Marcus,_1995]].
* '''Tagging gene and protein names in biomedical text''': Transformation-based learning applied to gene & protein tagging, for system called ''Abgene'' - [[RelatedPaper::Tanabe_&_Wilbur,_2002]].
* '''Sense Deduction: The Power of Peewees Applied to SENSEVAL-2 Swedish Lexical Sample Task''': Transformation-based learning applied to word sense disambiguation - [[RelatedPaper::Lager_&_Zinovjeva,_2001]].
