Revision as of 09:41, 1 November 2011
A summary is coming soon from Daegunw!
Citation
Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment, by K. Bellare, A. McCallum. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009.
Online version
This paper is available online [1].
Summary
This paper presents a novel approach using Generalized Expectation Criteria to train a Conditional Random Field model for an IE task. In a setting where there exists a database of records and unlabeled text mentioning those records, the method aligns record fields to text tokens and uses the resulting alignments to bootstrap an extractor without manually labeled training data.
Brief description of the method
The paper presents two CRF models: AlignCRF and ExtrCRF. The first is a zero-order CRF model used to predict labels for a text sequence given a matching DB record. The second is a first-order linear-chain CRF used to extract fields when there is no DB record to match.
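The practical difference between the two model structures can be sketched numerically. This is an illustrative sketch, not the paper's implementation: a zero-order model's conditional distribution factorizes into independent per-position softmaxes, while a first-order chain couples adjacent labels through transition scores, so its partition function requires the forward algorithm. All names and shapes below are assumptions for illustration.

```python
import numpy as np

def zero_order_marginals(unary):
    """Zero-order model: per-position softmax over labels.

    unary: (n_positions, n_labels) array of per-position label scores.
    Positions are independent, so each row normalizes on its own.
    """
    e = np.exp(unary - unary.max(axis=1, keepdims=True))  # stabilized exp
    return e / e.sum(axis=1, keepdims=True)

def first_order_log_partition(unary, trans):
    """First-order chain: log-partition via the forward algorithm.

    trans: (n_labels, n_labels) transition scores coupling adjacent labels.
    """
    alpha = unary[0]
    for t in range(1, len(unary)):
        alpha = unary[t] + np.logaddexp.reduce(alpha[:, None] + trans, axis=0)
    return np.logaddexp.reduce(alpha)
```

With all transition scores set to zero, the chain's log-partition reduces to the sum of per-position log-normalizers, recovering the zero-order case.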
Features
- Extraction features (in AlignCRF and ExtrCRF)
- regular expressions detecting tokens containing all characters, all digits, or all alphanumeric
- number of characters and digits in the token (ex. [NUMCHAR=3, NUMDIGITS=1])
- domain-specific patterns for 'date', and 'pages'
- token identity, prefix/suffix, character n-grams
- presence of a token in lexicons such as "last names", "publisher names", "cities"
- lexicon features within a window of 10
- regular expression feature within a window of 10
- token identity features within a window of 3
- Alignment features (in AlignCRF)
- exact token match
- approximate token match after binning Jaro-Winkler edit distance between tokens
- substring token match
- prefix/suffix token match (if the prefixes/suffixes match for lengths 1,2,3,4)
- exact and approximate token matches at offsets (-1,-1) and (+1,+1) around the alignment
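The alignment features above can be sketched as simple predicates over a record token and a text token. This is an illustrative sketch only: the paper bins Jaro-Winkler edit distance for its approximate-match feature, and here `difflib`'s similarity ratio stands in for that measure purely to keep the example self-contained.

```python
from difflib import SequenceMatcher

def alignment_features(rec_tok, txt_tok):
    """Binary (and one binned) alignment features for a token pair.

    rec_tok: a token from the database record; txt_tok: a token from the text.
    """
    feats = {}
    feats["exact_match"] = rec_tok == txt_tok
    feats["substring_match"] = rec_tok in txt_tok or txt_tok in rec_tok
    # prefix/suffix matches for lengths 1..4, as in the feature list above
    for k in (1, 2, 3, 4):
        feats[f"prefix{k}_match"] = rec_tok[:k] == txt_tok[:k]
        feats[f"suffix{k}_match"] = rec_tok[-k:] == txt_tok[-k:]
    # stand-in for binned Jaro-Winkler similarity: difflib ratio, binned to 0.1
    sim = SequenceMatcher(None, rec_tok, txt_tok).ratio()
    feats["approx_match_bin"] = round(sim, 1)
    return feats
```

Offset features (matches at (-1,-1) and (+1,+1) around the alignment) would apply the same predicates to the neighboring token pairs.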
AlignCRF
For a database record with token sequence and label sequence <math>(\mathbf{x}_{1}, \mathbf{y}_{1})=(\langle x_{1}[1], x_{1}[2], \ldots, x_{1}[m]\rangle, \langle y_{1}[1], \ldots, y_{1}[m]\rangle)</math>, a text sequence <math>\mathbf{x}_{2}=\langle x_{2}[1], x_{2}[2], \ldots, x_{2}[n]\rangle</math> and an alignment sequence <math>\mathbf{a}=\langle a_1, \ldots, a_n\rangle</math> where <math>a_{i}=j</math> indicates that <math>(x_1[j], y_1[j])</math> is assigned to the token <math>x_{2}[i]</math>, the alignment model defines a probability distribution over alignment sequences conditioned on the database record and the text sequence <math>(\mathbf{x}_{1}, \mathbf{y}_{1}, \mathbf{x}_{2})</math> as

<math>
p_{\Theta}(\mathbf{a}\vert\mathbf{x}_{1}, \mathbf{y}_{1}, \mathbf{x}_{2})
=\frac{\exp\left(\sum_{t=1}^{n}\Theta^{\mathrm{T}}\vec{f}(a_t,\mathbf{x}_1,\mathbf{y}_1,\mathbf{x}_2,t)\right)}{Z_{\Theta}(\mathbf{x}_1,\mathbf{y}_1,\mathbf{x}_2)}
</math>

Because the model is zero-order, this distribution factorizes over positions:

<math>
p_{\Theta}(a_{t}\vert\mathbf{x}_{1}, \mathbf{y}_{1}, \mathbf{x}_{2})
=\frac{\exp\left(\Theta^{\mathrm{T}}\vec{f}(a_t,\mathbf{x}_1,\mathbf{y}_1,\mathbf{x}_2,t)\right)}{\sum_{a'}\exp\left(\Theta^{\mathrm{T}}\vec{f}(a',\mathbf{x}_1,\mathbf{y}_1,\mathbf{x}_2,t)\right)}
</math>

with gradient

<math>
\frac{\partial p_{\Theta}(a_t\vert\mathbf{x}_{1}, \mathbf{y}_{1}, \mathbf{x}_{2})}{\partial \Theta}
=p_{\Theta}(a_{t}\vert\mathbf{x}_{1}, \mathbf{y}_{1}, \mathbf{x}_{2})
\left[\vec{f}(a_t,\mathbf{x}_1,\mathbf{y}_1,\mathbf{x}_2,t)
-E_{p_{\Theta}(a')}\left(\vec{f}(a',\mathbf{x}_1,\mathbf{y}_1,\mathbf{x}_2,t)\right)\right]
</math>
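The per-position distribution and its gradient can be sketched with a few lines of NumPy. This is an illustrative sketch, not the paper's code: rows of `F` stand in for the feature vectors <math>\vec{f}(a, \mathbf{x}_1, \mathbf{y}_1, \mathbf{x}_2, t)</math> of the candidate alignments at one position, and `theta` for the parameters.

```python
import numpy as np

def p_align(theta, F):
    """p(a_t = j | x1, y1, x2): softmax over candidate alignments.

    F: (n_candidates, n_features) feature vectors; theta: (n_features,).
    """
    s = F @ theta
    e = np.exp(s - s.max())  # subtract max for numerical stability
    return e / e.sum()

def grad_p(theta, F, j):
    """Gradient of p(a_t = j) w.r.t. theta: p_j * (f(a_j) - E_p[f(a)])."""
    p = p_align(theta, F)
    return p[j] * (F[j] - p @ F)
```

The gradient expression matches the analytic form above: the feature vector of the chosen alignment minus the model-expected feature vector, scaled by the probability itself.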
The expectation criteria are determined in the following manner: for each label, the top N extraction features are selected by mutual information with that label, and the top N alignment features with highest mutual information with the correct labeling are selected as alignment criteria. The target expectations of these criteria are binned into 11 bins [0.05, 0.1, 0.2, ..., 0.9, 0.95]. A complete list of expectation criteria is available at http://www.cs.umass.edu/~kedarb/dbie_expts.txt.
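The top-N selection step can be sketched from feature-label co-occurrence counts. This is a hedged illustration, not the paper's selection code: it scores each (feature, label) pair by its pointwise contribution to mutual information and keeps the N highest-scoring features per label; the count structure and names are assumptions.

```python
import numpy as np
from collections import Counter

def top_n_by_mi(feat_label_counts, n):
    """Select the top-n features per label by their MI contribution.

    feat_label_counts: {(feature, label): co-occurrence count}.
    Returns {label: [feature, ...]} with n features per label.
    """
    total = sum(feat_label_counts.values())
    pf, pl = Counter(), Counter()
    for (f, l), c in feat_label_counts.items():
        pf[f] += c
        pl[l] += c
    scored = {}
    for (f, l), c in feat_label_counts.items():
        p_fl = c / total
        # pointwise MI term: p(f,l) * log(p(f,l) / (p(f) p(l)))
        score = p_fl * np.log(p_fl / ((pf[f] / total) * (pl[l] / total)))
        scored.setdefault(l, []).append((score, f))
    return {l: [f for _, f in sorted(vs, reverse=True)[:n]]
            for l, vs in scored.items()}
```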
Given expectation criteria <math>\mathcal{C} = \langle\mathbf{F}, \mathbf{P}, \mathbf{W}\rangle</math> where <math>\mathbf{F}=\langle f_1,\ldots,f_l\rangle</math> is a list of binary feature functions, <math>\mathbf{P}=\langle p_1,\ldots,p_l\rangle</math> is a list of target expectations, and <math>\mathbf{W}=\langle w_1,\ldots,w_l\rangle</math> is a list of weights, the objective function to optimize is the following:
where <math>\Delta(f, p, w, \Theta) = w\left(\frac{E_{p_{\Theta}}(\mathbf{A}_{f})}{|\mathbf{A}_{f}|} - p\right)^{2}</math> is the squared divergence between the model expectation of <math>f</math>, computed over the set <math>\mathbf{A}_{f}</math> of instances where <math>f</math> applies, and its target expectation <math>p</math>. The authors use the L-BFGS algorithm to maximize the objective function.
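The shape of this optimization can be sketched with SciPy's L-BFGS implementation. This is an illustrative stand-in, not the paper's setup: `model_expectation` below is a toy sigmoid of the parameters rather than a CRF expectation, and the targets and weights are made up, but it shows the weighted squared-divergence penalty being driven toward the target expectations by L-BFGS.

```python
import numpy as np
from scipy.optimize import minimize

def model_expectation(theta):
    """Toy stand-in for the CRF's feature expectations (NOT the paper's model)."""
    return 1.0 / (1.0 + np.exp(-theta))

def squared_divergence(theta, targets, weights):
    """Sum of weighted squared divergences w_k * (E[f_k] - p_k)^2."""
    return np.sum(weights * (model_expectation(theta) - targets) ** 2)

# illustrative target expectations (binned values, as in the criteria above)
targets = np.array([0.05, 0.5, 0.95])
weights = np.ones(3)
res = minimize(squared_divergence, x0=np.zeros(3),
               args=(targets, weights), method="L-BFGS-B")
```

Minimizing the divergence here corresponds to maximizing the paper's objective, which subtracts these penalty terms.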
ExtrCRF
Experimental Result
The authors tested this method on MUC-7 and the oncology part of the PennBioIE corpus. The base learner used for the experiment is a linear-chain Conditional Random Field. Features used are orthographical features (regexp patterns), lexical and morphological features (prefix, suffix, lemmatized tokens), and contextual features (features of neighbor tokens). In terms of the number of tokens that had to be labeled to reach the maximal F-score, SeSAL could save about 60% over FuSAL, and 80% over random sampling. Having high confidence was also important because it could save the model from making errors in the early stages.
Related papers
Comment
If you're further interested in active learning for NLP, you might want to see Burr Settles' review of active learning: http://active-learning.net/ --Brendan 22:51, 13 October 2011 (UTC)