Taskar et al. 2004. Max-margin Parsing

Max-margin parsing, by Taskar, B., Klein, D., Collins, M., Koller, D. and Manning, C. In Proc. EMNLP, 2004.

This paper is available online [1].

Summary

This paper presents a novel approach to Parsing that maximizes separating margins, in the style of Support Vector Machines. The authors show how the parsing problem can be reformulated as a discriminative task, which allows an arbitrary number of features to be used. Such a formulation also lets them incorporate a loss function that directly penalizes incorrect parse trees.

Brief description of the method

Instead of a probabilistic interpretation for parse trees, we seek to find a weight vector <math>w</math> such that

<math>w^\top f(x_i, y_i) \ge w^\top f(x_i, y) \quad \forall y \in \mathcal{G}(x_i)</math>

for all sentences <math>x_i</math> in the training data, <math>y_i</math> being the true parse tree and <math>\mathcal{G}(x_i)</math> the set of possible parses for <math>x_i</math>.

Formulating it as an optimization problem,

<math>\min_{w,\xi} \;\; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i</math>

<math>\text{s.t. } \;\; w^\top f(x_i, y_i) \ge w^\top f(x_i, y) + L_i(y) - \xi_i \quad \forall i, \; y \in \mathcal{G}(x_i)</math>

where <math>L_i(y)</math> is the loss incurred by proposing parse <math>y</math> for sentence <math>i</math>.
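
To make the constraints concrete, here is a minimal Python sketch of the optimal per-sentence slack <math>\xi_i</math> (the structured hinge loss), assuming hypothetical helpers feats(y) for the feature map and loss(gold, y) for <math>L_i(y)</math>, and a candidate set small enough to enumerate explicitly:

<syntaxhighlight lang="python">
import numpy as np

def structured_hinge_loss(w, feats, loss, gold, candidates):
    """Optimal slack xi_i for one sentence in the max-margin primal:
    xi_i = max(0, max_y [w.f(x_i,y) + L_i(y)] - w.f(x_i,y_i)).
    feats(y) returns the feature vector f(x_i, y); loss(gold, y) is L_i(y)."""
    gold_score = w @ feats(gold)
    # Loss-augmented inference: find the most violated margin constraint.
    worst = max(w @ feats(y) + loss(gold, y) for y in candidates)
    return max(0.0, worst - gold_score)

# Toy usage: two candidate "parses" represented only by their features.
w = np.array([1.0, -0.5])
feats = lambda y: np.array(y, dtype=float)
loss = lambda g, y: 0.0 if g == y else 1.0
print(structured_hinge_loss(w, feats, loss, (1, 0), [(1, 0), (0, 1)]))
</syntaxhighlight>

In practice the max over parses is not computed by enumeration but with the chart-based factorization described below.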

Using SVM duality, we can derive the dual of the above program:

<math>\max_{\alpha} \;\; \sum_{i,y} \alpha_{i,y} L_i(y) - \frac{1}{2} \Big\| \sum_{i,y} (I_{i,y} - \alpha_{i,y}) \, f(x_i, y) \Big\|^2</math>

s.t. <math>\sum_{y} \alpha_{i,y} = C \;\; \forall i, \qquad \alpha_{i,y} \ge 0 \;\; \forall i, y</math>

where <math>I_{i,y}</math> indicates whether <math>y</math> is the true parse for sentence <math>i</math>.
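
For toy problems where the candidate parses can be enumerated, the dual objective above can be written directly as code; the sketch below uses illustrative names (alpha, feats, losses, gold_idx) that are not from the paper:

<syntaxhighlight lang="python">
import numpy as np

def dual_objective(alpha, feats, losses, gold_idx):
    """Value of the max-margin dual for enumerable candidate sets.
    alpha[i][y]:  dual variables (feasible iff alpha[i] sums to C, alpha >= 0)
    feats[i][y]:  feature vector f(x_i, y)
    losses[i][y]: L_i(y)
    gold_idx[i]:  index of the true parse, i.e. I_{i,y} = 1 iff y == gold_idx[i]."""
    linear = sum(alpha[i][y] * losses[i][y]
                 for i in range(len(alpha)) for y in range(len(alpha[i])))
    residual = sum(((1.0 if y == gold_idx[i] else 0.0) - alpha[i][y]) * feats[i][y]
                   for i in range(len(alpha)) for y in range(len(alpha[i])))
    return linear - 0.5 * float(residual @ residual)
</syntaxhighlight>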

For each sentence, we need to enumerate all possible parse trees, a set exponential in the sentence length. However, we can make use of local substructures, as in chart parsing dynamic programming algorithms such as CYK, to factor these trees into parts like <math>\langle A,s,e,i\rangle</math> and <math>\langle A\rightarrow B C,s,m,e,i\rangle</math>, where <math>s,m,e,i</math> refer to the start, split and end points and the sentence number respectively.
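
A small sketch of this factorization for a single sentence (so the index <math>i</math> is dropped), assuming a binarized grammar given as a set of nonterminals and binary rules, with half-open spans:

<syntaxhighlight lang="python">
def enumerate_parts(n, nonterminals, binary_rules):
    """Candidate parts for a sentence of n words:
    constituent parts <A, s, e> and production parts <A -> B C, s, m, e>."""
    constituents = [(A, s, e)
                    for s in range(n)
                    for e in range(s + 1, n + 1)
                    for A in nonterminals]
    productions = [((A, B, C), s, m, e)
                   for s in range(n)
                   for e in range(s + 2, n + 1)   # only spans of length >= 2 split
                   for m in range(s + 1, e)       # split point strictly inside span
                   for (A, B, C) in binary_rules]
    return constituents, productions
</syntaxhighlight>

Every parse tree corresponds to exactly one consistent subset of these parts, which is what allows chart-style dynamic programming over part scores instead of enumeration of whole trees.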

Therefore,

<math>w^\top f(x_i, y) = \sum_{r \in R(x_i, y)} w^\top \phi(r)</math>

where <math>R(x_i, y)</math> is the set of parts making up parse <math>y</math>, drawn from <math>R</math>, the set of all possible parts. <math>\phi</math> can be any function that maps a rule production part to some feature vector representation. In addition, the loss function can also be decomposed into a sum over parts in the same way. In the paper, the loss function used was the number of constituent errors made in a parse.
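
Under that decomposition, both the model score and the loss are sums of per-part terms; a minimal sketch, assuming a hypothetical per-part feature map phi:

<syntaxhighlight lang="python">
def tree_score(w, parts, phi):
    """w.f(x, y) computed as a sum of per-part scores, where `parts` is the
    set of parts making up parse y and phi(r) maps a part to its features."""
    return sum(w @ phi(r) for r in parts)

def constituent_loss(gold_parts, pred_parts):
    """Loss in the spirit of the paper: the number of constituents in the
    predicted parse that do not appear in the gold parse."""
    gold = set(gold_parts)
    return sum(1 for p in pred_parts if p not in gold)
</syntaxhighlight>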

By working with parts instead of whole trees, the factored dual objective can be expressed with a polynomial number of variables, in fact cubic in the length of the sentence.
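
To see where the cubic count comes from, count the part variables for one sentence of length <math>n</math>: there are <math>O(n^2)</math> span parts <math>\langle A,s,e\rangle</math> and <math>O(n^3)</math> production parts <math>\langle A\rightarrow B C,s,m,e\rangle</math>. A quick check, with the grammar sizes taken as inputs:

<syntaxhighlight lang="python">
def num_parts(n, num_nonterminals, num_rules):
    """Number of part variables for one sentence of n words."""
    spans = num_nonterminals * n * (n + 1) // 2        # choose (s, e): O(n^2)
    splits = num_rules * (n + 1) * n * (n - 1) // 6    # choose (s, m, e): O(n^3)
    return spans + splits

print(num_parts(15, 20, 100))  # the cubic term dominates even for short sentences
</syntaxhighlight>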

Results

Experiments on the Penn Treebank dataset (sentences of at most 15 words) with lexical features achieved an F1 improvement of 0.43 over the Collins 1999 parser.

Related Papers

McDonald_et_al,_ACL_2005:_Non-projective_dependency_parsing_using_spanning_tree_algorithms (margin learning for dependency parsing)

Tsochantaridis,_Joachims_,_Support_vector_machine_learning_for_interdependent_and_structured_output_spaces_2004 (using SVMs for structured output spaces)