Taskar et al. 2004. Max-margin Parsing
Max-margin parsing, by B. Taskar, D. Klein, M. Collins, D. Koller, and C. Manning. In Proc. EMNLP, 2004.
This paper presents a novel approach to parsing that maximizes the separating margin, in the style of Support Vector Machines. The authors show how parsing can be reformulated as a discriminative task, which allows an arbitrary number of features to be used. The formulation also lets them incorporate a loss function that directly penalizes incorrect parse trees according to how wrong they are.
Brief description of the method
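Instead of a probabilistic interpretation for parse trees, we seek a weight vector $w$ such that

$$y_i = \arg\max_{y \in G(x_i)} \langle w, \Phi(x_i, y) \rangle$$

for all sentences $x_i$ in the training data, $y_i$ being the true parse tree, $G(x_i)$ the set of possible parses for $x_i$, and $\Phi(x_i, y)$ a feature vector representation of a parse.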
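Formulating it as an optimization problem with slack variables $\xi_i$,

$$\min_{w, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_i \xi_i \quad \text{s.t.} \quad \langle w, \Phi(x_i, y_i) - \Phi(x_i, y) \rangle \ge L(x_i, y_i, y) - \xi_i \quad \forall i, \; \forall y \in G(x_i)$$

where $L(x_i, y_i, y)$ is the loss of proposing $y$ when the true parse is $y_i$. As with SVMs, we can find the dual of the above program,

$$\max_{\alpha} \; C \sum_{i,y} \alpha_{i,y} L_{i,y} - \frac{1}{2} \Big\| C \sum_{i,y} (I_{i,y} - \alpha_{i,y}) \Phi_{i,y} \Big\|^2 \quad \text{s.t.} \quad \sum_y \alpha_{i,y} = 1 \;\; \forall i, \qquad \alpha_{i,y} \ge 0 \;\; \forall i, y$$

where $\Phi_{i,y} = \Phi(x_i, y)$, $L_{i,y} = L(x_i, y_i, y)$, and $I_{i,y}$ indicates whether $y$ is the true parse $y_i$ for sentence $x_i$.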
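To make the margin constraints concrete, here is a minimal sketch for a toy setting in which the candidate parses of one sentence can be listed explicitly. It assumes the usual loss-scaled constraint (the true parse must outscore each candidate by at least that candidate's loss, minus a slack). The names `dot`, `predict`, and `slack`, the sparse-dictionary feature vectors, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List

FeatureVector = Dict[str, float]  # sparse features: name -> value


def dot(w: FeatureVector, phi: FeatureVector) -> float:
    """Inner product <w, phi> for sparse dictionaries."""
    return sum(w.get(name, 0.0) * value for name, value in phi.items())


def predict(w: FeatureVector, candidates: List[FeatureVector]) -> int:
    """Index of the highest-scoring candidate parse."""
    scores = [dot(w, phi) for phi in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


def slack(w: FeatureVector, candidates: List[FeatureVector],
          losses: List[float], gold: int) -> float:
    """Smallest slack that satisfies every margin constraint of one sentence.

    Each candidate k contributes loss_k - (score(gold) - score(k)); the gold
    parse itself contributes 0, so the result is never negative.
    """
    gold_score = dot(w, candidates[gold])
    return max(losses[k] - (gold_score - dot(w, phi))
               for k, phi in enumerate(candidates))


# Toy data: three hypothetical candidate parses; candidate 0 is the true parse.
w = {"a": 1.0, "b": -1.0}
candidates = [{"a": 2.0}, {"a": 1.0, "b": 1.0}, {"b": 2.0}]

print(predict(w, candidates))               # index of the argmax parse
print(slack(w, candidates, [0, 1, 3], 0))   # all margins met
print(slack(w, candidates, [0, 3, 3], 0))   # candidate 1 violates its margin
```

In a real parser the candidate list is far too large to enumerate, which is why the factorization into parts matters.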
For each sentence, we need to enumerate all possible parse trees, and their number is exponential in the sentence length. However, we can make use of local substructures, as in the dynamic programming of chart parsing, to factor these trees into parts like $\langle A, s, e, i \rangle$ and $\langle A \to B\,C, s, m, e, i \rangle$, where $s, m, e, i$ refer to the start, split, and end points and the sentence number respectively. The feature vector of a parse then decomposes over its parts:

$$\Phi(x, y) = \sum_{r \in R(x, y)} \phi(x, r)$$

where $R(x, y)$ is the set of parts belonging to parse $y$. $\phi$ can be any function that maps a rule production part to some feature vector representation. In addition, the loss function can also be decomposed into a sum over parts in the same way. In the paper, the loss function used was the number of constituent errors made in a parse.
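The decomposition over parts can be sketched in a few lines. The example below uses only constituent parts (label, start, end) and leaves out rule parts for brevity. The feature templates in `part_features` are made up for illustration, and `constituent_loss` is one reading of "constituent errors": the number of proposed constituents that are absent from the true parse.

```python
from collections import Counter
from typing import List, Set, Tuple

Constituent = Tuple[str, int, int]  # (label, start, end), end exclusive


def part_features(words: List[str], part: Constituent) -> Counter:
    """Hypothetical feature templates for a single constituent part."""
    label, start, end = part
    return Counter({
        f"label={label}": 1,
        f"label={label},first={words[start]}": 1,
        f"length={end - start}": 1,
    })


def tree_features(words: List[str], parts: Set[Constituent]) -> Counter:
    """Feature vector of a whole parse: the sum of its part feature vectors."""
    total = Counter()
    for part in parts:
        total.update(part_features(words, part))
    return total


def constituent_loss(gold: Set[Constituent], proposed: Set[Constituent]) -> int:
    """Sum over proposed parts of a 0/1 error: 1 if the part is not in gold."""
    return len(proposed - gold)


words = ["the", "dog", "barks"]
gold = {("NP", 0, 2), ("VP", 2, 3), ("S", 0, 3)}
proposed = {("NP", 0, 1), ("VP", 1, 3), ("S", 0, 3)}

print(tree_features(words, gold))
print(constituent_loss(gold, proposed))  # two proposed constituents are wrong
```

Because both the features and the loss are sums over parts, neither requires looking at a parse tree as a whole.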
By incorporating parts, the factored dual objective can be expressed in a polynomial number of variables, which is in fact cubic in the length of the sentence.
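The gap between the number of trees and the number of parts can be checked with a small count. The sketch below assumes unlabeled binary trees over n words, so span parts correspond to (start, end) pairs and split parts to (start, split, end) triples; labels and grammar rules would multiply these counts by grammar-dependent constants. The function names are illustrative.

```python
from math import comb  # Python 3.8+


def num_binary_trees(n: int) -> int:
    """Number of unlabeled binary bracketings of n words (Catalan number C_{n-1})."""
    return comb(2 * (n - 1), n - 1) // n


def num_span_parts(n: int) -> int:
    """Number of (start, end) pairs with 0 <= start < end <= n."""
    return comb(n + 1, 2)


def num_split_parts(n: int) -> int:
    """Number of (start, split, end) triples with 0 <= start < split < end <= n."""
    return comb(n + 1, 3)


for n in (5, 10, 20):
    print(n, num_binary_trees(n), num_span_parts(n), num_split_parts(n))
```

For a 10-word sentence this gives 4862 bracketings but only 165 split triples, and the triple count grows as n^3 while the tree count grows exponentially.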
Experiments on the Penn Treebank dataset with lexical features achieved an F-score 0.43 points higher than that of the Collins 99 parser.
McDonald et al., ACL 2005: Non-projective dependency parsing using spanning tree algorithms. Margin learning for dependency parsing.
Tsochantaridis, Joachims, Support vector machine learning for interdependent and structured output spaces, 2004. Using SVMs for structured output spaces.