Difference between revisions of "Structured Output Learning with Indirect Supervision"

From Cohen Courses
Revision as of 18:44, 29 September 2011

Citation

Structured Output Learning with Indirect Supervision, by M.W. Chang, V. Srikumar, D. Goldwasser, D. Roth. In Proceedings of the 27th International Conference on Machine Learning, 2010.

This paper is available online [1].

Summary

The problem with a lot of structured output problems is that it is often time-consuming and difficult to obtain labelings and annotated structures for training. For example, one may need to obtain a parse tree, a POS tag sequence, or a translation for every sentence in the training data. Naturally, this becomes more onerous as the training set grows. The key idea of this paper is that structured output problems often have a companion learning problem, which is to determine whether a given structure is legitimate or not. For example, for POS tagging, the binary companion problem is to determine whether the POS tag sequence for a given input sentence is valid or not. The basic assumption (which seems quite realistic to me) is that while "direct supervision", which in this case would mean the ground truth tag sequence, is difficult and cumbersome to obtain, binary annotations (good/bad, legitimate/illegitimate) are far easier to obtain.

This paper presents a large margin-based framework that learns on both fully supervised (i.e., traditional supervised training) data as well as binary "indirectly supervised" training data. Experiments show that binary data helps in improving performance on three different NLP structure learning problems.

Main Approach

The authors present their approach, which they call "Joint Learning with Indirect Supervision", or J-LIS, as a generalization of structured output SVMs. The basic idea is to learn from a small number of fully or directly supervised training examples, and augment training with indirectly supervised (binary labels) training data.

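The goal of learning in standard structured output prediction is to find a weight vector w such that <math>\mathbf{h}_i = \arg\max_{\mathbf{h} \in \mathcal{H}(\mathbf{x}_i)} \mathbf{w}^T \Phi (\mathbf{x}_i, \mathbf{h})</math>. In this case, x is the input, <math>\mathcal{H}(\mathbf{x})</math> is the set of all feasible structures with x as input, and <math>\Phi</math> is a feature generation function. The key assumption that is used so that we can incorporate indirect supervision is that an input x generates a valid output (<math>y = 1</math>) if and only if its best structure is well-formed, and conversely an input x generates an invalid output (<math>y = -1</math>) if every structure for that input is bad. In mathematical terms:

* <math>\forall \quad (\mathbf{x}, -1) \in B^-, \quad \forall \mathbf{h} \in \mathcal{H}(\mathbf{x}), \quad \mathbf{w}^T \Phi (\mathbf{x}, \mathbf{h}) \leq 0</math>
* <math>\forall \quad (\mathbf{x}, +1) \in B^+, \quad \exists \mathbf{h} \in \mathcal{H}(\mathbf{x}), \quad \mathbf{w}^T \Phi (\mathbf{x}, \mathbf{h}) \geq 0</math>

where <math>B^+, B^-</math> refer to the positive and negative partitions of the indirect supervision training set.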
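
The two conditions above can be checked mechanically once the feasible structures are small enough to enumerate. The following is a minimal sketch under that assumption, not the authors' implementation: the feature function `phi`, the weight vector `w`, the binary-labeling structure space, and the helper names `best_structure` and `satisfies_binary_constraint` are all hypothetical and chosen only for illustration. It relies on the fact that "every structure scores at most 0" is the same as "the best structure scores at most 0", and "some structure scores at least 0" is the same as "the best structure scores at least 0".

```python
from itertools import product

import numpy as np


def phi(x, h):
    """Toy feature function (an assumption): [sum of selected inputs, bias]."""
    return np.array([np.dot(x, h), 1.0])


def best_structure(w, x, structures):
    """Return (h, score) maximizing w^T phi(x, h) over the enumerated structures."""
    scores = [float(w @ phi(x, h)) for h in structures]
    k = int(np.argmax(scores))
    return structures[k], scores[k]


def satisfies_binary_constraint(w, x, y, structures):
    """Check the indirect-supervision condition for one example (x, y)."""
    _, top = best_structure(w, x, structures)
    if y == -1:
        # All structures score <= 0 exactly when the maximum score is <= 0.
        return top <= 0
    # Some structure scores >= 0 exactly when the maximum score is >= 0.
    return top >= 0


# Hypothetical toy setup: structures are all binary labelings of a 2-element input.
w = np.array([1.0, -0.5])
structures = list(product([0, 1], repeat=2))

x_pos = (2.0, -1.0)   # intended as a member of B^+
x_neg = (-1.0, -2.0)  # intended as a member of B^-

h_star, score = best_structure(w, x_pos, structures)
pos_ok = satisfies_binary_constraint(w, x_pos, +1, structures)
neg_ok = satisfies_binary_constraint(w, x_neg, -1, structures)
neg_as_pos = satisfies_binary_constraint(w, x_neg, +1, structures)
```

In realistic structured problems the set of feasible structures is too large to enumerate, so the maximization in `best_structure` would be replaced by a task-specific inference procedure; the check itself stays the same.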

Baseline & Results

Related Work