http://curtis.ml.cmu.edu/w/courses/index.php?title=Training_SMT_Systems_with_the_Latent_Structured_SVM&feed=atom&action=historyTraining SMT Systems with the Latent Structured SVM - Revision history2024-03-28T20:56:16ZRevision history for this page on the wikiMediaWiki 1.33.1http://curtis.ml.cmu.edu/w/courses/index.php?title=Training_SMT_Systems_with_the_Latent_Structured_SVM&diff=9149&oldid=prevJmflanig: /* Proposal */2011-10-19T04:00:57Z<p><span dir="auto"><span class="autocomment">Proposal</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 04:00, 19 October 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l9" >Line 9:</td>
<td colspan="2" class="diff-lineno">Line 9:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>) used the structured perceptron to train weights for each phrase in a phrase-based system as well as features shared between phrases. The approach can be viewed as an instance of the Latent Structured SVM ([http://www.cs.cornell.edu/~cnyu/papers/icml09_latentssvm.pdf Yu & Joachims ICML 2009]) but with no regularizer and no cost function. Regularization has been shown to be important in discriminative training of SMT systems ([http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.164.9399&rep=rep1&type=pdf Blunsom]). We propose to generalize the perceptron training of SMT systems to the Latent SSVM to allow for a regularizer and cost function, and to apply the method to large-scale training of syntactic SMT systems as well as a phrase-based system.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>) used the structured perceptron to train weights for each phrase in a phrase-based system as well as features shared between phrases. The approach can be viewed as an instance of the Latent Structured SVM ([http://www.cs.cornell.edu/~cnyu/papers/icml09_latentssvm.pdf Yu & Joachims ICML 2009]) but with no regularizer and no cost function. Regularization has been shown to be important in discriminative training of SMT systems ([http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.164.9399&rep=rep1&type=pdf Blunsom]). 
We propose to generalize the perceptron training of SMT systems to the Latent SSVM to allow for a regularizer and cost function, and to apply the method to large-scale training of syntactic SMT systems as well as a phrase-based system.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Our original project was to incorporate binary feedback into MT systems, but we arrived at the current proposal after we realized nobody had tried this important training method. If we have time, we may try to extend our latent SSVM model to the recently introduced Structured Output Learning with Indirect Supervision, [http://www.icml2010.org/papers/522.pdf M. Chang et al., ICML 2010].</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">(''Note'': </ins>Our original project was to incorporate binary feedback into MT systems, but we arrived at the current proposal after we realized nobody had tried this important training method. If we have time, we may try to extend our latent SSVM model to the recently introduced Structured Output Learning with Indirect Supervision, [http://www.icml2010.org/papers/522.pdf M. Chang et al., ICML 2010].<ins class="diffchange diffchange-inline">)</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Dataset(s) == </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Dataset(s) == </div></td></tr>
</table>Jmflanighttp://curtis.ml.cmu.edu/w/courses/index.php?title=Training_SMT_Systems_with_the_Latent_Structured_SVM&diff=9147&oldid=prevJmflanig: Created page with '(was '''Improving SMT word alignment with binary feedback''') == Team Member(s) == * Avneesh Saluja * Jeff Flanigan == Proposal == Large-sca…'2011-10-19T03:57:59Z<p>Created page with '(was '''Improving SMT word alignment with binary feedback''') == Team Member(s) == * <a href="/w/courses/index.php/User:Asaluja" title="User:Asaluja">Avneesh Saluja</a> * <a href="/w/courses/index.php/User:Jmflanig" title="User:Jmflanig"> Jeff Flanigan</a> == Proposal == Large-sca…'</p>
<p><b>New page</b></p><div>(was '''Improving SMT word alignment with binary feedback''')<br />
<br />
== Team Member(s) ==<br />
* [[User:Asaluja|Avneesh Saluja]]<br />
* [[User:Jmflanig| Jeff Flanigan]]<br />
<br />
== Proposal ==<br />
Large-scale discriminative training of MT systems has been a long-standing goal in statistical machine translation. One of the first attempts ([http://cs.stanford.edu/~pliang/papers/discriminative-mt-acl2006.pdf Liang et al. 2006]) used the structured perceptron to train weights for each phrase in a phrase-based system as well as features shared between phrases. The approach can be viewed as an instance of the Latent Structured SVM ([http://www.cs.cornell.edu/~cnyu/papers/icml09_latentssvm.pdf Yu & Joachims ICML 2009]) but with no regularizer and no cost function. Regularization has been shown to be important in discriminative training of SMT systems ([http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.164.9399&rep=rep1&type=pdf Blunsom]). We propose to generalize the perceptron training of SMT systems to the Latent SSVM to allow for a regularizer and cost function, and to apply the method to large-scale training of syntactic SMT systems as well as a phrase-based system.<br />
<br />
Our original project was to incorporate binary feedback into MT systems, but we arrived at the current proposal after realizing that this important training method had not yet been tried. If we have time, we may try to extend our latent SSVM model to the recently introduced Structured Output Learning with Indirect Supervision ([http://www.icml2010.org/papers/522.pdf M. Chang et al., ICML 2010]).<br />
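<br />
As a sketch, the Latent SSVM training objective we would minimize has the following form (after [http://www.cs.cornell.edu/~cnyu/papers/icml09_latentssvm.pdf Yu & Joachims ICML 2009]; the particular cost Δ, e.g. a sentence-level 1−BLEU loss, is an illustrative choice of ours):<br />
<math>\min_{w}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \Big[ \max_{\hat{y},\hat{h}} \big( w \cdot \Phi(x_i,\hat{y},\hat{h}) + \Delta(y_i,\hat{y}) \big) - \max_{h}\ w \cdot \Phi(x_i,y_i,h) \Big]</math><br />
where <math>x_i</math> is a source sentence, <math>y_i</math> its reference translation, <math>h</math> a latent derivation (e.g. the phrase segmentation and alignment), and <math>C</math> the regularization constant. Dropping the regularizer and setting <math>\Delta = 0</math> recovers the perceptron-style training of Liang et al.<br />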
<br />
== Dataset(s) == <br />
We will primarily use one dataset for this project: the [http://www.mt-archive.info/IWSLT-2009-Paul.pdf IWSLT 2009 Chinese-English BTEC task parallel corpus]. <br />
* The training set contains more than 500,000 parallel sentences.<br />
* There are 9 development (tuning) sets of roughly 500 sentences each (4,250 sentences in total).<br />
* The test set consists of 200 aligned sentences.<br />
Of course, we can always decide to use one of the tuning sets as a test set and vice versa. <br />
<br />
== Baseline System == <br />
The baseline systems will be a phrase-based system and a Hiero system, optimized using MERT with the usual gamut of features.<br />
<br />
== Related Work ==<br />
* An end-to-end discriminative approach to machine translation, [http://cs.stanford.edu/~pliang/papers/discriminative-mt-acl2006.pdf Liang et al. 2006]<br />
<br />
* Learning Structural SVMs with Latent Variables, [http://www.cs.cornell.edu/~cnyu/papers/icml09_latentssvm.pdf Yu & Joachims ICML 2009]</div>Jmflanig