Rbosaghz's Writeup of Dan Roth's Talk
Roth_2009_Talk by user:Rbosaghz
Constrained Conditional Models
One of the major points of interest in this talk was the difference between features and constraints. While the two are mathematically equivalent in most models, Prof. Roth characterized features as local and constraints as global. He started with several examples of learning problems where local features were not enough and a more global perspective was required. For example, one picture showed several people whose body parts were hidden behind one another, making it difficult for a vision learner to count the people in the picture: it was not enough to look for specific body parts (e.g. faces alone), and one had to consider all of the available information in the picture to realize that five people were present. Similar problems arise in natural language, and more examples were shown for the semantic role labeling task.
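For concreteness, here is the general form of the scoring function as I understood it (my reconstruction, following the notation of Roth's constrained conditional models papers rather than the exact slide): local feature weights reward an assignment, and each global constraint C_k subtracts a penalty proportional to its degree of violation.

```latex
% Constrained conditional model inference: choose the assignment y that
% maximizes the local feature score minus penalties for the (degree of)
% violation of each global constraint C_k.
\[
  y^{*} \;=\; \arg\max_{y}\;
      \sum_{i} w_i \, \phi_i(x, y)
      \;-\; \sum_{k} \rho_k \, d_{C_k}(x, y)
\]
```

With rho_k set to infinity a constraint is hard (violating assignments are discarded outright); with finite rho_k it acts as a soft, global feature, which is the sense in which features and constraints are mathematically interchangeable.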
In some cases, respecting the constraints required turning the optimization problem into an Integer Linear Program (ILP), whose objective combines the model's score with penalties for violated constraints. These ILPs were solved using commercial solvers, and although ILP is NP-hard in general, the instances arising in Prof. Roth's work were still tractable in practice. I particularly liked this approach, as it clearly extends to other tasks where an ILP can enforce global constraints.
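To make the ILP formulation concrete, here is a minimal sketch (my own toy example, not Prof. Roth's system) using the open-source PuLP solver rather than a commercial one. Each binary variable indicates that a span takes a label, the objective sums hypothetical local classifier scores, and the final "each core argument label at most once" constraint is exactly the kind of global condition a purely local model cannot enforce.

```python
# Toy SRL-style constrained inference as an ILP (illustrative sketch only).
# Assign each of three spans one label from {A0, A1, NONE}, maximizing
# local scores subject to a global uniqueness constraint on core labels.
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, LpBinary

spans = ["span1", "span2", "span3"]
labels = ["A0", "A1", "NONE"]
# Hypothetical local classifier scores (e.g. log-probabilities).
score = {
    ("span1", "A0"): 2.0, ("span1", "A1"): 0.5, ("span1", "NONE"): 0.1,
    ("span2", "A0"): 1.8, ("span2", "A1"): 1.0, ("span2", "NONE"): 0.2,
    ("span3", "A0"): 0.3, ("span3", "A1"): 0.4, ("span3", "NONE"): 1.5,
}

prob = LpProblem("constrained_inference", LpMaximize)
# One binary indicator variable per (span, label) decision.
x = {(s, l): LpVariable(f"x_{s}_{l}", cat=LpBinary)
     for s in spans for l in labels}

# Objective: total local score of the chosen labeling.
prob += lpSum(score[s, l] * x[s, l] for s in spans for l in labels)

# Structural constraint: each span gets exactly one label.
for s in spans:
    prob += lpSum(x[s, l] for l in labels) == 1

# Global constraint: each core argument label (A0, A1) is used at most once.
for l in ["A0", "A1"]:
    prob += lpSum(x[s, l] for s in spans) <= 1

prob.solve()
for s in spans:
    chosen = [l for l in labels if x[s, l].value() == 1]
    print(s, "->", chosen[0])
```

Without the last constraint the solver would happily label both span1 and span2 as A0; with it, span2 is pushed to A1, which is the kind of global correction the talk described.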
By applying constraints during inference, Prof. Roth presented examples where constraints improve performance on tasks such as semantic role labeling, information extraction, and transliteration. He also pointed out the benefits and drawbacks of using pipelines in NLP, and the gains obtained from merging two consecutive steps of a pipeline.
What I particularly liked about this talk was that, although one might expect global features to almost always slow inference down, for some of the models shown adding constraints made inference two orders of magnitude faster while also improving quality, presumably because hard constraints prune large parts of the search space.