Sgardine project abstract
What I plan to do with what data
I plan to follow the lead of Mayfield, McNamee, and Piatko (2003) and explore margin-based sequence classification using lattices. For variety and simplicity, I'll use a perceptron instead of an SVM. At a minimum, I plan to implement the perceptron and apply it to some sequence data, with the aim of broadly corroborating their results. I also hope to explore a couple of ways of producing probabilities from margins, as per Platt (1999) and later comments thereon.
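As a rough illustration of the two pieces mentioned above, here is a minimal sketch of a binary perceptron together with a Platt-style sigmoid that maps a margin to a probability. It assumes dense feature vectors and labels in {-1, +1}; the function names and the toy data are made up for illustration, and the sigmoid parameters A and B are simply passed in (Platt fits them on held-out data, which is not shown here).

```python
import math


def score(w, b, x):
    """Signed distance-like score of x under weights w and bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(examples, n_features, epochs=10):
    """Train a binary perceptron.

    examples: list of (feature_vector, label) pairs, label in {-1, +1}.
    Returns the learned (weights, bias).
    """
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            # Update only when the example is misclassified or on the boundary.
            if y * score(w, b, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break  # a full pass with no mistakes: stop early
    return w, b


def platt_probability(f, A, B):
    """Platt-style sigmoid: P(y = +1 | f) = 1 / (1 + exp(A * f + B)).

    A and B are assumed to have been fitted elsewhere; A is typically negative.
    """
    return 1.0 / (1.0 + math.exp(A * f + B))


# Toy, linearly separable data (hypothetical).
toy = [([2.0, 0.0], 1), ([0.0, 2.0], -1)]
w, b = train_perceptron(toy, n_features=2)
p = platt_probability(score(w, b, [2.0, 0.0]), A=-1.0, B=0.0)
```

For a tagging task with several tags, one such classifier per tag (or per transition) would be needed, which is why a small tag set keeps the number of models manageable.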
I am looking for labeled sequence data with a small number of tags (to limit the number of models I'll need to train). I currently plan to use the CoNLL-2002 Shared Task NER data (in Spanish and Dutch) and possibly the Usenet FAQ data.
Why I think it's interesting
I like the model's combination of psychologically natural transition decisions with a representation of instances as feature vectors. I also want to get my hands dirty applying large-margin classification to NL tasks.
Evaluation
I plan to use the standard evaluation metrics of precision, recall, and F1; the CoNLL-2002 shared task includes an evaluation script. I'll implement an HMM as a baseline, and I'll refer to published state-of-the-art values for the tasks for comparison.
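For reference, the metrics above can be sketched as follows. This is not the shared-task script, just a minimal illustration that assumes entities are represented as (start, end, type) tuples and that a predicted entity counts as correct only if it matches a gold entity exactly; the function name and the example spans are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall, and F1 over two collections of entities."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: three gold entities, two predictions, one exact match.
gold_entities = {(0, 1, "PER"), (3, 4, "LOC"), (6, 6, "ORG")}
predicted_entities = {(0, 1, "PER"), (3, 4, "ORG")}
precision, recall, f1 = precision_recall_f1(gold_entities, predicted_entities)
```

Here the second prediction has the right span but the wrong type, so it counts against both precision and recall.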
Whom I might work with
I am currently approaching the project alone, but would welcome interested collaborators on this or a similar project.