Sgardine writesup Bunescu 2006 subsequence kernels

From Cohen Courses

This will be a review of Bunescu_2006_subsequence_kernels_for_relation_extraction by user:Sgardine.

Summary

Protein-protein interaction extraction from biological corpora is difficult, partly because many useful features rely on POS taggers and parsers, which perform worse on biological text. Previous approaches achieved some success using subsequence rules involving two protein names; the authors propose to use all anchored subsequences as features. Because enumerating all such subsequences explicitly is infeasible, the authors propose a kernel that efficiently computes the dot product in the implicit feature space. They describe how to compute the kernel using recurrence relations and, given the kernel, train an SVM. They train the model on protein-protein interaction data and find that it outperforms previous systems based on hand-selected and greedily selected rules. They also evaluate the model on relation extraction on the ACE corpus, where it outperforms an SVM using a different kernel.
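
To make the "recurrence relations" step concrete, here is a minimal sketch of the standard gap-weighted subsequence kernel (in the style of Lodhi et al.'s string kernel), computed by dynamic programming. This is an assumption-laden illustration and not the paper's exact kernel: it matches tokens by plain equality and does not implement the anchoring at protein names or any other refinement the authors describe. The function name, the decay parameter lam, and its default value are all hypothetical.

```python
def subsequence_kernel(s, t, n, lam=0.5):
    """Gap-weighted subsequence kernel K_n(s, t).

    Sums lam ** (span in s + span in t) over every pair of occurrences of
    a common length-n subsequence. s and t may be strings or token lists.
    Illustrative only: plain equality matching, no anchoring.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    ls, lt = len(s), len(t)
    if min(ls, lt) < n:
        return 0.0

    # kp[i][a][b] holds the auxiliary value K'_i(s[:a], t[:b]), which
    # weights partial matches of length i by their distance to the
    # ends of the two prefixes.
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0  # empty subsequence matches everywhere

    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running K''_i(s[:a], t[:b]) as b grows
            for b in range(i, lt + 1):
                match = lam * lam * kp[i - 1][a - 1][b - 1] if s[a - 1] == t[b - 1] else 0.0
                kpp = lam * kpp + match
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp

    # Close each length-n match at a pair of equal final tokens.
    total = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total
```

The table has O(n |s| |t|) entries, so the cost is polynomial even though the implicit feature space has one dimension per possible subsequence. A kernel value computed this way could then be passed to any SVM implementation that accepts a precomputed Gram matrix.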

Commentary

What would the results on ACE look like with the sum of ERK and K4?

I understand how it would happen when you're discussing alphabets and summations that you'd end up overloading the Σ character, but I don't have to like it.