Difference between revisions of "Miwa 2009 a rich feature vector for protein protein interaction extraction from multiple corpora"

From Cohen Courses

Revision as of 20:26, 30 November 2011

Citation

A Rich Feature Vector for Protein-Protein Interaction Extraction from Multiple Corpora, by M. Miwa, R. Sætre, Y. Miyao, J. Tsujii. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009.

Online Version

Here is the online version of the paper.

Summary

Because protein-protein interaction (PPI) extraction from text is important, many corpora have been proposed, each with slightly different definitions of proteins and of PPI. Since no single corpus is large enough to saturate a machine learning system, it is necessary to learn from several corpora at once. In this paper the authors propose extracting PPIs by learning from multiple corpora. They design a rich feature vector and, as an Inductive Transfer Learning (ITL) method, apply a support vector machine (SVM) modified for corpus weighting (SVM-CW) in order to evaluate the use of multiple corpora for the PPI extraction task. The authors show that the system with their feature vector is better than, or at least comparable to, the state-of-the-art PPI extraction systems on every corpus. Although SVM-CW is simple, it improves the performance of the system more effectively and more efficiently than other methods that have proven successful in other NLP tasks.

Brief description of the method

The target task of the system is sentence-based, pair-wise PPI extraction, which is formulated as a classification problem: judge whether a given pair of proteins in a sentence is interacting or not. Figure 1 shows an overview of the proposed PPI extraction system. For classification on a single corpus, a 2-norm soft-margin linear SVM (L2-SVM) classifier is used, trained with the dual coordinate descent (DCD) method.

Figure 1: Overview of PPI Extraction System
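To make the single-corpus classifier more concrete, here is a minimal sketch of dual coordinate descent for an L2-loss linear SVM in plain Python. It is not the authors' implementation: the function names `train_l2svm_dcd` and `predict`, the parameters `C`, `epochs` and `seed`, and the toy feature vectors are all assumptions made for illustration. Each input vector stands for one candidate protein pair, with label +1 (interacting) or -1 (not interacting).

```python
import random


def train_l2svm_dcd(X, y, C=1.0, epochs=50, seed=0):
    """Train a linear L2-loss SVM by dual coordinate descent (illustrative sketch).

    X: list of feature vectors (lists of floats), y: list of labels in {+1, -1}.
    Returns the primal weight vector w.
    """
    n, d = len(X), len(X[0])
    diag = 1.0 / (2.0 * C)            # the squared hinge loss adds 1/(2C) to the dual diagonal
    alpha = [0.0] * n                 # dual variables, constrained to alpha_i >= 0
    w = [0.0] * d                     # maintained as w = sum_i alpha_i * y_i * x_i
    q_bar = [sum(v * v for v in x) + diag for x in X]
    order = list(range(n))
    rng = random.Random(seed)

    for _ in range(epochs):
        rng.shuffle(order)            # visit the coordinates in random order
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i]))
            grad = y[i] * score - 1.0 + diag * alpha[i]
            # Projected gradient: at the bound alpha_i = 0 only a negative
            # gradient permits a move (alpha_i can only increase from there).
            pg = min(grad, 0.0) if alpha[i] == 0.0 else grad
            if pg == 0.0:
                continue
            new_alpha = max(alpha[i] - grad / q_bar[i], 0.0)
            step = (new_alpha - alpha[i]) * y[i]
            w = [wj + step * xj for wj, xj in zip(w, X[i])]
            alpha[i] = new_alpha
    return w


def predict(w, x):
    """Return +1 (interacting) or -1 (not interacting) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data; the last component is a constant bias feature.
    X = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]]
    y = [1, 1, -1, -1]
    w = train_l2svm_dcd(X, y)
    print([predict(w, x) for x in X])
```

The sketch updates one dual variable at a time and keeps the weight vector in sync, so each update costs time proportional to the number of features in one example. Dense Python lists are used only for readability; a feature vector as rich as the one the paper describes would normally be stored sparsely.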

Experimental Result

Related papers