Difference between revisions of "Gabrilovich and Markovitch IJCAI 2007"

From Cohen Courses
== Summary ==
 
The [[Category::paper]] presents a system for computing [[AddressesProblem::Semantic Relatedness]].  
  
The datasets used for evaluation are the [[UsesDataset::WordSimilarity-353]] collection and a collection of 50 documents from the Australian Broadcasting Corporation's news mail service. The WordSimilarity-353 collection contains 353 word pairs, and the document collection yields 1,225 document pairs. Both come with human judgments as gold standards.
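Since both collections come with human judgments, a system is typically scored by how well its relatedness scores correlate with the human ratings. A minimal sketch of such an evaluation using Spearman rank correlation follows; the word pairs and all scores are invented for illustration, and rank ties are ignored for simplicity.

```python
# Hedged sketch: scoring a relatedness system against human gold-standard
# judgments via Spearman rank correlation. All pairs and scores below are
# invented for illustration; rank ties are not handled.

def spearman(xs, ys):
    """Spearman rank correlation of two equal-length score lists (no ties)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Invented examples: word pair, averaged human rating (0-10),
# and a hypothetical system's relatedness score (0-1).
pairs = ["tiger/cat", "computer/keyboard", "king/cabbage", "bank/money"]
human = [7.4, 7.6, 0.2, 8.1]
system = [0.81, 0.83, 0.05, 0.90]

print(spearman(human, system))  # 1.0: the system ranks all pairs like humans
```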
  
They propose a method called [[UsesMethod::Explicit Semantic Analysis]] (ESA), which represents the meaning of any text in terms of natural concepts defined in a large-scale knowledge repository such as Wikipedia [http://www.wikipedia.com] or the Open Directory Project (ODP) [http://www.dmoz.org]. They first build a semantic interpreter that maps fragments of natural language text into a weighted sequence of Wikipedia concepts ordered by their relevance to the input. Input texts are then represented as weighted vectors of concepts, and semantic relatedness is computed by comparing these vectors, for example with the cosine metric. The method is illustrated in the picture below.
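The interpreter-plus-cosine pipeline can be sketched as follows. This is a toy illustration, not the authors' implementation: a three-article corpus stands in for the real Wikipedia repository, and plain TF-IDF serves as the concept weighting.

```python
# Toy sketch of Explicit Semantic Analysis (ESA): map a text to a weighted
# vector of "concepts" (here, three made-up articles standing in for
# Wikipedia) and compare texts by cosine similarity of their vectors.
import math
from collections import Counter

# Assumption: a real system indexes millions of Wikipedia articles.
concepts = {
    "Computer science": "algorithm computer program data software computation",
    "Biology": "cell organism gene evolution species protein",
    "Music": "melody rhythm instrument song harmony concert",
}

def tf_idf_index(concepts):
    """Build a term -> {concept: weight} inverted index with TF-IDF weights."""
    docs = {c: Counter(text.split()) for c, text in concepts.items()}
    n = len(docs)
    df = Counter(term for counts in docs.values() for term in counts)
    index = {}
    for c, counts in docs.items():
        for term, tf in counts.items():
            idf = math.log(n / df[term]) + 1.0  # smoothed idf
            index.setdefault(term, {})[c] = tf * idf
    return index

def interpret(text, index):
    """Map a text fragment to a weighted vector of concepts."""
    vec = Counter()
    for term in text.lower().split():
        for concept, w in index.get(term, {}).items():
            vec[concept] += w
    return vec

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

index = tf_idf_index(concepts)
a = interpret("a software algorithm for data", index)
b = interpret("computer program computation", index)
c = interpret("song melody and rhythm", index)
print(cosine(a, b))  # high (~1.0): both texts map to Computer science
print(cosine(a, c))  # 0.0: the texts share no concepts
```

Note that the same `interpret` function applies unchanged whether the input is a single word or a whole document, which is the property the paper exploits.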
[[File:semanticRelatedness_Wikipedia.png]]
The proposed method shows substantial improvements over previous approaches, as shown below.
[[File:semanticRelatedness_result.png]]
Their system can compute relatedness between words, sentences, and whole documents with a single uniform method, whereas much other research on Semantic Relatedness handles only one of these granularities.
  
 
== Key Contribution ==  
 
The key contribution of this paper is Explicit Semantic Analysis, which represents the meaning of a text as a weighted vector of explicit, human-defined Wikipedia concepts. Because the same representation applies to words, sentences, and whole documents, relatedness can be computed uniformly at any granularity, and the evaluation shows substantial improvements in agreement with human judgments over previous approaches.

Revision as of 20:34, 30 November 2010

== Citation ==

Evgeniy Gabrilovich and Shaul Markovitch. 2007. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence, 1606-1611.

== Online version ==

AAAI
