Gabrilovich and Markovitch IJCAI 2007
Citation
Evgeniy Gabrilovich and Shaul Markovitch. 2007. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. IJCAI'07: Proceedings of the 20th International Joint Conference on Artificial Intelligence, 1606-1611.
Online version
Summary
The paper presents a system for computing Semantic Relatedness.
Two datasets are used for evaluation: the WordSimilarity-353 collection, and a collection of 50 documents from the Australian Broadcasting Corporation's news mail service. WordSimilarity-353 contains 353 pairs of words, and the document collection yields 1,225 pairs of documents (every unordered pair of the 50 documents). Both have human judgments as gold standards.
They propose a method called Explicit Semantic Analysis (ESA), which represents the meaning of any text in terms of natural concepts defined in a large-scale knowledge repository such as Wikipedia [1] or the Open Directory Project (ODP) [2]. They build a semantic interpreter that maps a fragment of natural language text to a weighted sequence of concepts ordered by their relevance to the input, so that each input text is represented as a weighted vector of concepts. The semantic relatedness of two texts is then computed by comparing their vectors, for instance with the cosine metric. The method is described in the picture below.
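The pipeline described above (concept weights, semantic interpreter, cosine comparison) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the three-entry `CONCEPTS` dictionary is a made-up stand-in for Wikipedia articles, the TF-IDF weighting is an assumed choice of word-to-concept weight, and all function names (`build_inverted_index`, `interpret`, `cosine`, `relatedness`) are hypothetical.

```python
import math
from collections import Counter

# Toy stand-in for a knowledge repository: concept name -> article text.
# These three "articles" are invented for illustration only.
CONCEPTS = {
    "Computer": "computer software hardware program keyboard mouse",
    "Cat": "cat feline pet mouse whiskers purr",
    "Finance": "bank money loan interest stock market",
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def build_inverted_index(concepts):
    """Map each word to {concept: weight}, using TF-IDF as an assumed weighting."""
    n = len(concepts)
    term_counts = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    # Document frequency: in how many concept articles each word occurs.
    doc_freq = Counter(w for counts in term_counts.values() for w in counts)
    index = {}
    for concept, counts in term_counts.items():
        for word, tf in counts.items():
            idf = math.log(n / doc_freq[word])
            index.setdefault(word, {})[concept] = tf * idf
    return index


def interpret(text, index):
    """Semantic interpreter: sum the concept weights of the words in the text."""
    vector = Counter()
    for word, tf in Counter(tokenize(text)).items():
        for concept, weight in index.get(word, {}).items():
            vector[concept] += tf * weight
    return dict(vector)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(a, b, index):
    """Relatedness of two texts (single words or longer fragments)."""
    return cosine(interpret(a, index), interpret(b, index))


INDEX = build_inverted_index(CONCEPTS)
```

With this toy index, `relatedness("keyboard", "bank", INDEX)` is zero because the two words share no concept, while `relatedness("mouse", "keyboard", INDEX)` is positive because both activate the "Computer" concept. Sorting the items of `interpret(text, INDEX)` by weight, highest first, gives the ordered concept sequence mentioned in the summary.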
The method shows substantial improvements over previous approaches, as shown below. Its relatedness scores reach a correlation of 0.75 with human judgments.
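To make the evaluation concrete, the snippet below sketches how system scores might be correlated with human judgments. It assumes Spearman rank correlation, which is a common choice for word-pair benchmarks; this summary does not state which coefficient was used. The function names are hypothetical and the score lists in the usage note are invented, not data from the paper.

```python
import math


def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))
```

For example, with made-up human scores `[7.5, 1.2, 9.0, 4.4]` and system scores `[0.61, 0.05, 0.88, 0.30]`, `spearman` returns a value close to 1 because both lists order the four pairs identically.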
Key Contribution
The algorithm proposed in this paper is intuitive and simple, and it performs well. Moreover, it computes semantic relatedness between words and between texts with the same method, without any change, whereas some other methods for semantic relatedness target either words or texts only. The paper is cited not only by subsequent work on semantic relatedness but also by information extraction research that uses wiki resources.