KeisukeKamataki writeup of Cohen whirl 2000

From Cohen Courses
Revision as of 10:42, 3 September 2010 by WikiAdmin (talk | contribs) (1 revision)

This is a review of Cohen_2000_whirl_a_word_based_information_representation_language by user:KeisukeKamataki.

Summary: This paper proposes WHIRL (Word-based Heterogeneous Information Representation Language), which combines logic-based and text-based representation methods in order to integrate data from heterogeneous ("soft") information sources such as the web. The integrated information can be used for ranked retrieval of documents, or queried as if it were hard information for efficient access to the database. The method also works across many domains.

In WHIRL, each text entity is represented as a vector in a term vector space. This makes the importance of each term explicit, in the manner of TF-IDF weighting, and gives a simple similarity measure between entities.
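As a rough illustration of the vector-space idea, the sketch below builds unit-length TF-IDF vectors for a few short name strings and compares them by cosine similarity. The weighting formula is one common TF-IDF variant chosen for illustration, and the function names and example strings are made up here; none of it is taken from the paper's implementation.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document (a list of tokens) to a unit-length sparse TF-IDF vector."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # (1 + log tf) * log(n / df): an assumed, commonly used TF-IDF variant.
        vec = {t: (1.0 + math.log(c)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm > 0 else {})
    return vectors

def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())

# Hypothetical example strings.
names = [
    "general motors corporation".split(),
    "general motors corp".split(),
    "acme software corporation".split(),
]
vecs = tfidf_vectors(names)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

With these toy strings, the first name scores higher against the second than against the third: it shares two terms with the second name but only one with the third.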

For query processing, WHIRL supports conjunctive queries. It operates over an EDB (extensional database), a collection of ground atomic facts, each carrying a score that acts as a kind of degree of belief. Because the score of an answer is the product of the scores of its literals, with cosine similarity used for similarity literals, the method seems effective when high precision is required.
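The product-of-scores idea can be sketched with a toy conjunctive query over a small scored fact base. Everything here is a hypothetical illustration: the relation names, facts, and scores are invented, the similarity uses raw term counts as a stand-in for TF-IDF weights, and the exhaustive nested loop stands in for whatever search strategy a real system would use.

```python
import heapq
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity over raw term counts (a simplification of TF-IDF weighting)."""
    u, v = Counter(a.split()), Counter(b.split())
    dot = sum(c * v[t] for t, c in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

# Hypothetical EDB: each relation holds (tuple, score) ground facts.
edb = {
    "company": [(("general motors corporation", "automotive"), 1.0),
                (("acme software", "software"), 0.9)],
    "listing": [(("general motors corp", "gm.example"), 1.0),
                (("acme soft", "acme.example"), 0.8)],
}

def soft_join(edb, r):
    """Top-r answers to: company(N1, I) AND listing(N2, U) AND N1 ~ N2."""
    answers = []
    for (n1, industry), s1 in edb["company"]:
        for (n2, url), s2 in edb["listing"]:
            # Answer score = product of the scores of all three literals.
            score = s1 * s2 * cosine(n1, n2)
            if score > 0:
                answers.append((score, industry, url))
    return heapq.nlargest(r, answers)

top = soft_join(edb, 2)
for score, industry, url in top:
    print(round(score, 3), industry, url)
```

Because the scores are multiplied, a single weak literal pulls the whole answer down, which fits the observation that the scheme favors precision.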

I like: The paper does not only introduce the system; it also addresses efficiency and effectiveness across open domains. Scalability of inference is a problem for many large-scale text analysis methods, not just this one. Combining a traditional method such as A* search with IR indexing techniques, and taking a "snapshot" of each domain, seems a reasonable approach to such issues.