Distributional Similarity


Method

A simple way to gauge the semantic relatedness of two words from a large unlabeled corpus.

A word's or phrase's distributional similarity to another word or phrase is a function of the surrounding contexts of each of its occurrences. Sometimes contexts are defined as syntactic dependencies; sometimes they are the left and right neighboring words. (In the latter case, distributional similarity is superficially similar to, but NOT the same as, co-occurrence statistics between the two words.) Distributional similarity measures the extent to which two words/phrases can fill similar slots. For example, here's a quick example thrown together from some Twitter data, comparing three words: "great", "good", and "bad".

[Figure: table of normalized context counts for "great", "good", and "bad", computed from Twitter data]

A context is usually defined as a tuple of the several words immediately to the left and right of the target word. You then look at the word's many appearances in the corpus and count the contexts it occurs in. Words that can be used in similar ways should have similar distributions over contexts. Above, the columns are just normalized counts (the probability of each context given the word). The hope is that words with similar sentiment polarity will have a high distributional similarity score. Indeed, in this example (with data from 2M occurrences, which is very small for this sort of thing), the cosine similarities give sim(great, good) > sim(great, bad), which is exactly what you would hope.
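
To make the computation concrete, here is a minimal Python sketch of the pipeline: collect neighbor contexts for a word, normalize the counts into a distribution, and compare two words with cosine similarity. The function names, the window size, and the tokenized-sentence input format are assumptions for illustration, not a fixed API.

 from collections import Counter
 import math
 
 def context_counts(sentences, target, window=2):
     """Count (left-words, right-words) tuples around each occurrence of target."""
     counts = Counter()
     for toks in sentences:
         for i, tok in enumerate(toks):
             if tok == target:
                 left = tuple(toks[max(0, i - window):i])
                 right = tuple(toks[i + 1:i + 1 + window])
                 counts[(left, right)] += 1
     return counts
 
 def normalize(counts):
     """Turn raw context counts into P(context | word)."""
     total = sum(counts.values())
     return {ctx: c / total for ctx, c in counts.items()}
 
 def cosine(p, q):
     """Cosine similarity between two sparse context distributions."""
     dot = sum(v * q.get(ctx, 0.0) for ctx, v in p.items())
     norm_p = math.sqrt(sum(v * v for v in p.values()))
     norm_q = math.sqrt(sum(v * v for v in q.values()))
     return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0
 
 # Hypothetical usage with a tokenized corpus:
 # sents = [["this", "movie", "was", "great", "!"], ...]
 # great = normalize(context_counts(sents, "great"))
 # good = normalize(context_counts(sents, "good"))
 # bad = normalize(context_counts(sents, "bad"))
 # With enough data we would expect cosine(great, good) > cosine(great, bad).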

Distributional similarity has been used to group together words with roughly similar meanings; it can be used to derive lexicons comparable to hand-created lexical resources -- this is what Lin,_1998 does -- or to extend existing lexicons from a set of seed words, as in Snow_et_al_2005. Distributional similarity can also be a useful feature inside discriminatively trained NLP systems, such as CRFs for named entity recognition.
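
As a sketch of the seed-based extension idea, one could rank vocabulary items by their similarity to a small set of seed words and take the top candidates as new lexicon entries. This reuses the cosine helper from the sketch above; extend_lexicon and its arguments are hypothetical names, and real systems such as Snow_et_al_2005 use more sophisticated classifiers than this max-over-seeds score.

 def extend_lexicon(seed_words, distributions, k=20):
     """Rank non-seed words by their best cosine similarity to any seed.
 
     distributions: {word: normalized context distribution}, e.g. as built
     by context_counts/normalize above.
     """
     seeds = {w: distributions[w] for w in seed_words if w in distributions}
     if not seeds:
         return []
     scored = []
     for word, dist in distributions.items():
         if word in seeds:
             continue
         score = max(cosine(dist, s) for s in seeds.values())
         scored.append((score, word))
     return [w for score, w in sorted(scored, reverse=True)[:k]]
 
 # e.g. extend_lexicon({"great", "good"}, distributions) might surface
 # words like "excellent" as candidate positive-polarity entries.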