Turney 2006 A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations, COLING 2008


Citation

Turney, P.D. (2008), A uniform approach to analogies, synonyms, antonyms, and associations, Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008), Manchester, UK, pp. 905-912.

Online version

Link

Summary

In this paper, the author proposes a supervised, corpus-based method for classifying semantic relations between word pairs, covering analogies, synonyms, antonyms, and associations. A Support Vector Machine is trained on features derived from the patterns in which a given word pair occurs. The method achieves results competitive with existing methods that are each designed for a single task (synonyms only, etc.).

Brief description of the method

For any word pair X:Y, the system extracts all phrases in the corpus that match the template "(0~1 words) X (0~3 words) Y (0~1 words)". It then generates additional patterns from each phrase by replacing some of the words (excluding X and Y) with asterisks, which act as wildcards. Since each of the n-2 words other than X and Y is either kept or replaced, one phrase of length n yields 2^(n-2) patterns. To keep the number of features manageable, the author keeps only the top kN patterns when sorted in decreasing order of the number of word pairs that occur in each pattern (N is the number of word pairs, and k is a constant). The intuition is that patterns shared by many pairs are more useful. Once feature selection is done, a feature vector is generated for each pair by taking log(f+1) (f is the frequency of a pattern) and then normalizing each column of the feature vectors.
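The three steps described above (wildcard generalization, feature selection, and vector construction) can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the example phrase, the log(f+1) transform, and the use of unit L2 length for the column normalization are all assumptions made for the sketch.

```python
import math
from collections import Counter
from itertools import product


def generalize(phrase, x, y):
    """Return all patterns made by optionally replacing each word other
    than x and y with '*'. x and y become the placeholders 'X' and 'Y'."""
    slots = []
    for tok in phrase:
        if tok == x:
            slots.append(("X",))
        elif tok == y:
            slots.append(("Y",))
        else:
            slots.append((tok, "*"))  # keep the word, or wildcard it
    return {" ".join(combo) for combo in product(*slots)}


def select_patterns(patterns_by_pair, k):
    """Keep the top k*N patterns, ranked by how many of the N word pairs
    occur in each pattern (ties broken alphabetically, an assumption)."""
    pair_count = Counter()
    for patterns in patterns_by_pair.values():
        pair_count.update(set(patterns))
    limit = k * len(patterns_by_pair)
    ranked = sorted(pair_count, key=lambda p: (-pair_count[p], p))
    return ranked[:limit]


def feature_vectors(freq_by_pair, selected):
    """Build one log(f + 1) vector per pair, then scale each column
    (pattern) to unit L2 length. The L2 choice is an assumption."""
    rows = {
        pair: [math.log(freqs.get(p, 0) + 1) for p in selected]
        for pair, freqs in freq_by_pair.items()
    }
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in rows.values()))
        for j in range(len(selected))
    ]
    return {
        pair: [v / n if n else 0.0 for v, n in zip(row, norms)]
        for pair, row in rows.items()
    }


# Illustrative phrase of n = 6 words, so 2 ** (6 - 2) patterns are produced.
phrase = "the mason cut the stone along".split()
patterns = generalize(phrase, "mason", "stone")
print(len(patterns))
```

The sketch takes pattern frequencies as given; how the corpus is searched for matching phrases is outside its scope.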

Experimental Result

The proposed method outperformed the baseline methods in all of the following task settings, but it did not outperform the best existing methods. It is worth noting, however, that this method can be applied with little change to any task with a similar setting, whereas the best existing methods are each designed for one specific task. The contribution is the range of tasks the method can handle, not its performance on any single one.

SAT Analogies

In this dataset, each question gives one word pair (called the stem) and five candidate word pairs with unknown relations; the task is to pick the candidate whose relation is most analogous to that of the stem. For each question, the method generates just one positive example and one negative example. The positive example is the stem. The negative example is a word pair picked at random from the other questions, on the assumption that it does not represent the same analogy. Because so few examples are available, the author used a form of bootstrap aggregating, repeating the process several times with different negative examples. Note that it is not possible to generate additional positive examples.
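The repeated-sampling procedure can be sketched as below. This is only an illustration of the aggregation loop: the SVM is replaced by a hypothetical stand-in scorer (`train_stand_in`), the number of rounds is an assumed value, and averaging raw scores across rounds is one plausible way to combine them, not necessarily the one used in the paper.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_stand_in(positive, negative):
    """Hypothetical stand-in for the SVM: scores a vector by how much
    more it resembles the positive example than the negative one."""
    w = [p - n for p, n in zip(positive, negative)]
    return lambda vec: dot(w, vec)


def answer_question(stem_vec, choice_vecs, other_pair_vecs,
                    rounds=10, seed=0, train=train_stand_in):
    """Train on (stem, random negative) several times and average the
    score of each choice; return the best choice index and the averages."""
    rng = random.Random(seed)
    totals = [0.0] * len(choice_vecs)
    for _ in range(rounds):
        # Negative example: a pair drawn at random from other questions.
        negative = rng.choice(other_pair_vecs)
        score = train(stem_vec, negative)
        for i, vec in enumerate(choice_vecs):
            totals[i] += score(vec)
    averaged = [t / rounds for t in totals]
    best = max(range(len(choice_vecs)), key=averaged.__getitem__)
    return best, averaged


# Toy feature vectors, made up for illustration only.
stem = [1.0, 0.0]
choices = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
others = [[0.0, 1.0], [0.1, 0.9]]
best, averaged = answer_question(stem, choices, others)
```

Passing a different `train` function (for example, one that wraps a real SVM with probability outputs) leaves the aggregation loop unchanged.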

TOEFL Synonyms

In this type of question, one word is given along with 4 candidate words, exactly one of which is its synonym. Pairing the given word with each candidate turns n questions into n positive example word pairs and 3n negative example word pairs. 90% of the pairs are used as the training set.
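A minimal sketch of how labeled pairs could be derived from such questions is shown below. The question format (a dict with `word`, `choices`, and `answer` keys), the toy questions, and the single shuffled 90/10 split are assumptions for illustration; the summary does not say how the split was drawn.

```python
import random


def toefl_pairs(questions):
    """Turn each question into labeled (word, choice) pairs:
    label 1 for the correct synonym, 0 for every other choice."""
    examples = []
    for q in questions:
        for choice in q["choices"]:
            label = 1 if choice == q["answer"] else 0
            examples.append(((q["word"], choice), label))
    return examples


def split_90_10(examples, seed=0):
    """Shuffle and keep 90% of the pairs for training, 10% for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 9) // 10
    return shuffled[:cut], shuffled[cut:]


# Made-up questions in the assumed format (not taken from the dataset).
questions = [
    {"word": "big", "choices": ["large", "small", "red", "fast"], "answer": "large"},
    {"word": "quick", "choices": ["slow", "rapid", "tall", "green"], "answer": "rapid"},
]
examples = toefl_pairs(questions)
positives = sum(label for _, label in examples)
negatives = len(examples) - positives
train_set, test_set = split_90_10(examples)
```

With 2 questions of 4 choices each, this produces 2 positive and 6 negative pairs, matching the n and 3n counts.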

Synonyms & Antonyms, Similar & Associated & Both

These are the most straightforward applications of the method: the classifier simply predicts which class a given pair belongs to. Note that the second task (similar, associated, or both) requires distinguishing among 3 classes.

[Figure: Turney COLING2008.png]

Related papers