Learning Multilingual Subjective Language via Cross-Lingual Projections

Citation

Learning Multilingual Subjective Language via Cross-Lingual Projections, Rada Mihalcea, Carmen Banea and Janyce Wiebe, In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 976–983

Online Version

The online version of this paper is here[1].

Summary

This paper describes two approaches to generating subjectivity annotation resources for a new language by leveraging resources and tools available for English:

  • Build a target-language subjectivity lexicon by translating an existing English lexicon using a bilingual dictionary.
  • Generate a subjectivity-annotated corpus in the target language by projecting annotations from an automatically annotated English corpus.

Since neither approach relies on features specific to the target language, both can in principle be applied to any target language. The results show that the second approach preserves subjectivity more reliably than the first.
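The two approaches above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy lexicon, the toy bilingual dictionary, the example sentences, and the keyword-based English annotator are all assumptions made up for this sketch. In particular, taking the first listed translation of each entry is a simplification chosen here, and the summary does not say how translation candidates were selected.

```python
from typing import Callable, Dict, List, Set, Tuple


def translate_lexicon(english_lexicon: Set[str],
                      bilingual_dict: Dict[str, List[str]]) -> Set[str]:
    """Approach 1: map each English subjective entry to a target-language word."""
    target_lexicon: Set[str] = set()
    for entry in english_lexicon:
        translations = bilingual_dict.get(entry, [])
        if translations:
            # Assumption: keep only the first listed translation.
            target_lexicon.add(translations[0])
    return target_lexicon


def project_annotations(parallel_corpus: List[Tuple[str, str]],
                        annotate_english: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Approach 2: label the English side, then copy the label to the aligned target sentence."""
    return [(target, annotate_english(english)) for english, target in parallel_corpus]


# Toy data, invented for illustration only.
ENGLISH_LEXICON = {"beautiful", "sad", "outrageous"}
BILINGUAL_DICT = {"beautiful": ["hermoso", "bello"], "sad": ["triste"]}
PARALLEL_CORPUS = [
    ("The view is beautiful", "La vista es hermosa"),
    ("The train leaves at twelve", "El tren sale a las doce"),
]


def toy_english_annotator(sentence: str) -> str:
    """Hypothetical stand-in for an automatic English subjectivity annotator."""
    tokens = set(sentence.lower().split())
    return "subjective" if tokens & ENGLISH_LEXICON else "objective"


target_lexicon = translate_lexicon(ENGLISH_LEXICON, BILINGUAL_DICT)
projected = project_annotations(PARALLEL_CORPUS, toy_english_annotator)

print(sorted(target_lexicon))
print(projected)
```

Note that "outrageous" has no dictionary entry in the toy data and is silently dropped, which illustrates how dictionary coverage limits the translated lexicon. The projection step never inspects the target sentence itself, which is why it needs no target-language tools.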

Evaluation

In this work, both manual and automatic methods are used to evaluate the approach. The authors perform an agreement evaluation on a set of 173 sentences from the automatically annotated corpus. They also train a classifier on the automatically annotated corpus and evaluate its performance. Both evaluations support the approach. The best overall result is an F-measure of 67.85, obtained with the machine learning approach.
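For readers unfamiliar with the two kinds of measurement mentioned above, the sketch below computes a simple agreement rate between two label sequences and the F-measure for one class. The labels and helper names are invented for illustration. The summary does not state which agreement statistic or which averaging scheme the paper used, so raw agreement and single-class F-measure are assumptions here, and the toy numbers have no relation to the reported 67.85.

```python
from typing import List, Tuple


def agreement_rate(labels_a: List[str], labels_b: List[str]) -> float:
    """Fraction of items on which two annotation sources assign the same label."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def precision_recall_f(gold: List[str], predicted: List[str],
                       positive: str) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-measure for the class named by `positive`."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy labels, invented for illustration only.
gold = ["subj", "subj", "obj", "obj", "subj"]
predicted = ["subj", "obj", "obj", "subj", "subj"]

agreement = agreement_rate(gold, predicted)
precision, recall, f_measure = precision_recall_f(gold, predicted, positive="subj")

print(f"agreement={agreement:.2f} P={precision:.2f} R={recall:.2f} F={f_measure:.2f}")
```

On this toy data three of the five labels match, and the "subj" class has two true positives, one false positive and one false negative, so precision, recall and F-measure all come out to 2/3.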

Discussion

  • Strong: The paper uses a language-independent method to leverage English resources for the target language and achieves good results with it. The evaluation is careful and convincing.
  • Weak: 1) The methods in this paper operate mainly at the lexicon level. In principle, one could outperform them simply by building a good subjectivity lexicon directly in the target language (which would not take too long). 2) With the projection method alone, coverage is constrained by the translations in the multilingual corpus, so only a limited number of target-language lexical items can be obtained. This leaves a large amount of target-language resources (features) unused by the model.