Project dong, 10-802 spring 2010

From Cohen Courses
Revision as of 17:01, 1 February 2011 by Wcohen (talk | contribs)

This page is a project report.

  • Title: Analyzing perspectives in an interactive setting
  • Author: Dong Nguyen

Summary

This project analyzed how perspectives are expressed in text. We used political discussion data and focused on 'left' versus 'right' perspectives. First, we ran experiments comparing different methods for estimating bias in text. We then used one of these methods to analyze how interaction influences the perspectives expressed in an online political forum.

Perspectives in text

The perspective of a speaker or author influences the text or speech they produce. A well-known example is the choice between 'freedom fighter' and 'terrorist'. Estimating the perspective from which a text is written is hard because texts written from different perspectives often discuss the same topic, so the differences between them are often very subtle.

Potential applications

  • Estimating the voting behavior of politicians
  • Tracking political opinion
  • Diversifying search results (returning documents written from different perspectives on topics of interest)
  • Personalizing search results (returning documents that match the user's viewpoint)
  • Etc.

Related work

There has been a variety of work on perspectives in text.

The following is an overview of work using machine learning techniques

The following have looked at interaction patterns (such as quoting behavior)

The following are some linguistic papers on this topic

  • Discourse semantics and ideology, van Dijk, 1995
  • Ideology and discourse: a multidisciplinary introduction, van Dijk, 2003
  • Intertextual borrowings in ideologically competing discourses: The case of the Middle East, Kawakib Momani, 2010

Datasets

We experimented with the following datasets

Another interesting and closely related dataset, which was not used in this project, is the Bitterlemons dataset