Detection of Ad Hominem attacks in blog and review data

From Cohen Courses

Comments

This is a nice idea. It might be interesting to look at a labeled sentiment dataset, like the JD Powers corpus, and build a sentiment-target detector, and see if combining this with a list of pronouns or a name recognizer would do a good job at recognizing ad hominem attacks.

[1] has structured debates - I think they include ad hominem as a label that can be added to a claim. I'm not sure if it's useful as a data source, but you might look at it.

--Wcohen 20:23, 10 October 2012 (UTC)

Response

I have contacted the site admin at [2], and they've responded that there aren't many examples of labeled ad hominem in their database currently, providing a list of the few examples where it has been used. I will take a look at the JD Powers dataset; doing sentiment-target detection was what I had in mind for one of the approaches.

--Gmontane 18:35, 15 October 2012 (UTC)

Task

Use machine learning and/or probabilistic topic modeling to detect personal insults in blog and product review data. This is a form of opinion mining.

Overview

This project aims to detect ad hominem attacks and personal insults in blog data and product review data. A personal insult attacks a person rather than an idea or a feature of a product. The task is challenging because what counts as a verbal attack is subjective, but previous work in this area shows that at least some progress is possible.

Team

George Montañez

Datasets

  • A dataset, from the Kaggle.com "Detecting Insults in Social Commentary" competition, consisting of 1,050 insult comments and 2,898 neutral comments.
  • Collection of 30,771 blog documents from blogs discussing evolution and anti-evolution. (Unlabeled)
  • Collection of over 3,000 hand-labeled sentences from 294 product reviews, classified for sentiment (pos/neg/neutral).
  • Amazon product data: http://liu.cs.uic.edu/download/data/

Baseline Method

Because labeled data is available, simple logistic regression or naive Bayes classification on a bag-of-words representation will be used to predict whether a sentence is "insulting" (a personal attack) or not.
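As a rough illustration of the naive Bayes variant of this baseline, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing on bag-of-words counts. It is only a minimal sketch: the `NaiveBayes` class, the `tokenize` helper, and the six training sentences are invented for illustration and are not taken from the Kaggle data or from any project code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (simple bag-of-words)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # The vocabulary is every word seen in any training sentence.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, text):
        """Return the unnormalized log-posterior of each class for the text."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # class prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen in training
                smoothed = (self.word_counts[c][word] + 1) / (total_words + len(self.vocab))
                score += math.log(smoothed)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for illustration (not from the Kaggle set).
train_texts = [
    "you are an idiot",
    "you are a stupid fool",
    "what a moron you are",
    "the battery life is great",
    "this argument ignores the evidence",
    "the screen is too dim",
]
train_labels = ["insult", "insult", "insult", "neutral", "neutral", "neutral"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("you stupid idiot"))
print(model.predict("the battery is great"))
```

On real data the same interface could be backed by a library implementation of naive Bayes or logistic regression; the hand-written version here only makes the counting and smoothing steps explicit.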

Proposed Method

I propose combining a search for negative sentiment within a sentence with a method of detecting whether the target of a sentence is a person as a proxy for ad hominem (negative sentiment aimed at persons, not ideas). In addition, I would like to try machine learning based on more advanced features, such as part-of-speech tags and inferred topic models, to build additional classifiers.
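One very simple way to approximate "negative sentiment aimed at a person" is a lexicon-and-proximity heuristic, sketched below. Everything in it is an assumption made for illustration: the two word lists are tiny hypothetical stand-ins for a real sentiment lexicon and a pronoun list or name recognizer, and the token-window rule is just one possible proxy for finding the target of a sentence, not the method the project commits to.

```python
import re

# Hypothetical toy lexicons, invented for illustration only. A real system
# would use a sentiment lexicon or trained sentiment model, plus a pronoun
# list or name recognizer for person detection.
NEGATIVE_WORDS = {"stupid", "idiot", "liar", "ignorant", "terrible", "awful", "dishonest"}
PERSON_WORDS = {"you", "your", "he", "she", "him", "her", "his", "they", "them", "their"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_person_directed_negative(sentence, window=3):
    """Return True if a negative word occurs within `window` tokens of a person word."""
    tokens = tokenize(sentence)
    person_positions = [i for i, t in enumerate(tokens) if t in PERSON_WORDS]
    negative_positions = [i for i, t in enumerate(tokens) if t in NEGATIVE_WORDS]
    return any(
        abs(p - n) <= window
        for p in person_positions
        for n in negative_positions
    )


for example in [
    "You are an ignorant liar.",
    "The battery life is terrible.",
    "She made a good point about the data.",
]:
    print(example, "->", is_person_directed_negative(example))
```

The second example shows why the person check matters: the sentence is negative, but its target is a product feature, so it should not count as ad hominem. The heuristic's output could also serve as one feature among others (such as part-of-speech tags) in a learned classifier.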

Evaluation

  • Quantitative analysis will be performed using the labeled test data. We will compute precision and recall scores.
  • Qualitative analysis will be performed by running the classification algorithms on the unlabeled data, and looking at the examples of text labeled as "insult/ad hominem" by the classifiers.
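For the quantitative part, precision is the fraction of sentences predicted as insults that really are insults, and recall is the fraction of true insults that the classifier finds. The sketch below computes both from parallel lists of gold and predicted labels; the function name `precision_recall` and the label strings are assumptions chosen for this example.

```python
def precision_recall(gold, predicted, positive="insult"):
    """Compute precision and recall for the `positive` class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Guard against division by zero when nothing is predicted or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented labels for illustration: 2 true positives, 1 false positive, 1 false negative.
gold = ["insult", "insult", "neutral", "neutral", "insult"]
predicted = ["insult", "neutral", "insult", "neutral", "insult"]

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Because the labeled data contains far more neutral comments than insults, reporting precision and recall for the insult class is more informative than overall accuracy alone.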

Challenges

  • The subjective nature of personal attack makes this task difficult. Humans can disagree on whether a sentence is insulting or not.
  • The labeled insult data is noisy. An informal look over the data shows that some insults are not marked as such, which may make the task more difficult.
  • The primary challenge of securing labeled insult data has already been met.

Learning Objectives

The hope is to develop automated methods of identifying such complicated objects as ad hominem attacks in text. I would like to expand my knowledge of sentiment analysis methods. Furthermore, I am interested in seeing the number of sentences (and types of sentences) identified as insults in the blog data.

Related Work