Detection of Ad Hominem attacks in blog and review data
== Comments ==

This is a nice idea. It might be interesting to look at a labeled sentiment dataset, like the [http://verbs.colorado.edu/jdpacorpus/ JD Powers corpus], and build a sentiment-target detector, and see if combining this with a list of pronouns or a name recognizer would do a good job at recognizing ad hominem attacks.

[http://www.truthmapping.com/] has structured debates - I think they include ad hominem as a label that can be added to a claim. I'm not sure if it's useful as a data source, but you might look at it.

--[[User:Wcohen|Wcohen]] 20:23, 10 October 2012 (UTC)

=== Response ===

I have contacted the site admin at [http://truthmapping.com/], and they've responded that there aren't many examples of labeled ad hominem in their database currently, providing a list of the few examples where it has been used. I will take a look at the JD Powers dataset; doing sentiment-target detection was what I had in mind for one of the approaches.

--[[User:Gmontane|Gmontane]] 18:35, 15 October 2012 (UTC)
==Task==
Use machine-learning and/or probabilistic topic modeling to detect examples of personal insult in blog and product review data. This is a form of [[AddressesProblem::opinion mining]].
 
== Overview ==
This project is aimed at detecting ad hominem attacks and personal insults in blog data and product review data. Personal insults attack people rather than ideas or features of a product. The task is challenging due to the subjective nature of verbal attack, but previous work in this area shows that at least some progress is possible.

== Team ==
George Montañez

== Datasets ==
 
* A dataset, from the [http://www.kaggle.com/c/detecting-insults-in-social-commentary Kaggle.com] "Detecting Insults in Social Commentary" competition, consisting of 1,050 insult comments and 2,898 neutral comments.
* A collection of 30,771 unlabeled blog documents from blogs discussing evolution and anti-evolution.
* A collection of over 3,000 hand-labeled sentences from 294 product reviews, classified for sentiment (positive/negative/neutral).
* Amazon product data: [http://liu.cs.uic.edu/download/data/ http://liu.cs.uic.edu/download/data/].
  
 
==Baseline Method==
Given the presence of labeled data, simple logistic regression or naive Bayes classification on a bag-of-words representation will be used to predict whether a sentence is "insulting" (a personal attack) or not.
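As a minimal sketch of this baseline, the following uses scikit-learn's bag-of-words vectorizer with a multinomial naive Bayes classifier. The training comments and labels here are illustrative stand-ins, not taken from the Kaggle data:

```python
# Baseline sketch: naive Bayes over bag-of-words counts.
# The tiny training set below is hypothetical example data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = [
    "you are an idiot",                    # insult
    "great discussion of the evidence",    # neutral
    "only a moron would believe this",     # insult
    "thanks for the link",                 # neutral
]
train_labels = [1, 0, 1, 0]  # 1 = insult, 0 = neutral

vectorizer = CountVectorizer()                  # bag-of-words features
X = vectorizer.fit_transform(train_texts)
clf = MultinomialNB().fit(X, train_labels)

test = vectorizer.transform(["what an idiot"])
print(clf.predict(test))
```

Swapping in `sklearn.linear_model.LogisticRegression` for `MultinomialNB` gives the logistic regression variant with no other changes.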
  
 
==Proposed Method==
I propose combining a search for negative sentiment within a sentence with a method of detecting whether the target of a sentence is a person, as a proxy for ad hominem (negative sentiment aimed at persons, not ideas). In addition, I would like to try machine learning based on more advanced features, such as part-of-speech tags and inferred topic models, to build additional classifiers.
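A rough sketch of the sentiment-plus-person-target proxy is below. The word lists are purely illustrative placeholders; the real method would substitute a proper sentiment lexicon or classifier and a name recognizer:

```python
# Proxy sketch: flag a sentence as a candidate ad hominem when it
# contains negative sentiment AND its apparent target is a person
# (approximated here with pronouns). Word lists are hypothetical.
NEGATIVE_WORDS = {"stupid", "idiot", "dishonest", "moron", "liar"}
PERSON_MARKERS = {"you", "your", "he", "she", "they"}

def is_candidate_ad_hominem(sentence: str) -> bool:
    tokens = {t.strip(".,!?").lower() for t in sentence.split()}
    has_negative = bool(tokens & NEGATIVE_WORDS)      # negative sentiment present
    targets_person = bool(tokens & PERSON_MARKERS)    # target looks like a person
    return has_negative and targets_person

print(is_candidate_ad_hominem("You are a dishonest liar."))  # True
print(is_candidate_ad_hominem("The argument is flawed."))    # False
```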
  
==Evaluation==

* Quantitative analysis will be performed using the labeled test data. We will compute precision and recall scores.
* Qualitative analysis will be performed by running the classification algorithms on the unlabeled data and looking at the examples of text labeled as "insult/ad hominem" by the classifiers.
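To make the quantitative metrics concrete, precision and recall for the insult class can be computed by hand as follows (the gold and predicted labels here are toy examples):

```python
# Precision and recall for the positive ("insult") class.
def precision_recall(gold, pred, positive=1):
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted insults, how many were real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real insults, how many were found
    return precision, recall

gold = [1, 1, 0, 0, 1]
pred = [1, 0, 0, 1, 1]
print(precision_recall(gold, pred))  # precision = recall = 2/3 on this toy data
```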
  
 
==Challenges==

* The subjective nature of personal attack makes this task difficult: humans can disagree on whether a sentence is insulting or not.
* The labeled insult data is noisy; looking it over, some insults are not marked as such, which may make the task more difficult.
* The primary challenge, securing labeled insult data, has already been met.

==Learning Objectives==

The hope is to develop automated methods for identifying such complicated objects as ad hominem attacks in text. I would like to expand my knowledge of sentiment analysis methods. Furthermore, I am interested in seeing the number (and types) of sentences identified as insults in the blog data.
  
 
== Related Work ==

* E. Spertus, "[http://www.cs.csustan.edu/~mmartin/LDS/Spertus.pdf Smokey: Automatic recognition of hostile messages]", Proceedings of the National Conference on Artificial Intelligence, 1997.
* A. Razavi, D. Inkpen, S. Uritsky, S. Matwin, "[http://www.site.uottawa.ca/~diana/publications/Flame_Final.pdf Offensive language detection using multi-level classification]", Advances in Artificial Intelligence, 2010.
* [[Turney, ACL 2002]]
* G. Xiang, B. Fan, L. Wang, J. Hong, and C. Rose, "[http://cmuchimps.org/publications/118-detecting-offensive-tweets-via-topical-feature-discovery-over-a-large-scale-twitter-corpus Detecting Offensive Tweets via Topical Feature Discovery over a Large Scale Twitter Corpus]", Conference on Information and Knowledge Management (CIKM), 2012.

Latest revision as of 16:39, 3 November 2012
