Park et al CSCW 2011. The Politics of Comments: Predicting Political Orientation of News Stories with Commenters’ Sentiment Patterns
Latest revision as of 05:23, 11 January 2013
Citation
Souneil Park, Minsam Ko, Jungwoo Kim, Ying Liu, and Junehwa Song. “The Politics of Comments: Predicting Political Orientation of News Stories with Commenters’ Sentiment Patterns”, in Proceedings of the 2011 ACM Conference on Computer Supported Cooperative Work (CSCW 2011).
Online Version
Summary
This paper tries to predict the political orientation of news articles by analyzing the sentiment patterns of commenters, which makes it a sentiment analysis problem. It is difficult to infer the political orientation of a news article by computational analysis of its text or metadata, since political news covers complex discourse involving parties, government, the economy, and so on. The paper presents a new "social annotation analysis" approach to predicting the political orientation of news articles.
The main idea
Although analyzing the political orientation of a news article computationally is difficult, there exist commenters with clear political views, and they are likely to express the same views consistently across political issues. The approach identifies predictive commenters (those who show a high degree of regularity in their sentiment patterns) and analyzes the sentiment of their comments to deduce the political orientation of the news article. When a comment is negative, the article’s political orientation is predicted to be the opposite of the commenter’s; when it is positive, the orientation is predicted to be the same as the commenter’s.
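The prediction rule above can be sketched in a few lines of Python. This is only an illustration: the label strings and the function name `infer_article_orientation` are assumptions made for this example, not names from the paper.

```python
# Hypothetical label strings; the paper's own encoding is not given in this summary.
OPPOSITE = {"liberal": "conservative", "conservative": "liberal"}


def infer_article_orientation(commenter_orientation, sentiment):
    """Apply the rule for one predictive commenter with a known orientation."""
    if sentiment == "positive":
        # A positive comment suggests the article shares the commenter's view.
        return commenter_orientation
    if sentiment == "negative":
        # A negative comment suggests the article takes the opposite view.
        return OPPOSITE[commenter_orientation]
    # No clear sentiment, so no clear prediction.
    return "vague"


print(infer_article_orientation("liberal", "negative"))
```

The rule is only reliable for commenters whose sentiment follows their orientation regularly, which is why the paper first measures that regularity.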
Data and Analysis
An extensive study was conducted on commenters and their comment histories from Naver News, a popular Internet news portal in South Korea. The study confirms that the prerequisites of the approach hold:
- Existence of active commenters who continuously comment on a large number of articles.
- Most of them have a clear political preference, either liberal or conservative.
- Among them, there are predictive commenters.
Commenters were sampled from two article sets with different characteristics. The Popular Set is a collection of the 20 most read political news articles of each day over a 6 month period. As these stories are popular, they have many comments. The General Set is sampled from the Naver political issue directory. Its articles were sampled from major political issues that were updated between December 2008 and November 2009. The set includes both articles with many comments and articles with few. Naver IDs were used as the identifiers of the commenters. Only top level comments were considered. The figure below shows the data used.
Commenter's Political Orientation
The paper carries out several evaluations of whether a commenter's political preference can be clearly identified from their comments.
Consistency of Political Orientation
100 active commenters, drawn from the Popular and General sets, were analyzed. Both article sets covered major political issues. The 20 most recent comments were sampled from each commenter. A commenter was considered consistent when the political position expressed in all of the sampled comments was the same. Those who changed their position at least once were tagged as “vague”. The figure below shows the result.
Regularity of Sentiments
Next the authors evaluate whether a commenter's sentiments vary with the political orientation of the news articles. An important point here (which can be confusing) is that a commenter having a political orientation does not guarantee regular commenting behavior. There could be commenters whose comments always carry a negative sentiment regardless of the orientation of the article.

Two types of relationship were defined, Positive match and Negative match. A Positive match is the case in which the sentiment of a comment is positive and the article’s political orientation is the same as the commenter’s. A Negative match is the case in which the sentiment of a comment is negative and the article’s political orientation is the opposite of the commenter’s.

The 20 most recent comment-article pairs were taken from each of 89 commenters. The sentiments of comments were labelled as positive (compliments, endorsement, praise), negative (mockery, criticism, etc.), or vague (without any clear distinction). Similarly, the political orientation of articles was labelled as liberal, conservative, or vague. Two types of articles were classified as liberal: first, articles that cover only liberal positions; second, articles that cover information detrimental to the conservatives. Conservative articles were defined in the same way, with the roles reversed. The annotations were tested for reliability by recruiting a second annotator and comparing the results. The kappa measure was 0.73 for the annotation of news articles and 0.67 for that of comments, both of which were considered satisfactory.
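For readers unfamiliar with the kappa measure, the following minimal sketch shows how Cohen's kappa is computed for two annotators. The toy labels are invented for illustration and are not the paper's annotation data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented toy annotations of four articles.
annotator_1 = ["liberal", "liberal", "conservative", "conservative"]
annotator_2 = ["liberal", "liberal", "conservative", "liberal"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Here the annotators agree on 3 of 4 items (0.75) while chance agreement is 0.5, giving kappa = 0.25 / 0.5 = 0.5.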
The degree of regularity is calculated for each commenter. This is done by computing two conditional probabilities: the probability of a Positive match given a comment with positive sentiment, and the probability of a Negative match given a comment with negative sentiment. The following figure shows the distribution of commenters according to their conditional probabilities.
Commenters were also classified on the basis of their behaviour:
- Predictive: commenters showing a high level of regularity
- Cross: commenters who leave negative comments on articles that share their own political orientation
- Opaque: commenters whose comments are mostly vague
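The two conditional probabilities used to measure regularity can be sketched as follows. The data layout (a list of `(sentiment, article_orientation)` pairs) and the function name `regularity` are assumptions for this example. The thresholds that separate Predictive, Cross, and Opaque commenters are not given in this summary, so they are not modelled here.

```python
OPPOSITE = {"liberal": "conservative", "conservative": "liberal"}


def regularity(pairs, commenter_orientation):
    """Return (P(Positive match | positive), P(Negative match | negative)).

    pairs is a list of (sentiment, article_orientation) tuples for one
    commenter. A probability is None when the commenter has no comment
    with that sentiment.
    """
    positive = [art for sentiment, art in pairs if sentiment == "positive"]
    negative = [art for sentiment, art in pairs if sentiment == "negative"]
    p_pos = None
    if positive:
        # Positive match: positive comment on an article of the same orientation.
        p_pos = sum(art == commenter_orientation for art in positive) / len(positive)
    p_neg = None
    if negative:
        # Negative match: negative comment on an article of the opposite orientation.
        target = OPPOSITE[commenter_orientation]
        p_neg = sum(art == target for art in negative) / len(negative)
    return p_pos, p_neg


# Invented toy history for a liberal commenter.
history = [
    ("positive", "liberal"), ("positive", "liberal"), ("positive", "conservative"),
    ("negative", "conservative"), ("negative", "conservative"), ("vague", "liberal"),
]
print(regularity(history, "liberal"))
```

A commenter with both probabilities close to 1 behaves as the approach expects, while a low second probability points to the Cross pattern described above.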
Predicting political orientation from sentiments
Single Commenter-based Prediction
This method predicts the political orientation by modelling an individual commenter as a multiclass Naive Bayes classifier. It determines the class C of an article as Liberal, Conservative, or Vague, given the sentiment of a comment S, which can be Positive, Negative, or Vague.
<math> P(C|S) = \frac{P(C) \times P(S|C)}{P(S)} </math>
The parameters, i.e., the prior P(C) and the class likelihood P(S|C), are trained from a sample of the comment history of the corresponding commenter.
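A minimal sketch of such a per-commenter classifier is given below, applying Bayes' rule with the prior P(C) and likelihood P(S|C) estimated from counts. The add-one (Laplace) smoothing, the label strings, and the function names are assumptions for this example; the summary does not say how the paper estimates the parameters beyond using the commenter's history.

```python
from collections import Counter

CLASSES = ("liberal", "conservative", "vague")
SENTIMENTS = ("positive", "negative", "vague")


def train(history, alpha=1.0):
    """Estimate P(C) and P(S|C) from (sentiment, article_class) tuples.

    alpha is an assumed smoothing constant, not a value from the paper.
    """
    class_counts = Counter(c for _, c in history)
    pair_counts = Counter(history)
    n = len(history)
    prior = {c: (class_counts[c] + alpha) / (n + alpha * len(CLASSES))
             for c in CLASSES}
    likelihood = {
        c: {s: (pair_counts[(s, c)] + alpha)
               / (class_counts[c] + alpha * len(SENTIMENTS))
            for s in SENTIMENTS}
        for c in CLASSES
    }
    return prior, likelihood


def posterior(prior, likelihood, sentiment):
    """P(C|S) for every class C, via Bayes' rule."""
    joint = {c: prior[c] * likelihood[c][sentiment] for c in CLASSES}
    evidence = sum(joint.values())  # P(S)
    return {c: joint[c] / evidence for c in CLASSES}


def predict(prior, likelihood, sentiment):
    post = posterior(prior, likelihood, sentiment)
    return max(post, key=post.get)


# Invented 20-pair history of one commenter.
history = ([("positive", "liberal")] * 8
           + [("negative", "conservative")] * 8
           + [("vague", "vague")] * 4)
prior, likelihood = train(history)
print(predict(prior, likelihood, "positive"))  # liberal
```

With a single feature (the sentiment of one comment), the Naive Bayes classifier reduces to a direct application of Bayes' rule per class.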
Multi Commenter-based Prediction
This method aggregates the predictions made by the individual commenters. Two aggregation methods are used:
- Maximum Votes: this method counts the number of decisions made for each class and chooses the class with the most votes.
- Maximum Posterior Probability: this method sums, for each class C, the posterior probabilities P(C|S) that are used to make decisions in the single commenter-based predictions, and chooses the class with the largest sum.
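The two aggregation policies can be sketched as below, assuming each commenter contributes a dictionary of posterior probabilities P(C|S) over the three classes. The function names and the toy numbers are illustrative, and tie-breaking (here, whichever class was seen first) is an assumption the summary does not address.

```python
from collections import Counter, defaultdict


def maximum_votes(posteriors):
    """Each commenter votes for their most probable class; the majority wins."""
    votes = Counter(max(p, key=p.get) for p in posteriors)
    return votes.most_common(1)[0][0]


def maximum_posterior(posteriors):
    """Sum P(C|S) over commenters per class and pick the largest total."""
    totals = defaultdict(float)
    for p in posteriors:
        for label, prob in p.items():
            totals[label] += prob
    return max(totals, key=totals.get)


# Invented posteriors from three commenters on one article.
posteriors = [
    {"liberal": 0.40, "conservative": 0.35, "vague": 0.25},
    {"liberal": 0.40, "conservative": 0.35, "vague": 0.25},
    {"liberal": 0.05, "conservative": 0.90, "vague": 0.05},
]
print(maximum_votes(posteriors))      # liberal
print(maximum_posterior(posteriors))  # conservative
```

The toy input is chosen so the policies disagree: two weakly confident commenters outvote one strongly confident commenter under Maximum Votes, but not under Maximum Posterior Probability.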
Evaluation
The proposed methods for political view identification were evaluated in two ways. First, the accuracy of prediction (the proportion of correct predictions) was evaluated; second, the article coverage of the methods (the proportion of articles commented on by the commenters) was determined. For comparison, the authors developed a naive news text analysis method (using TF/IDF for feature extraction and an SVM for classification).
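The two evaluation measures can be written down as a short sketch. The dictionary layout (article ID mapped to label) and the function names are assumptions for this example, and the toy data is invented.

```python
def accuracy(predictions, gold):
    """Proportion of predicted articles whose label matches the annotation."""
    if not predictions:
        return None  # nothing was predicted, so accuracy is undefined
    correct = sum(1 for article, label in predictions.items()
                  if gold[article] == label)
    return correct / len(predictions)


def coverage(predictions, all_articles):
    """Proportion of all articles for which a prediction could be made."""
    articles = set(all_articles)
    return len(articles & set(predictions)) / len(articles)


# Invented toy data: four annotated articles, two of them predicted.
gold = {"a1": "liberal", "a2": "conservative", "a3": "liberal", "a4": "vague"}
predictions = {"a1": "liberal", "a2": "liberal"}
print(accuracy(predictions, gold))  # 0.5
print(coverage(predictions, gold))  # 0.5
```

The two measures pull against each other: restricting predictions to the most reliable commenters tends to raise accuracy while lowering the share of articles that receive any prediction.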
Evaluation of Single Commenter-based Prediction
The Bayes classifier of each commenter was trained on 20 comment-article pairs, and the test set likewise consisted of the 20 most recent comment-article pairs of that commenter. The accuracy of each commenter was then calculated, and the commenters with an accuracy over 70% were selected as the predictive commenters (PC).
The predictive commenters were then evaluated in two versions. In the first version, the sentiments of comments were manually analyzed (MA); in the second, a simple sentiment classifier (SA) was used to classify the sentiment of the comments. The accuracy of these methods is compared to that of the text analysis method (TA). The figure below shows the accuracy achieved.
The proposed methods outperform the TA method on both the General Set and the Popular Set. The accuracy of SA is lower than that of MA because the simple sentiment classifier misidentifies the expressed sentiment of some comments. The accuracy is quite high when only the Conservative and Liberal predictions are considered.
Next the article coverage of the commenters is evaluated. The predictive commenters were found to cover nearly 40% of the articles of the set. Few articles were covered in the General Set (around 5%, or 21% when only articles with more than 5 comments are considered). To increase the coverage, commenters with accuracy scores between 60% and 70% were also considered. This lowered the overall accuracy by 8% for conservative or liberal predictions.
Evaluation of Multi Commenter-based Prediction
Only the articles in the Popular Set were used, since the method aggregates the results of multiple commenters and its benefits can be observed only when many commenters comment on the same article. The method makes predictions by aggregating the results of the 43 commenters of the Popular Set. The accuracy was measured while varying the minimum number of commenters required for a prediction from 1 to 12. The accuracy of the two aggregation policies (Maximum Votes and Maximum Posterior Probability) was measured.
The accuracy of prediction increased as the minimum number of commenters increased. When only conservative or liberal predictions were considered, the two aggregation methods achieved an accuracy over 80%. However, the article coverage decreased as the number of commenters was increased.
Discussion
This paper uses commenters' sentiment patterns to predict the political orientation of news articles. The performance seems promising for real world applications such as news recommendation systems. The single commenter-based prediction was evaluated using 20 comments per commenter, which is not a very large training size. The performance would have been clearer had more comments been considered. In practice, however, the multi commenter-based prediction approach is more useful for real world purposes, since analyzing the comments of multiple people makes more sense.
Study Plan
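This paper is simple and readable. You might want to revise:
- Naive Bayes Classifier (http://en.wikipedia.org/wiki/Naive_Bayes_classifier)
- Kappa metric (http://en.wikipedia.org/wiki/Cohen's_kappa)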