Analyzing and Predicting Youtube Comments Rating: WWW2010

==Citation==

Stefan Siersdorfer, Sergiu Chelaru, Wolfgang Nejdl, Jose San Pedro, "How useful are your comments?: analyzing and predicting YouTube comments and comment ratings", Proceedings of the 19th International Conference on World Wide Web (WWW 2010), 2010.

==Online version==

Click here to download

==Summary==
This [[Category::paper]] aims at [[AddressesProblem::analyzing comments]] made on videos hosted on YouTube and at predicting the ratings that users give to these comments. A comment's rating is essentially the number of other users who liked it (positive rating) or disliked it (negative rating). The authors refer to comments with a positive rating as accepted comments and to comments with a negative rating as unaccepted comments. The underlying motivation is to estimate the sentiment of a comment, with the conjecture that a comment with "positive" sentiment tends to receive a positive rating, whereas one with "negative" sentiment tends to receive a negative rating. The authors also perform further experiments: they examine the correlation between the variance of comment ratings and the polarity of a video (the more polarizing the video, the more divided people's opinions about it), and the dependency of comment ratings and sentiment values on the video category. Please see the [[UsesDataset::Youtube comment analysis dataset]] page for information about the dataset.
  
==Sentiment Analysis of Rated Comments==
The authors first analyzed the comments for their sentiment to test the hypothesis that positively rated comments carry positive sentiment and vice versa. They categorized the comments into three groups: "5Neg" (comments with a negative rating of 5 or more), "0Dist" (comments that have not received any rating), and "5Pos" (comments with a positive rating of 5 or more). The terms in these comments were then assigned sentiment scores using SentiWordNet, which provides a score triplet of the form (positivity, negativity, objectivity) for every word present in WordNet. Only the adjectives in the comments were tagged with sentiment scores. The experiments showed that negatively rated comments contain more negative sentiment terms, while positively rated comments contain more positive sentiment terms. The authors further ran an analysis of variance (ANOVA) test to show that the mean sentiment score differed significantly between any two of the three categories.
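The term-level scoring can be illustrated with a short sketch. This is not the authors' implementation; it is a minimal example, assuming NLTK with its SentiWordNet, tokenizer, and POS-tagger resources installed, that looks up the SentiWordNet triplet for each adjective in a comment.

<pre>
# Minimal sketch (not the authors' code): SentiWordNet scores for the
# adjectives in a comment, assuming the NLTK corpora are installed.
import nltk
from nltk.corpus import sentiwordnet as swn

def comment_sentiment_scores(comment):
    """Return (word, positivity, negativity) for each adjective in the comment."""
    tokens = nltk.word_tokenize(comment.lower())
    tagged = nltk.pos_tag(tokens)
    scores = []
    for word, tag in tagged:
        if not tag.startswith('JJ'):          # keep only adjectives, as in the paper
            continue
        synsets = list(swn.senti_synsets(word, pos='a'))
        if not synsets:
            continue
        # average the SentiWordNet triplet over the word's adjective senses
        pos = sum(s.pos_score() for s in synsets) / len(synsets)
        neg = sum(s.neg_score() for s in synsets) / len(synsets)
        scores.append((word, pos, neg))
    return scores

print(comment_sentiment_scores("This is an awesome and beautiful video"))
</pre>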
  
==Predicting Rating for Comments==
After the above analysis, the authors performed an SVM-based classification of the comments. Each comment was represented as a vector of the sentiment values of its terms. The classification was binary, with the classes being positive/accepted and negative/unaccepted. For this experiment, the authors considered different rating thresholds (above/below +2/-2, +5/-5, +7/-7) for a comment to be counted as accepted or unaccepted. They also varied the number of randomly chosen accepted and unaccepted comments used for training (T = 1000, 10000, 50000, 200000), keeping at least 1000 comments per class for testing. Three experiments were conducted: first, classification of accepted versus unaccepted comments; second, the same classification with the class labels swapped, so that the classifier targets the "bad" or unaccepted comments; and third, classification of highly rated comments (positive or negative) against comments with no rating. The three scenarios are labeled AC_POS, AC_NEG and THRES-0 in the results below.
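A hedged sketch of the classification step follows. The paper reports SVM classifiers over term-level sentiment features; the feature construction, library choice (scikit-learn's LinearSVC) and toy data below are illustrative assumptions, not the authors' exact pipeline.

<pre>
# Illustrative sketch only: binary accepted/unaccepted classification of
# comments with a linear SVM. Thresholding mimics the description above;
# the bag-of-words features are a stand-in for the sentiment-value vectors.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

THRESHOLD = 5  # e.g. the +5/-5 setting; +2/-2 and +7/-7 were also used

def make_dataset(comments):
    """comments: list of (text, rating). Keep only clearly rated comments."""
    texts, labels = [], []
    for text, rating in comments:
        if rating >= THRESHOLD:
            texts.append(text); labels.append(1)   # accepted
        elif rating <= -THRESHOLD:
            texts.append(text); labels.append(0)   # unaccepted
    return texts, labels

# toy data standing in for the sampled training/test comments
train = [("great video, love it", 9), ("stupid and boring clip", -8),
         ("awesome song", 6), ("this is terrible", -7)]
test  = [("wonderful performance", 5), ("awful noise", -6)]

X_train_txt, y_train = make_dataset(train)
X_test_txt,  y_test  = make_dataset(test)

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train_txt)
X_test  = vectorizer.transform(X_test_txt)

clf = LinearSVC().fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
</pre>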
==Results for Rating Prediction==
[[File:Results_youtube.jpg]]
  
==Variance of Comment Ratings as an Indicator of Polarizing Topics==
The authors also analyzed the relation between the variance of comment ratings and the polarity of videos. 1413 tags from 50 videos were selected, and the average variance of comment ratings was computed over all videos carrying a particular tag. The table below shows the top-25 and bottom-25 tags according to this average variance. The top-25 tags tend to be related to polarizing topics, whereas the bottom-25 tags tend to be related to rather neutral topics.
[[File:Results_youtube_variance.jpg]]
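The tag-level computation described above can be sketched as follows. This is a minimal illustration with pandas; the column names and toy values are assumptions, not the paper's data: compute the variance of comment ratings per video, then average that variance over all videos carrying a given tag and rank the tags.

<pre>
# Minimal sketch (toy data, not the paper's dataset): rank tags by the
# average variance of comment ratings over the videos that carry them.
import pandas as pd

# one row per comment: which video it belongs to and its rating
comments = pd.DataFrame({
    "video_id": ["v1", "v1", "v1", "v2", "v2", "v3", "v3"],
    "rating":   [ 10,  -8,    5,    1,    2,   -6,    7 ],
})

# tags attached to each video
video_tags = pd.DataFrame({
    "video_id": ["v1", "v1", "v2", "v3", "v3"],
    "tag":      ["politics", "war", "music", "politics", "funny"],
})

# 1. variance of comment ratings within each video
video_variance = comments.groupby("video_id")["rating"].var().rename("rating_variance")

# 2. average that variance over all videos carrying a given tag
tag_variance = (video_tags.join(video_variance, on="video_id")
                          .groupby("tag")["rating_variance"].mean()
                          .sort_values(ascending=False))

print(tag_variance)   # high-variance tags ~ polarizing topics, low-variance ~ neutral ones
</pre>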
==Category Dependencies of Ratings==
The authors conducted the classification experiments separately for comments from three different categories: Music, Entertainment, and News & Politics. The results of these experiments are shown in the figure below. While the classification performed comparably well for the Entertainment and Music categories, it did noticeably worse for the News & Politics category.
 
 
==References==
 
[1] D. Shen, Q. Yang, J.-T. Sun, and Z. Chen. Thread detection in dynamic text message streams. In Proc. of SIGIR ’06, pages 35–42, Seattle, Washington, 2006.<br>
 
[2] K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proc. of SIGIR '00, pages 41–48, Athens, Greece, 2000.
 