Difference between revisions of "Castillo 2011"

Revision as of 10:45, 30 September 2012

Castillo http://www.ra.ethz.ch/cdstore/www2011/proceedings/p675.pdf

Citation

@inproceedings{conf/www/CastilloMP11,

 author    = {Carlos Castillo and
              Marcelo Mendoza and
              Barbara Poblete},
 title     = {Information credibility on twitter},
 booktitle = {WWW},
 year      = {2011},
 pages     = {675-684},
 ee        = {http://doi.acm.org/10.1145/1963405.1963500},

}

Abstract from the paper

In the past few years there has been increased interest in using data-mining techniques to extract interesting patterns from time series data generated by sensors monitoring temporally varying phenomenon. Most work has assumed that raw data is somehow processed to generate a sequence of events, which is then mined for interesting episodes. In some cases the rule for determining when a sensor reading should generate an event is well known. However, if the phenomenon is ill-understood, stating such a rule is difficult. Detection of events in such an environment is the focus of this paper. Consider a dynamic phenomenon whose behavior changes enough over time to be considered a qualitatively significant change. The problem we investigate is of identifying the time points at which the behavior change occurs. In the statistics literature this has been called the change-point detection problem. The standard approach has been to (a) apriori determine the number of change-points that are to be discovered, and (b) decide the function that will be used for curve fitting in the interval between successive change-points. In this paper we generalize along both these dimensions. We propose an iterative algorithm that fits a model to a time segment, and uses a likelihood criterion to determine if the segment should be partitioned further, i.e. if it contains a new change point. In this paper we present algorithms for both the batch and incremental versions of the problem, and evaluate their behavior with synthetic and real data. Finally, we present initial results comparing the change-points detected by the batch algorithm with those detected by people using visual inspection

Online version

pdf link to the paper

Summary

Task Definition

Develop a general approach to change-point detection that generalizes across a wide range of applications.

Method

Batch Algorithm

Algorithm overview

The algorithm takes as input the set of approximating basis functions MSet and the time series T.

new-change-point = find-candidate(T, MSet)
Change-Points = $\emptyset$
Candidates = $\emptyset$
T1, T2 = get-new-time-ranges(T, Change-Points, new-change-point)
while (stopping criterion is not met) do begin
1. c1 = find-candidate(T1, MSet)
2. c2 = find-candidate(T2, MSet)
3. Candidates = Candidates $\cup$ {c1}
4. Candidates = Candidates $\cup$ {c2}
5. new-change-point = c $\in$ Candidates | Q(Change-Points, c) = min
6. Candidates = Candidates \ {new-change-point}
7. T1, T2 = get-new-time-ranges(T, Change-Points, new-change-point)
8. Change-Points = Change-Points $\cup$ {new-change-point}
end
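The loop above can be sketched in Python. This is a minimal illustration and not the paper's implementation: the page does not define find-candidate, Q, or the stopping criterion, so the sketch assumes a single basis function (a constant fit per segment), takes Q to be the total squared residual of the segmentation, and stops when the relative improvement drops below a hypothetical `min_gain` threshold or a hypothetical `max_points` limit is reached. It also folds the initial step into the loop instead of mirroring the pseudocode line by line.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Squared residual of fitting a constant (the mean) to a segment."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def find_candidate(series: List[float], lo: int, hi: int) -> Optional[int]:
    """Best split index strictly inside series[lo:hi], or None if too short."""
    best: Optional[Tuple[int, float]] = None
    for c in range(lo + 1, hi):
        err = sse(series[lo:c]) + sse(series[c:hi])
        if best is None or err < best[1]:
            best = (c, err)
    return None if best is None else best[0]


def q(series: List[float], change_points: List[int],
      c: Optional[int] = None) -> float:
    """Assumed Q: total residual of the segmentation, optionally with c added."""
    cuts = sorted(change_points + ([c] if c is not None else []))
    bounds = [0] + cuts + [len(series)]
    return sum(sse(series[a:b]) for a, b in zip(bounds, bounds[1:]))


def batch_change_points(series: List[float], min_gain: float = 0.05,
                        max_points: int = 10) -> List[int]:
    change_points: List[int] = []
    candidates = set()
    first = find_candidate(series, 0, len(series))
    if first is not None:
        candidates.add(first)
    current = q(series, change_points)
    while candidates and len(change_points) < max_points:
        # Step 5: pick the candidate that minimizes Q.
        best = min(candidates, key=lambda c: q(series, change_points, c))
        new = q(series, change_points, best)
        # Assumed stopping criterion: relative improvement too small.
        if current == 0 or (current - new) / current < min_gain:
            break
        candidates.remove(best)                       # step 6
        change_points = sorted(change_points + [best])  # step 8
        current = new
        # Step 7: the two new time ranges on either side of the new point.
        bounds = [0] + change_points + [len(series)]
        i = bounds.index(best)
        for lo, hi in ((bounds[i - 1], best), (best, bounds[i + 1])):
            cand = find_candidate(series, lo, hi)     # steps 1-4
            if cand is not None:
                candidates.add(cand)
    return change_points
```

Each current segment contributes exactly one candidate, so the candidate set never contains a stale split point; a richer MSet would only change `sse` and `find_candidate`.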

Automatically Assessing Credibility

The authors apply standard machine learning techniques; the best results they report come from a J48 decision tree.
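J48 is Weka's implementation of C4.5, which selects the attribute to split on by gain ratio. The sketch below shows that criterion for a single categorical feature; the function names and the toy data in the usage note are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Hashable, List


def entropy(items: List[Hashable]) -> float:
    """Shannon entropy (in bits) of the value distribution in items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(feature: List[Hashable], labels: List[Hashable]) -> float:
    """Information gain of splitting on feature, divided by split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Entropy of the feature itself penalizes many-valued splits.
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0
```

For example, with a hypothetical "has URL" feature `[True, True, False, False]` and labels `["credible", "credible", "not", "not"]`, the feature separates the classes perfectly and the gain ratio is 1.0.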

Results:

Results for the credibility classification.

Class        TP Rate  FP Rate  Prec.  Recall  F1
A (“true”)   0.825    0.108    0.874  0.825   0.849
B (“false”)  0.892    0.175    0.849  0.892   0.87
W. Avg.      0.860    0.143    0.861  0.860   0.86
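The columns of this table are related, and the relations can be checked from the reported numbers alone. The short sketch below recomputes F1 as the harmonic mean of precision and recall, and uses the fact that in a two-class problem the FP rate of one class is one minus the recall of the other. The helper name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the table above.
table = {
    "A": {"tp_rate": 0.825, "fp_rate": 0.108, "precision": 0.874, "recall": 0.825},
    "B": {"tp_rate": 0.892, "fp_rate": 0.175, "precision": 0.849, "recall": 0.892},
}

for name, row in table.items():
    other = table["B" if name == "A" else "A"]
    f1 = round(f1_score(row["precision"], row["recall"]), 3)
    # With two classes, a false positive for one class is a missed
    # instance of the other, so FP rate = 1 - recall of the other class.
    implied_fp = round(1 - other["recall"], 3)
    print(name, "F1 =", f1, "implied FP rate =", implied_fp)
```

The recomputed values agree with the table to three decimals, which suggests the row for class B reports F1 with a dropped trailing zero.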

Feature Level Analysis

The features that contribute most to deciding credibility:

Whether the tweets contain a URL is the root of the tree.
Sentiment-based features, such as the fraction of tweets with negative sentiment.
Low-credibility news is mostly propagated by users who have not written many messages in the past.
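As an illustration of how such topic-level features could be aggregated from individual tweets, here is a small sketch. The `Tweet` record, its field names, and the precomputed sentiment flag are assumptions made for this example; the paper's actual feature definitions are not reproduced on this page.

```python
from typing import Dict, List, NamedTuple


class Tweet(NamedTuple):
    """Hypothetical tweet record; the fields are assumptions for this sketch."""
    text: str
    is_negative: bool          # assumed output of some sentiment classifier
    author_past_messages: int  # how many messages the author wrote before


def topic_features(tweets: List[Tweet]) -> Dict[str, float]:
    """Aggregate per-tweet signals into features for one topic."""
    if not tweets:
        raise ValueError("a topic needs at least one tweet")
    n = len(tweets)
    with_url = sum(
        1 for t in tweets if "http://" in t.text or "https://" in t.text
    )
    return {
        "fraction_with_url": with_url / n,
        "fraction_negative": sum(1 for t in tweets if t.is_negative) / n,
        "avg_author_past_messages": sum(t.author_past_messages for t in tweets) / n,
    }
```

A classifier such as the decision tree mentioned earlier would then be trained on one such feature vector per topic.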

Interesting Aspect

I like the coding scheme of this paper; it is reasonable and comprehensive. Some of the conclusions drawn in the paper are interesting to look at. For example:

Among several other features, newsworthy topics tend to include URLs and to have deep propagation trees.
Among several other features, credible news is propagated through authors that have previously written a large number of messages, originates at a single or a few users in the network, and has many re-posts.

Related Papers

T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 851–860, New York, NY, USA, April 2010. ACM.

J. Sankaranarayanan, H. Samet, B. E. Teitler, M. D. Lieberman, and J. Sperling. TwitterStand: news in tweets. In GIS ’09: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 42–51, New York, NY, USA, November 2009. ACM Press.
