Difference between revisions of "Castillo 2011"

From Cohen Courses
Jump to navigationJump to search
Line 91: Line 91:
 
algorithm performed compared to a person doing
 
algorithm performed compared to a person doing
 
the same task through visual inspection.
 
the same task through visual inspection.
* The results seems a little week. It has no ground truth. Statistically speaking it do performs better than 4 human subject.
+
* The results seems a little week. It has no ground truth. Statistically speaking it does performs better than 4 human subject.
  
 
==== Incremental Algorithm ====
 
==== Incremental Algorithm ====

Revision as of 21:55, 1 October 2012

Castillo http://www.ra.ethz.ch/cdstore/www2011/proceedings/p675.pdf


Citation

@inproceedings{conf/www/CastilloMP11,

 author    = {Carlos Castillo and
              Marcelo Mendoza and
              Barbara Poblete},
 title     = {Information credibility on twitter},
 booktitle = {WWW},
 year      = {2011},
 pages     = {675-684},
 ee        = {http://doi.acm.org/10.1145/1963405.1963500},

}


Abstract from the paper

In the past few years there has been increased interest in using data-mining techniques to extract interesting patterns from time series data generated by sensors monitoring temporally varying phenomenon. Most work has assumed that raw data is somehow processed to generate a sequence of events, which is then mined for interesting episodes. In some cases the rule for determining when a sensor reading should generate an event is well known. However, if the phenomenon is ill-understood, stating such a rule is difficult. Detection of events in such an environment is the focus of this paper. Consider a dynamic phenomenon whose behavior changes enough over time to be considered a qualitatively significant change. The problem we investigate is of identifying the time points at which the behavior change occurs. In the statistics literature this has been called the change-point detection problem. The standard approach has been to (a) apriori determine the number of change-points that are to be discovered, and (b) decide the function that will be used for curve fitting in the interval between successive change-points. In this paper we generalize along both these dimensions. We propose an iterative algorithm that fits a model to a time segment, and uses a likelihood criterion to determine if the segment should be partitioned further, i.e. if it contains a new change point. In this paper we present algorithms for both the batch and incremental versions of the problem, and evaluate their behavior with synthetic and real data. Finally, we present initial results comparing the change-points detected by the batch algorithm with those detected by people using visual inspection

Online version

pdf link to the paper

Summary

Task Definition

  • Develop a general approach to change-point detection that generalize across wide range of application

Method

Batch Algorithm

  • Algorithm overview

The algorithm takes the set of approximating basis functions MSet and the time series T

  1. new-change-point = find-candidate(T, MSet)
  2. Change-Points =
  3. Candidates =
  4. Tl, Tz = get-new-time-ranges(T, Change-Points, new-change-point)
  5. while(stopping criteria is not met) do begin
    1. cl = find-candidate(T1, MSet)
    2. c2 = find-andidate(T2, MSet)
    3. Candidates = Candidates
    4. Candidates = Candidates
    5. new-change-point = c Candidates |Q(Change-Points,c) = min
    6. Candidates = Candidates \ new-change-point
    7. Tl,T2 = get-new-time-ranges(T, Change-Points, new-change-point)
    8. Change-Points = Change-Points new-change-points
  6. end
  • Stopping Criteria

If in iterations k and k+1 the respective likelihood criteria, the algorithm should stop if the difference of the likelihood proportioned to the last step likelihood is below a small constant.

  • Experiment with Traffic Data

The data used in our experiments was taken from highway traffic sensors, called loop detectors, in the Minneapolis-St. Paul metro area. A loop detector is a sensor, embedded in the road, with an electro-magnetic field around it, which is broken when a vehicle goes over it. Each such breaking, and the subsequent reestablishment, of the field is recorded as a count. Traffic volume is defined as the vehicle count per unit time. We need to find the change point detection algorithm performed compared to a person doing the same task through visual inspection.

  • The results seems a little week. It has no ground truth. Statistically speaking it does performs better than 4 human subject.

Incremental Algorithm

Because the next data point collected by the sensor reflects significant change in phenomenon, then its likelihood criteria of being a change point is going to b smaller than likelihood criteria that it is not. However, if the difference in likelihood is small, it can just be noise. So the incremental algorithm change the stopping criteria to

  • The no_change period and change period likelihood difference is below a small constant times of the no_change period likelihood.
Performance Evaluation

Not as good as the batch model, because it is local optimum since the future coming data is not observed

Interesting points

It is a non-Bayesian model, hence prior model doesn't require. It is very easy to implement and it considers the time span which is very important, but since it is a quite old paper, the experiments are obsolete.

Related work