Adaptive Real-time Filtering in Twitter

From Cohen Courses
 
Revision as of 22:45, 7 October 2012

Team members

  • Yubin Kim
  • Looking for teammates! ;)

Project Summary

This project will explore how to create a topic-based filter for tweets arriving in real time, assuming that user judgements (relevant vs. non-relevant) are available for the tweets shown by the system. The project will follow the framework of the Real-time Filtering task of the Microblog Track in the Text REtrieval Conference (TREC), a well-known competitive conference hosted by NIST each year [1]. The goal is to produce a competitive system for entry into the 2013 run of the track.

Dataset

The project will use the Microblog Track dataset, queries and relevance judgements. The tweet dataset contains 14 million tweets and 50 query topics with relevance judgements. Also available from a previous project is a web crawl of 1 million HTML documents that were linked from tweets.

Task

Given a topic query, a query time, and a corpus of tweets posted before the query time, the project aims to filter future tweets so that only tweets relevant to the topic are returned. Every future tweet shown to the "user" will receive a relevance judgement that can be fed back into the system.
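To make the feedback loop concrete, here is a minimal Python sketch. It is not the method planned for the project: the additive term-weight scorer, the `AdaptiveFilter` class, the `run_filter` helper, and the threshold and step values are all illustrative assumptions. It only shows the shape of the task, where a tweet is scored, shown if it clears a threshold, and the judgement on each shown tweet updates the model.

```python
from collections import Counter


class AdaptiveFilter:
    """Toy additive term-weight filter (an assumption, not the project's method)."""

    def __init__(self, query, threshold=1.0, step=0.5):
        # Every query term starts with weight 1.0; unseen terms weigh 0.
        self.weights = Counter({term: 1.0 for term in query.lower().split()})
        self.threshold = threshold
        self.step = step

    def score(self, text):
        # Sum the weights of the distinct terms in the tweet.
        return sum(self.weights[term] for term in set(text.lower().split()))

    def update(self, text, relevant):
        # Raise the weights of terms in relevant tweets, lower them otherwise.
        delta = self.step if relevant else -self.step
        for term in set(text.lower().split()):
            self.weights[term] += delta


def run_filter(flt, stream, judge):
    """Show tweets that clear the threshold; only shown tweets get feedback."""
    shown = []
    for text in stream:
        if flt.score(text) >= flt.threshold:
            shown.append(text)
            flt.update(text, judge(text))
    return shown


# Made-up tweets in arrival order; the first two are treated as relevant.
stream = [
    "earthquake hits northern japan tsunami warning",
    "tsunami warning issued for coast",
    "japan anime festival tickets",
    "anime festival in japan",
    "lunch was great",
]
relevant = set(stream[:2])

flt = AdaptiveFilter("earthquake japan")
shown = run_filter(flt, stream, judge=lambda text: text in relevant)
print(shown)
```

In this toy run the second tweet shares no term with the query but is shown because feedback on the first tweet raised the weights of "tsunami" and "warning". The fourth tweet is suppressed because negative feedback on the third lowered "anime" and "festival".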

Baseline

The baseline of the project will be the ranked list of tweets returned by a search engine queried with the topic. (The ranked list will be filtered so that only tweets posted after the topic query time are shown, and re-ordered so that those tweets appear in temporal order.)
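The baseline post-processing can be sketched in a few lines of Python. The `Hit` record, its field names, and the integer timestamps are assumptions made for illustration. The search engine call is left out, so `hits` stands in for whatever ranked list the engine returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    tweet_id: str
    timestamp: int  # posting time, e.g. seconds since epoch (assumed)
    score: float    # retrieval score assigned by the search engine


def baseline_stream(ranked_hits, query_time):
    """Keep hits posted after query_time and return them oldest first."""
    future = [h for h in ranked_hits if h.timestamp > query_time]
    return sorted(future, key=lambda h: h.timestamp)


# Hypothetical ranked list, best score first.
hits = [
    Hit("t3", 300, 9.1),
    Hit("t1", 100, 8.7),
    Hit("t4", 400, 5.2),
    Hit("t2", 200, 4.0),
]

ordered = baseline_stream(hits, query_time=150)
print([h.tweet_id for h in ordered])  # ['t2', 't3', 't4']
```

The retrieval score plays no further part once the list is re-ordered by time. In this sketch, a baseline that also applied a score cutoff would need an extra condition in the filter.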

Challenges

  • The current plan for the system includes modifying the Indri search engine, which will require a substantial amount of implementation work in C++.