Difference between revisions of "Adaptive Real-time Filtering in Twitter"

From Cohen Courses
 
== Team members ==
* [[User:yubink|Yubin Kim]]
* [[User:Tahoang|Tuan Anh]]

Latest revision as of 14:53, 15 October 2012

Comments

Looks like a nice well-defined project. Can you say a little bit about how this is different from your research - I know you're working on similar stuff with Jamie. --Wcohen 14:42, 10 October 2012 (UTC)

My research with Jamie focused on ad-hoc search when I was working with Twitter. The filtering task is a new problem for me, although I admit that I'm hoping to reuse some of the Tweet processing infrastructure I have set up for my ad-hoc search project. I also recently switched my main research project to federated search, so I was hoping to keep my fingers in the old stuff. --Yubink 17:24, 11 October 2012 (UTC)

Team members

Project Summary

This project will explore how to create a topic-based filter for tweets arriving in real time, assuming that user judgements (relevant vs. non-relevant) are available for the tweets shown by the system. The project will follow the framework of the Real-time Filtering task of the Microblog Track at the Text REtrieval Conference (TREC), a well-known competitive conference hosted by NIST each year [1]. The goal is to produce a competitive system for entry into the 2013 run of the track.

Dataset

The project will use the Microblog Track dataset, queries and relevance judgements. The tweet dataset contains 14 million tweets and 50 query topics with relevance judgements. Also available from a previous project is a web crawl of 1 million HTML documents that were linked from tweets.

Task

Given a topic query, a query time, and a corpus of tweets prior to the topic query time, the project aims to filter future tweets such that only tweets relevant to the topic are returned. Any future tweets shown to the "user" will receive feedback that can be incorporated back into the system.
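The filter-then-feedback loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the system the project plans to build: the class `AdaptiveFilter`, the function `run_filter`, the additive term-weight update, and the `threshold` and `step` values are all hypothetical choices made to keep the example short.

```python
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    """Very rough tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


@dataclass
class AdaptiveFilter:
    """Hypothetical topic profile: a bag of term weights plus a score threshold."""
    query_terms: list
    threshold: float = 1.0   # assumed cut-off for showing a tweet
    step: float = 1.0        # assumed size of each feedback update
    weights: Counter = field(default_factory=Counter)

    def __post_init__(self):
        # Seed the profile with the topic query terms.
        for term in self.query_terms:
            self.weights[term.lower()] += 1.0

    def score(self, text):
        # Sum the weights of the distinct terms in the tweet.
        return sum(self.weights[t] for t in set(tokenize(text)))

    def update(self, text, relevant):
        # Push term weights up for relevant tweets, down for non-relevant ones.
        delta = self.step if relevant else -self.step
        for t in set(tokenize(text)):
            self.weights[t] += delta


def run_filter(flt, stream, judge):
    """Process tweets in arrival order; only shown tweets receive feedback."""
    shown = []
    for tweet_id, text in stream:
        if flt.score(text) >= flt.threshold:
            shown.append(tweet_id)
            flt.update(text, judge(tweet_id))
    return shown
```

A possible use: build `AdaptiveFilter(["BBC", "cuts"])`, pass an iterable of `(tweet_id, text)` pairs in time order as `stream`, and supply `judge` as a function that looks up the relevance judgement for a tweet id. Note that tweets the filter does not show never produce feedback, which is the constraint stated in the task description.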

Baseline

The baseline of the project will be the ranked list of tweets returned by a search engine queried with the topic. (The ranked list will be filtered so that only tweets posted after the topic query time are shown, and re-ordered so that those tweets appear in temporal order.)
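The filtering and re-ordering of the ranked list can be sketched as below. The `Hit` record, the `baseline_stream` function, the integer timestamps, and the optional `top_k` cut-off are assumptions made for illustration; the search engine's actual result format is not specified here.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical search result record."""
    tweet_id: int
    timestamp: int   # assumed: posting time as an integer, e.g. seconds since epoch
    score: float     # retrieval score assigned by the search engine


def baseline_stream(ranked_hits, query_time, top_k=None):
    """Turn a ranked list into a temporally ordered list of future tweets."""
    # Optionally keep only the top-k results by rank (assumed cut-off).
    hits = ranked_hits[:top_k] if top_k is not None else ranked_hits
    # Drop tweets posted at or before the topic query time.
    future = [h for h in hits if h.timestamp > query_time]
    # Re-order the survivors by posting time rather than by score.
    return sorted(future, key=lambda h: h.timestamp)
```

For example, `baseline_stream(hits, query_time=100)` keeps only the hits with a timestamp greater than 100 and returns them oldest first, regardless of their retrieval scores.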

Challenges

  • The current plan for the system includes modifying the Indri search engine, which will require a heavy implementation effort (C++)