Projects for Machine Learning with Large Datasets 10-605 in Spring 2012


You are required to do a one-month short project. The project should be relevant to the course, e.g., comparing the scalability of different learning algorithms on large datasets.

Here are some possible project ideas:

Nearest Neighbor based Greedy Coordinate Descent

This is work by I. Dhillon, P. Ravikumar, and A. Tewari, published at NIPS 2011.

This paper presents an interesting approach to coordinate descent learning of high-dimensional linear models. For linear models, the gradient along a coordinate is the inner product of the corresponding feature's data vector and the gradient vector. Therefore, finding the coordinate with the largest gradient can be approximated by finding the feature vector closest to the gradient vector, a nearest-neighbor search that can in turn be solved approximately with indexing techniques such as Locality Sensitive Hashing (LSH).
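Below is a minimal sketch, not the authors' implementation, of greedy coordinate descent for least-squares regression. It makes the connection explicit: the greedy rule selects the feature column with the largest absolute inner product with the current gradient, which is exactly the nearest-neighbor-style lookup the paper proposes to approximate with LSH. For clarity this sketch uses a brute-force argmax in place of an LSH index; the function and variable names are hypothetical.

```python
# Sketch only: greedy coordinate descent for 0.5 * ||X w - y||^2,
# where the coordinate is chosen by a (here exact) nearest-neighbor-style
# search over feature columns. In the paper this argmax is approximated
# with an index such as LSH.
import numpy as np

def greedy_cd_least_squares(X, y, n_steps=100):
    n, d = X.shape
    w = np.zeros(d)
    residual = X @ w - y                 # current residual Xw - y
    col_norms_sq = (X ** 2).sum(axis=0)  # per-coordinate curvature constants
    for _ in range(n_steps):
        # Gradient along coordinate j is <x_j, residual>; the greedy rule
        # picks the column most aligned with the gradient vector.
        grads = X.T @ residual
        j = np.argmax(np.abs(grads))
        # Exact minimization of the quadratic loss along coordinate j.
        delta = -grads[j] / col_norms_sq[j]
        w[j] += delta
        residual += delta * X[:, j]
    return w

# Hypothetical usage on synthetic sparse-regression data:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 1000))
    w_true = np.zeros(1000)
    w_true[:5] = rng.standard_normal(5)
    y = X @ w_true + 0.01 * rng.standard_normal(200)
    w_hat = greedy_cd_least_squares(X, y, n_steps=50)
    print("largest recovered coordinates:", np.argsort(-np.abs(w_hat))[:5])
```

A project could replace the brute-force argmax with an LSH index over the feature columns and compare scalability against standard (cyclic or random) coordinate descent as the dimension grows.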

This paper opens a new line of research in which indexing techniques such as LSH become a critical component of learning in high-dimensional problems. The technique could potentially be applied to problems such as topic modeling (e.g., LDA) or graphical model structure learning.