Projects for Machine Learning with Large Datasets 10-605 in Spring 2012


You are required to do a short, one-month project. The project should be relevant to the course, e.g., comparing the scalability of different learning algorithms on large datasets.

Here are some possible project ideas:

Nearest Neighbor based Greedy Coordinate Descent

This project is based on work by I. Dhillon, P. Ravikumar, and A. Tewari, presented in their NIPS 2011 paper.

This paper presents an interesting approach to greedy coordinate descent for learning high-dimensional linear models. For linear models, the gradient along a coordinate is the inner product of the corresponding feature's data vector and the gradient vector. Therefore, finding the coordinate with the largest gradient can be approximated by finding the feature vector closest to the gradient vector, a nearest-neighbor problem that can in turn be solved approximately with indexing techniques such as Locality Sensitive Hashing (LSH).
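The sketch below is a minimal illustration of this idea for squared loss on a dense NumPy data matrix; the function and variable names are illustrative, not taken from the paper. The exact argmax over per-coordinate gradients is the step the paper replaces with an approximate nearest-neighbor (e.g., LSH) query over the feature vectors.

import numpy as np

def greedy_coordinate_descent(X, y, n_iters=100):
    """Greedy coordinate descent for least squares.
    X: (n_samples, n_features) data matrix, y: (n_samples,) targets."""
    n, d = X.shape
    w = np.zeros(d)
    residual = X @ w - y                  # r = Xw - y
    col_sq_norms = (X ** 2).sum(axis=0)   # ||x_j||^2, precomputed once

    for _ in range(n_iters):
        # Gradient along coordinate j is <x_j, r>. Picking the coordinate
        # with the largest |gradient| is a maximum-inner-product search;
        # this exact argmax is what the paper approximates with
        # nearest-neighbor indexing such as LSH.
        grads = X.T @ residual
        j = int(np.argmax(np.abs(grads)))

        # Exact coordinate-wise minimizer for squared loss along coordinate j.
        delta = -grads[j] / col_sq_norms[j]
        w[j] += delta
        residual += delta * X[:, j]       # keep the residual up to date
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    w_true = np.zeros(50)
    w_true[:5] = rng.standard_normal(5)
    y = X @ w_true + 0.01 * rng.standard_normal(200)
    w_hat = greedy_coordinate_descent(X, y, n_iters=200)
    print(np.linalg.norm(w_hat - w_true))

Note that the residual is updated incrementally after each coordinate step, so only the coordinate-selection step scales with the full dimensionality, which is exactly why replacing it with an approximate nearest-neighbor query is attractive.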

This paper opens a new line of research in which indexing techniques such as LSH become a critical component of learning in high-dimensional problems. The technique can potentially be applied to problems such as topic modeling (e.g., LDA) or graphical model structure learning.

You can reproduce their experimental results, or apply their technique to a problem of your choice.