Projects for Machine Learning with Large Datasets 10-605 in Spring 2012


You are required to do a short, one-month project. The project should be relevant to the course, e.g., comparing the scalability of different learning algorithms on datasets.

Here are some possible project ideas.

Nearest Neighbor based Greedy Coordinate Descent

This is work by I. Dhillon, P. Ravikumar, and A. Tewari, presented in their NIPS 2011 paper.

This paper presents an interesting approach to coordinate descent learning of high-dimensional linear models. For linear models, the gradient along a coordinate is the inner product of the corresponding feature's data vector and the gradient of the loss with respect to the model's predictions (for squared loss, the residual vector). Therefore, finding the coordinate with the largest gradient can be approximated by finding the feature vector that is closest to this gradient vector, a nearest-neighbor search that can be solved approximately with indexing techniques such as Locality Sensitive Hashing (LSH).
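As a rough illustration of why the two searches coincide, the sketch below compares the exact greedy rule with a nearest-neighbor formulation on toy data. It is not the authors' implementation: the function names and the data are made up for illustration, the feature columns are assumed to be scaled to unit norm, and a brute-force scan stands in for an approximate index such as LSH.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def greedy_coordinate(columns, r):
    """Exact greedy rule: the index j that maximizes |<x_j, r>|."""
    return max(range(len(columns)), key=lambda j: abs(dot(columns[j], r)))

def nearest_neighbor_coordinate(columns, r):
    """The same choice phrased as a nearest-neighbor search.

    For unit-norm x_j, ||x_j - q||^2 = 1 + ||q||^2 - 2 <x_j, q>, so the
    column closest to q has the largest inner product with q. Querying
    with both r and -r accounts for the absolute value.
    This brute-force scan is a stand-in (assumption) for an LSH index.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    neg_r = [-a for a in r]
    return min(
        range(len(columns)),
        key=lambda j: min(sq_dist(columns[j], r), sq_dist(columns[j], neg_r)),
    )

# Toy data (hypothetical): four feature columns over three examples.
columns = [normalize(c) for c in (
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 1.0],
    [1.0, 1.0, 1.0],
    [-3.0, 0.5, 0.0],
)]
r = [-2.0, 0.3, 0.1]  # stand-in for the loss gradient w.r.t. predictions

exact = greedy_coordinate(columns, r)
approx = nearest_neighbor_coordinate(columns, r)
print(exact, approx)
```

With an exact scan the two functions pick the same coordinate. Replacing the scan with an approximate index trades some selection accuracy for speed when the number of features is very large.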

This paper opens a new line of research in which indexing techniques such as LSH become a critical component of learning in high-dimensional problems. The technique can potentially be applied to problems such as topic modeling (e.g., LDA) or graphical model structure learning.

You can reproduce their experimental results, or apply their technique to a problem of your choice.

Word context and word meaning

The advent of the WWW has given us a huge amount of text data. That data contains many words used in different contexts with different meanings. Can you use the patterns of word context to infer something about word meaning?

For example, consider all of the word co-occurrences with the noun "apple". Now consider the subset of those co-occurrences that appear when the adjective "rotten" comes before "apple". What does that change in co-occurrence data tell you about the adjective "rotten"? Does it imply that a rotten apple is no longer something a person would want to eat? In addition, "rotten apple" has a metaphorical meaning (The Free Dictionary defines it as a person with a corrupting influence). Can you detect the multiple meanings from the co-occurrence data?
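One possible starting point is simply to count context words with and without the adjective and compare the two distributions. The sketch below assumes whitespace tokenization, a symmetric context window of three words, and a tiny made-up corpus. The function name `cooccurrences` and its parameters are hypothetical, and a real project would run this kind of counting at scale over a large corpus.

```python
from collections import Counter

def cooccurrences(sentences, target, preceded_by=None, window=3):
    """Count words that appear within `window` tokens of `target`.

    If `preceded_by` is given, only occurrences of `target` that directly
    follow that word are counted (e.g. "rotten apple").
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            if preceded_by is not None and (i == 0 or tokens[i - 1] != preceded_by):
                continue
            lo = max(0, i - window)
            # Context = up to `window` tokens on each side of the target.
            context = tokens[lo:i] + tokens[i + 1:i + 1 + window]
            counts.update(context)
    return counts

# Made-up example sentences, for illustration only.
sentences = [
    "she ate a fresh apple for lunch",
    "he threw the rotten apple away",
    "one rotten apple can spoil the team",
    "the apple pie tasted sweet",
]

all_counts = cooccurrences(sentences, "apple")
rotten_counts = cooccurrences(sentences, "apple", preceded_by="rotten")

# Words seen near "apple" in general but never near "rotten apple".
only_without_rotten = sorted(w for w in all_counts if w not in rotten_counts)
print(only_without_rotten)
```

Comparing `all_counts` with `rotten_counts` shows which context words drop out (such as "ate") and which remain or appear (such as "away" and "spoil") once the adjective is present, which is the kind of shift the questions above ask about.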

Classifying into a large hierarchy

Can you use the structure of a hierarchy of labels to improve the classification of documents (or anything else) into that hierarchy? There are many approaches to this problem. One is discussed in this paper. You could propose a new one, or extend an existing one.

For this project, you could use the Reuters newswire data and its hierarchical labels, or propose another large hierarchical dataset.