Difference between revisions of "Xxiong project abstract"

From Cohen Courses

Revision as of 13:04, 29 September 2010

Team Members

Xuehan Xiong [xxiong@andrew.cmu.edu]

Goal

1. Revisit boosting. I will evaluate the proposed method on a named entity recognition (NER) task.

2. Extend a stacked hierarchical model recently developed for computer vision tasks and apply it to information extraction (IE).

3. Develop a new clustering algorithm based on Searn. Ground-truth clusterings are needed to learn the policy.

I will choose (1) and either (2) or (3) as my final project.

Motivation

1. In traditional boosting, the samples misclassified in one iteration receive more weight in the next round. However, these errors are measured on the training data. In my algorithm, I will instead give more weight to the samples that are mislabeled during cross-validation, as in stacking.

2. The intuition behind the stacked hierarchical model is that predictions from one level of the hierarchy should help predict the entities in the level above or below. Besides neighbors' predictions, the predictions of a node's parent and/or children may also be "stacked" into its feature vector. Unlike LDA, this model can only be used in a supervised mode.

3. In Searn-based clustering, the partial output of the policy is a set of clusters. At each step, the policy decides whether to start a new cluster with the current sample or to add it to one of the existing clusters.
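
The per-step decision described in item 3 can be sketched as follows. This is a minimal illustration only, not the proposed algorithm: it shows the action space (join an existing cluster or open a new one) at prediction time and does not include Searn training. The scoring function `centroid_score` and its `new_cluster_cost` parameter are hypothetical stand-ins for a learned policy.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]
Cluster = List[Point]


def centroid_score(x: Point, cluster: Cluster, new_cluster_cost: float = 1.0) -> float:
    """Hypothetical stand-in for a learned policy's score (higher is better).

    An empty cluster represents the "start a new cluster" action.
    """
    if not cluster:
        return -new_cluster_cost
    dim = len(x)
    centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
    # Negative squared distance to the cluster centroid.
    return -sum((x[d] - centroid[d]) ** 2 for d in range(dim))


def cluster_sequentially(
    points: Sequence[Point],
    score: Callable[[Point, Cluster], float] = centroid_score,
) -> List[Cluster]:
    """Process samples one by one; the partial output is the list of clusters so far."""
    clusters: List[Cluster] = []
    for x in points:
        # Candidate actions: join existing cluster i, or open a new one (last index).
        candidates = clusters + [[]]
        best = max(range(len(candidates)), key=lambda i: score(x, candidates[i]))
        if best == len(clusters):
            clusters.append([x])
        else:
            clusters[best].append(x)
    return clusters
```

In a Searn-style setup, the hand-written score would be replaced by a classifier trained against the ground-truth clustering; the loop over samples and the set of candidate actions would stay the same.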

Dataset

Superpowers

Experience with CRFs and stacking in computer vision.

What question you want to answer

1. I want to know whether the proposed algorithm outperforms traditional AdaBoost.

2. I want to know whether the stacked hierarchical model is more effective than hierarchical Bayesian models, such as LDA, in IE applications, and whether it improves on the original stacking algorithm without hierarchy.
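
For question 1, the difference between the two weighting schemes can be sketched for a single boosting round. This is a rough sketch under stated assumptions, not the proposed method itself: the weak learner (a 1-D decision stump), the interleaved fold assignment, and the reuse of the standard AdaBoost exponential update for the cross-validated variant are all choices made here for illustration. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Classifier = Callable[[float], int]


def fit_stump(xs: Sequence[float], ys: Sequence[int], ws: Sequence[float]) -> Classifier:
    """Weighted decision stump on 1-D inputs; labels are +1 / -1."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, ws)
                      if (sign if x >= threshold else -sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    _, t, s = best
    return lambda x: s if x >= t else -s


def training_mistakes(xs, ys, ws) -> List[bool]:
    """Traditional boosting: errors are measured on the data the learner was fit on."""
    h = fit_stump(xs, ys, ws)
    return [h(x) != y for x, y in zip(xs, ys)]


def cross_validated_mistakes(xs, ys, ws, k: int = 2) -> List[bool]:
    """Assumed variant: each sample is judged by a learner that never saw it."""
    mistakes = [False] * len(xs)
    for fold in range(k):
        held_out = [i for i in range(len(xs)) if i % k == fold]
        train = [i for i in range(len(xs)) if i % k != fold]
        h = fit_stump([xs[i] for i in train],
                      [ys[i] for i in train],
                      [ws[i] for i in train])
        for i in held_out:
            mistakes[i] = h(xs[i]) != ys[i]
    return mistakes


def reweight(ws: Sequence[float], mistakes: Sequence[bool]) -> List[float]:
    """Standard AdaBoost exponential update, applied to whichever mistake mask is given."""
    eps = sum(w for w, m in zip(ws, mistakes) if m) / sum(ws)
    eps = min(max(eps, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
    alpha = 0.5 * math.log((1 - eps) / eps)
    new = [w * math.exp(alpha if m else -alpha) for w, m in zip(ws, mistakes)]
    total = sum(new)
    return [w / total for w in new]
```

On the toy data `xs = [0, 1, 2, 3, 4, 5]`, `ys = [-1, -1, -1, 1, 1, 1]` with uniform weights, the stump fits the training set perfectly, so `training_mistakes` flags nothing, while `cross_validated_mistakes` flags the boundary sample at `x = 3`, which then receives more weight from `reweight`.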