Difference between revisions of "Bootstrapping"
(Created page with ''''Bootstrapping''' is a term used to define a general class of algorithms which benefit from a small set of labeled examples and a large amount of unlabeled data. == Motivation…')
Line 7:
+ == Relevant Papers ==
+
+ {{#ask: [[UsesMethod::Bootstrapping]]
+ | ?AddressesProblem
+ | ?UsesDataset
+ }}
Revision as of 21:24, 29 November 2011
Bootstrapping refers to a general class of algorithms that learn from a small set of labeled examples together with a large amount of unlabeled data.
Motivation
By far the largest cost in applied machine learning, in terms of human labor, is data annotation. Training a supervised classifier requires a large number of fully labeled examples. As a result, most research is limited to popular public corpora, such as the Penn Treebank, or to data that can be labeled easily from metadata, such as crawled movie reviews in sentiment analysis.
For novel research problems or real-world applications, however, annotation is unavoidable. Bootstrapping is an approach that labels features rather than instances. An example from the domain of topic recognition is the word puck: in a classifier deciding whether a document is about baseball or hockey, the word puck is a very high-precision indicator of hockey.
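The idea of labeling features rather than instances can be illustrated with a small sketch. This is only one possible form of bootstrapping, not a method described on this page: the function names `label_docs` and `bootstrap`, the thresholds `min_count` and `min_precision`, the toy documents, and the second seed word (inning, for baseball) are all assumptions made for illustration. Seed words label the documents that contain them, and words that occur only in documents of one label are then promoted to new labeled features.

```python
from collections import Counter


def label_docs(docs, features):
    """Label each document by majority vote of the labeled words it contains."""
    labels = {}
    for i, tokens in enumerate(docs):
        votes = Counter(features[t] for t in set(tokens) if t in features)
        ranked = votes.most_common(2)
        # Skip documents with no known feature word or with a tied vote.
        if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
            labels[i] = ranked[0][0]
    return labels


def bootstrap(docs, seeds, rounds=2, min_count=2, min_precision=1.0):
    """Grow a word -> label dictionary from a few seed words (illustrative only)."""
    features = dict(seeds)
    for _ in range(rounds):
        labels = label_docs(docs, features)
        # Count, for every word, how many labeled documents of each class contain it.
        counts = {}
        for i, label in labels.items():
            for t in set(docs[i]):
                counts.setdefault(t, Counter())[label] += 1
        # Promote words that are frequent enough and precise enough for one class.
        for word, by_label in counts.items():
            if word in features:
                continue
            label, hits = by_label.most_common(1)[0]
            if hits >= min_count and hits / sum(by_label.values()) >= min_precision:
                features[word] = label
    return features, label_docs(docs, features)


# Hypothetical toy corpus; only "puck" comes from the example in the text.
docs = [
    "the goalie blocked the puck".split(),
    "the puck got past the goalie".split(),
    "the goalie skated onto the ice".split(),
    "the pitcher threw in the ninth inning".split(),
    "the pitcher ended the inning".split(),
    "the pitcher walked the batter".split(),
]
seeds = {"puck": "hockey", "inning": "baseball"}

features, labels = bootstrap(docs, seeds)
print(features)
print(labels)
```

On this toy corpus the two seed words label four of the six documents in the first round; goalie and pitcher are then promoted to features, which lets the remaining two documents be labeled without any instance-level annotation. A common word such as "the" is never promoted because it occurs under both labels and fails the precision threshold.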