Difference between revisions of "Bagging"
From Cohen Courses
Latest revision as of 17:31, 30 November 2010
Summary
Bagging (a.k.a. bootstrap aggregating) is an ensemble machine learning method for classification and regression, introduced by Leo Breiman. It generates multiple versions of a predictor by drawing bootstrap replicates of the learning set and training one predictor on each replicate. The versions are then combined into an aggregated predictor, which takes a vote over the versions for classification and averages their outputs when predicting numerical values. Breiman showed that bagging improves accuracy most when the predictors are good but unstable, that is, when perturbing the learning set causes significant changes in the resulting predictor.
Procedure
Input/Definitions:
- D - Original training set
- D_i - One of the bootstrap sample training sets
- n - Size of D
- m - Number of predictors to construct
- M - Set of trained models
- M_i - One of the trained models
Training:
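- Generate m new training sets, D_i, each of size n_prime ≤ n (a common choice is n_prime = n).
  - Do so by sampling from D uniformly and with replacement (a.k.a. a bootstrap sample), so a given example may appear several times in D_i or not at all
- Train m models, fitting each model M_i on its own bootstrap sample D_i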
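The sampling step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names `bootstrap_sample` and `make_bootstrap_sets` are hypothetical, and defaulting n_prime to n is an assumption made here for convenience.

```python
import random

def bootstrap_sample(data, n_prime=None, rng=None):
    """Draw n_prime items from data uniformly at random, with replacement."""
    rng = rng or random.Random()
    if n_prime is None:
        n_prime = len(data)  # assumption: default to n_prime = n
    return [rng.choice(data) for _ in range(n_prime)]

def make_bootstrap_sets(data, m, n_prime=None, seed=0):
    """Build the m training sets D_1..D_m from the original set D."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [bootstrap_sample(data, n_prime, rng) for _ in range(m)]

D = list(range(1000))                 # toy stand-in for a training set
samples = make_bootstrap_sets(D, m=5)

# Fraction of distinct original examples that made it into the first sample.
unique_fraction = len(set(samples[0])) / len(D)
```

Because sampling is with replacement, each D_i leaves out some examples and repeats others. When n_prime = n and n is large, the expected fraction of distinct examples in a sample is about 1 - 1/e, roughly 63%.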
Output Prediction:
- For Regression: Average the outputs of the predictors in M
- For Classification: Take a plurality vote over the predictors in M
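The whole procedure, training plus aggregation, might look like the sketch below. It uses only the Python standard library. The base learners `train_stump` and `train_mean` are hypothetical toy models chosen to keep the example short (the stump assumes class 1 lies at larger x than class 0); any learning algorithm that returns a callable model could be substituted.

```python
import random
from collections import Counter
from statistics import mean

def bagging_fit(train, D, m, seed=0):
    """Train m models, each on its own bootstrap sample of D (size n)."""
    rng = random.Random(seed)
    models = []
    for _ in range(m):
        D_i = [rng.choice(D) for _ in range(len(D))]  # sample with replacement
        models.append(train(D_i))
    return models

def predict_regression(models, x):
    """Aggregate by averaging the numerical outputs of all models."""
    return mean(model(x) for model in models)

def predict_classification(models, x):
    """Aggregate by plurality vote over the predicted labels."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

def train_stump(sample):
    """Toy 1-D classifier: threshold halfway between the two class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:  # degenerate sample: only one class was drawn
        label = 0 if xs0 else 1
        return lambda x: label
    threshold = (mean(xs0) + mean(xs1)) / 2
    return lambda x: int(x > threshold)

def train_mean(sample):
    """Toy regressor: always predicts the mean target of its sample."""
    y_bar = mean(y for _, y in sample)
    return lambda x: y_bar

# Toy data: class 0 for x in 0..9, class 1 for x in 10..19.
D_cls = [(x, 0) for x in range(10)] + [(x, 1) for x in range(10, 20)]
cls_models = bagging_fit(train_stump, D_cls, m=25)
label_low = predict_classification(cls_models, 2)
label_high = predict_classification(cls_models, 17)

# Toy regression data with targets y = 2x.
D_reg = [(x, 2.0 * x) for x in range(10)]
reg_models = bagging_fit(train_mean, D_reg, m=25)
bagged_mean = predict_regression(reg_models, 0)
```

Each model sees a slightly different training set, so the individual thresholds and means differ from model to model. The vote or average smooths out that variation, which is why the method helps most with unstable base predictors.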
References / Links
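- Leo Breiman. Bagging Predictors. Machine Learning, 24, 123–140 (1996). - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.121.7654&rep=rep1&type=pdf
- Wikipedia article on Bagging - http://en.wikipedia.org/wiki/Bootstrap_aggregating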