== Summary ==
The underlying idea of this [[category::method]] is to combine simple "rules" or weak learners, each performing only slightly better than random guessing, to form an ensemble whose performance improves on that of any single member, i.e. is "boosted". Let <math>h_1,h_2, \ldots ,h_T</math> be a set of hypotheses, and consider the composite ensemble hypothesis

<math>f(x)= \sum_{t=1}^{T} \alpha_{t} h_t(x)</math>

Here <math>\alpha_{t}</math> denotes the coefficient with which the ensemble member <math>h_t</math> is combined; both the coefficients <math>\alpha_{t}</math> and the hypotheses <math>h_t</math> are to be learned within the Boosting procedure. The coefficients (or weights) <math>\alpha_{t}</math> are set iteratively. The intuitive idea is that examples that are misclassified receive higher weights in the next iteration; for instance, examples near the decision boundary are usually harder to classify and therefore end up with high weights after a few iterations.
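As a small illustration (not part of the algorithm description that follows), the sketch below shows how the composite hypothesis is evaluated once the <math>h_t</math> and <math>\alpha_t</math> have been learned. It assumes the weak learners are callables returning labels in <math>\{-1,+1\}</math>; all names are illustrative.

<pre>
import numpy as np

def ensemble_predict(x, hypotheses, alphas):
    """Evaluate f(x) = sum_t alpha_t * h_t(x) and return its sign as the class label.

    hypotheses : list of callables, each mapping x to a label in {-1, +1} (the h_t)
    alphas     : list of floats, the combination coefficients alpha_t
    """
    f_x = sum(alpha * h(x) for alpha, h in zip(alphas, hypotheses))
    return np.sign(f_x)   # ties (f_x == 0) are left as 0 here
</pre>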
== AdaBoost ==

The AdaBoost algorithm by [[Paper:Yoav Freund, Robert E. Schapire, 1995 | Freund and Schapire]] is one of the most successful Boosting algorithms. We focus on the problem of ''binary classification'' and present the AdaBoost algorithm for it.

[[File:AdaBoost.jpg]]

A non-negative weighting <math>\mathbf{d}^{(t)} = (d_{1}^{(t)}, \ldots , d_{N}^{(t)})</math> is assigned to the data at step <math>t</math>, and a weak learner <math>h_t</math> is constructed based on <math>\mathbf{d}^{(t)}</math>. This weighting is updated at each iteration <math>t</math> according to the weighted error incurred by the weak learner in the last iteration. At each step <math>t</math>, the weak learner is required to produce a small weighted empirical error, defined by

<math>\varepsilon_t(h_t, \mathbf{d}^{(t)}) = \sum_{n=1}^{N} d_n^{(t)} \mathbf{I}(y_n \neq h_t(\boldsymbol{x}_n))</math>
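As a minimal sketch (assuming the labels and weak-learner predictions are NumPy arrays with values in <math>\{-1,+1\}</math>; the function name is illustrative), this error can be computed directly from the definition:

<pre>
import numpy as np

def weighted_error(d, y, y_pred):
    """Weighted empirical error: sum_n d_n * I(y_n != h_t(x_n)).

    d      : non-negative example weights d^(t), summing to one
    y      : true labels in {-1, +1}
    y_pred : weak-learner predictions h_t(x_n) in {-1, +1}
    """
    return float(np.sum(d * (y != y_pred)))
</pre>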
After selecting the hypothesis <math>h_t</math>, its weight <math>\alpha_t</math> is computed such that it minimizes a certain loss function. In AdaBoost one minimizes

<math>\mathrm{G}^{AB}(\alpha) = \sum_{n=1}^{N} \mathrm{exp} \{-y_n(\alpha h_t(\boldsymbol{x}_n) + f_{t-1}(\boldsymbol{x}_n))\}</math>,

where <math>f_{t-1}</math> is the hypothesis combined over the previous iterations, given by

<math>f_{t-1}(\boldsymbol{x}_n) = \sum_{r=1}^{t-1} \alpha_r h_r(\boldsymbol{x}_n)</math>

For AdaBoost it has been shown that <math>\alpha_t</math> can be computed analytically, leading to the expression in step (3c) of the algorithm. Based on the new combined hypothesis, the weighting <math>\boldsymbol{d}</math> of the sample is updated as in step (3d) of the algorithm. The initial weighting <math>\boldsymbol{d}^{(1)}</math> is chosen uniformly: <math>d_n^{(1)} = 1/N</math>.
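In case the pseudocode in the figure above does not render, the following is a sketch of the procedure as described in this section. It uses the standard analytic coefficient <math>\alpha_t = \tfrac{1}{2} \ln \tfrac{1-\varepsilon_t}{\varepsilon_t}</math> for step (3c) and the multiplicative update <math>d_n^{(t+1)} \propto d_n^{(t)} \exp\{-\alpha_t y_n h_t(\boldsymbol{x}_n)\}</math> for step (3d). A decision stump is used as the weak learner purely for concreteness, the early stop at <math>\varepsilon_t \geq 1/2</math> is an implementation convenience, and all names are illustrative.

<pre>
import numpy as np

def train_stump(X, y, d):
    """Weak learner: the decision stump (threshold on one feature)
    with the smallest weighted error under the current weights d."""
    n, m = X.shape
    best = None
    for j in range(m):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.sum(d * (y != pred))
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best[1:]   # (feature index, threshold, polarity)

def stump_predict(X, stump):
    j, thr, sign = stump
    return np.where(X[:, j] <= thr, sign, -sign)

def adaboost(X, y, T):
    """AdaBoost for binary labels y in {-1, +1}.

    Returns the weak learners h_1..h_T and their coefficients alpha_1..alpha_T.
    """
    n = len(y)
    d = np.full(n, 1.0 / n)                  # uniform initial weighting d^(1)
    stumps, alphas = [], []
    for t in range(T):
        stump = train_stump(X, y, d)         # fit weak learner on the weighted data
        pred = stump_predict(X, stump)
        eps = np.sum(d * (y != pred))        # weighted empirical error epsilon_t
        if eps >= 0.5:                       # no better than random guessing: stop
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))   # step (3c): analytic weight
        d = d * np.exp(-alpha * y * pred)    # step (3d): misclassified examples gain weight
        d /= d.sum()                         # renormalise so the weights sum to one
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict(X, stumps, alphas):
    """Sign of the combined hypothesis f(x) = sum_t alpha_t h_t(x)."""
    f = sum(a * stump_predict(X, s) for a, s in zip(alphas, stumps))
    return np.sign(f)
</pre>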
