Gradient Boosted Decision Tree

GBDT is an additive regression algorithm consisting of an ensemble of trees, fitted to the current residuals, i.e. the gradients of the loss function, in a forward step-wise manner. It iteratively fits an additive model as
<math>f_{t}(x)=T_{t}(x;\Theta)+\lambda\overset{T}{\underset{t=1}{\sum}}\beta_{t}T_{t}(x;\Theta_{t})</math>
such that a certain loss function <math>L(y_{i}, f_{T}(x_{i}))</math> is minimized, where <math>T_{t}(x;\Theta_{t})</math> is a tree at iteration <math>t</math>, weighted by parameter <math>\beta_{t}</math>, with a finite number of parameters, <math>\Theta_{t}</math>, and <math>\lambda</math> is the learning rate. At iteration <math>t</math>, tree <math>T_{t}(x;\Theta_{t})</math> is induced to fit the negative gradient by least squares. That is
<math>\hat{\Theta}_{t}:=\operatorname{argmin}_{\Theta}\overset{N}{\underset{i}{\sum}}(-G_{it}-\beta_{t}T_{t}(x_{i};\Theta))^{2}</math>
where <math>G_{it}</math> is the gradient of the loss over the current prediction function, <math>G_{it}=\left[\frac{\partial L(y_{i},f(x_{i}))}{\partial f(x_{i})}\right]_{f=f_{t-1}}</math>.
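To make the gradient-fitting step concrete, the following is a minimal sketch (not the implementation from the source paper): it assumes squared-error loss <math>L(y,f)=\tfrac{1}{2}(y-f)^{2}</math>, for which the negative gradient <math>-G_{it}</math> is simply the residual <math>y_{i}-f_{t-1}(x_{i})</math>, and it uses scikit-learn's <code>DecisionTreeRegressor</code> as the base tree <math>T_{t}</math>; the function name and all parameter values are illustrative.

<syntaxhighlight lang="python">
# Sketch of inducing T_t on the negative gradients by least squares,
# assuming squared-error loss L(y, f) = (y - f)^2 / 2, so that the
# negative gradient -G_it equals the residual y_i - f_{t-1}(x_i).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_tree_to_negative_gradient(X, y, f_prev, max_depth=3):
    G = f_prev - y                      # G_it = dL/df evaluated at f_{t-1}
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X, -G)                     # least-squares fit to -G_it
    return tree

# Toy usage, starting from the constant prediction f_0(x) = mean(y).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=100)
f_prev = np.full_like(y, y.mean())
tree_t = fit_tree_to_negative_gradient(X, y, f_prev)
</syntaxhighlight>

Fitting the tree to <math>-G_{it}</math> rather than to <math>y_{i}</math> directly is what makes each new tree correct the errors of the current ensemble.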
The optimal weights of trees <math>\beta_{t}</math> are determined by
<math>\beta_{t}=\operatorname{argmin}_{\beta}\overset{N}{\underset{i}{\sum}}L(y_{i},f_{t-1}(x_{i})+\beta T_{t}(x_{i};\Theta_{t}))</math>
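Putting the pieces together, the sketch below runs one full boosting iteration: fit <math>T_{t}</math> to the negative gradients, find <math>\beta_{t}</math> by a one-dimensional line search, and apply the learning-rate update <math>f_{t}(x)=f_{t-1}(x)+\lambda\beta_{t}T_{t}(x)</math>. This is again only an illustration under the squared-error assumption; SciPy's <code>minimize_scalar</code> stands in for whatever line search the source paper used, and all names and parameter values are hypothetical.

<syntaxhighlight lang="python">
# Sketch of one complete GBDT iteration: induce T_t on the negative
# gradients, line-search for beta_t, then apply the shrinkage update
# f_t(x) = f_{t-1}(x) + lambda * beta_t * T_t(x).
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.tree import DecisionTreeRegressor

def total_loss(y, f):
    # Sum of L(y_i, f(x_i)) over the training set (squared error here;
    # any differentiable loss could be substituted).
    return 0.5 * np.sum((y - f) ** 2)

def boosting_iteration(X, y, f_prev, learning_rate=0.1, max_depth=3):
    residual = y - f_prev                       # -G_it for squared error
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
    pred = tree.predict(X)                      # T_t(x_i)
    # beta_t = argmin_beta sum_i L(y_i, f_{t-1}(x_i) + beta * T_t(x_i))
    beta_t = minimize_scalar(lambda b: total_loss(y, f_prev + b * pred)).x
    return tree, beta_t, f_prev + learning_rate * beta_t * pred

# Toy usage: a few iterations starting from the mean prediction.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
f = np.full_like(y, y.mean())
for t in range(5):
    tree_t, beta_t, f = boosting_iteration(X, y, f)
</syntaxhighlight>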
''Source: [[Dong et al WWW 2010]]''