Gradient Boosted Decision Tree

GBDT is an additive regression algorithm consisting of an ensemble of trees, fitted to the current residuals, i.e. the gradients of the loss function, in a forward step-wise manner. It iteratively fits an additive model as
<math>f_{t}(x)=T_{t}(x;\Theta)+\lambda\overset{T}{\underset{t=1}{\sum}}\beta_{t}T_{t}(x;\Theta_{t})</math>
such that a certain loss function <math>L(y_{i}, f_{T}(x_{i}))</math> is minimized, where <math>T_{t}(x;\Theta_{t})</math> is a tree at iteration <math>t</math>, weighted by parameter <math>\beta_{t}</math>, with a finite number of parameters, <math>\Theta_{t}</math>, and <math>\lambda</math> is the learning rate. At iteration <math>t</math>, tree <math>T_{t}(x;\Theta_{t})</math> is induced to fit the negative gradient by least squares. That is
<math>\hat{\Theta}_{t}:=\operatorname{argmin}_{\Theta}\overset{N}{\underset{i}{\sum}}(-G_{it}-\beta_{t}T_{t}(x_{i};\Theta))^{2}</math>
where <math>G_{it}</math> is the gradient of the loss over the current prediction function, <math>G_{it}=\left[\frac{\partial L(y_{i},f(x_{i}))}{\partial f(x_{i})}\right]_{f=f_{t-1}}</math>.
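To make the gradient-fitting step concrete, the following is a minimal sketch (not the implementation from the source paper): it assumes squared-error loss <math>L(y,f)=\tfrac{1}{2}(y-f)^{2}</math>, for which the negative gradient <math>-G_{it}</math> is simply the residual <math>y_{i}-f_{t-1}(x_{i})</math>, and it uses scikit-learn's <code>DecisionTreeRegressor</code> as the base tree <math>T_{t}</math>; the function name and all parameter values are illustrative.

<syntaxhighlight lang="python">
# Sketch of inducing T_t on the negative gradients by least squares,
# assuming squared-error loss L(y, f) = (y - f)^2 / 2, so that the
# negative gradient -G_it equals the residual y_i - f_{t-1}(x_i).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_tree_to_negative_gradient(X, y, f_prev, max_depth=3):
    G = f_prev - y                      # G_it = dL/df evaluated at f_{t-1}
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X, -G)                     # least-squares fit to -G_it
    return tree

# Toy usage, starting from the constant prediction f_0(x) = mean(y).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=100)
f_prev = np.full_like(y, y.mean())
tree_t = fit_tree_to_negative_gradient(X, y, f_prev)
</syntaxhighlight>

Fitting the tree to <math>-G_{it}</math> rather than to <math>y_{i}</math> directly is what makes each new tree correct the errors of the current ensemble.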
The optimal weights of trees <math>\beta_{t}</math> are determined by
<math>\beta_{t}=\operatorname{argmin}_{\beta}\overset{N}{\underset{i}{\sum}}L(y_{i},f_{t-1}(x_{i})+\beta T_{t}(x_{i};\Theta_{t}))</math>
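Putting the pieces together, the sketch below runs one full boosting iteration: fit <math>T_{t}</math> to the negative gradients, find <math>\beta_{t}</math> by a one-dimensional line search, and apply the learning-rate update <math>f_{t}(x)=f_{t-1}(x)+\lambda\beta_{t}T_{t}(x)</math>. This is again only an illustration under the squared-error assumption; SciPy's <code>minimize_scalar</code> stands in for whatever line search the source paper used, and all names and parameter values are hypothetical.

<syntaxhighlight lang="python">
# Sketch of one complete GBDT iteration: induce T_t on the negative
# gradients, line-search for beta_t, then apply the shrinkage update
# f_t(x) = f_{t-1}(x) + lambda * beta_t * T_t(x).
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.tree import DecisionTreeRegressor

def total_loss(y, f):
    # Sum of L(y_i, f(x_i)) over the training set (squared error here;
    # any differentiable loss could be substituted).
    return 0.5 * np.sum((y - f) ** 2)

def boosting_iteration(X, y, f_prev, learning_rate=0.1, max_depth=3):
    residual = y - f_prev                       # -G_it for squared error
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
    pred = tree.predict(X)                      # T_t(x_i)
    # beta_t = argmin_beta sum_i L(y_i, f_{t-1}(x_i) + beta * T_t(x_i))
    beta_t = minimize_scalar(lambda b: total_loss(y, f_prev + b * pred)).x
    return tree, beta_t, f_prev + learning_rate * beta_t * pred

# Toy usage: a few iterations starting from the mean prediction.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
f = np.full_like(y, y.mean())
for t in range(5):
    tree_t, beta_t, f = boosting_iteration(X, y, f)
</syntaxhighlight>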
''Source: [[Dong et al WWW 2010]]''