L-BFGS
=== Citation ===

=== Summary ===
The L-BFGS algorithm is an optimization [[Category::Method | method]] that falls under the group of techniques known as "quasi-Newton" optimization methods. The "L" stands for "Limited" (in the limited-memory sense, not that the method itself is limited), and "BFGS" are the initials of the researchers who devised the original (non-limited-memory) algorithm: Broyden, Fletcher, Goldfarb, and Shanno. Here is a picture of all of them:

[[File:bfgscharacters.png]]

In general, whereas optimization methods like the cutting planes algorithm, which belong to the category of "bundle methods", essentially lower-bound the objective function using linear approximations built from gradients, quasi-Newton algorithms use the gradients to estimate the Hessian (the square matrix of second-order partial derivatives; think of it as the vector generalization of the second derivative). With the gradient and an approximated Hessian, we can build a quadratic approximation of the objective function.

=== Details ===

The standard BFGS method approximates the objective function with a local quadratic model:
<math> m_t(w) = J(w_t) + \langle \nabla J(w_t), w - w_t \rangle + \frac{1}{2} (w - w_t)^T H_t (w - w_t) </math>
where <math>J(w_t)</math> is the objective function, <math>\nabla J(w_t)</math> is the gradient, and <math>H_t</math> is the (approximated) Hessian.
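
To make the model concrete, here is a minimal sketch in Python/NumPy of evaluating <math>m_t(w)</math>. The names (<code>quadratic_model</code>, <code>J_wt</code>, <code>grad_wt</code>, <code>H_t</code>) are illustrative placeholders, not from any particular library:

<pre>
import numpy as np

def quadratic_model(w, w_t, J_wt, grad_wt, H_t):
    """Evaluate the local quadratic model m_t(w) around the iterate w_t.

    J_wt    -- objective value J(w_t), a scalar
    grad_wt -- gradient of J at w_t, a 1-D array
    H_t     -- Hessian (or a quasi-Newton approximation of it), a 2-D array
    """
    d = w - w_t  # displacement from the current iterate
    return J_wt + grad_wt.dot(d) + 0.5 * d.dot(H_t.dot(d))
</pre>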

We update the parameters by choosing the weights that minimize this quadratic approximation:
<math> w_{t+1} = \arg \min_w J(w_t) + \langle \nabla J(w_t), w - w_t \rangle + \frac{1}{2} (w - w_t)^T H_t (w - w_t) </math>
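
Setting the gradient of the model to zero shows where this minimum is attained (assuming <math>H_t</math> is positive definite, so the minimizer is unique):
<math> \nabla_w m_t(w) = \nabla J(w_t) + H_t (w - w_t) = 0 \quad \Longrightarrow \quad w = w_t - H_t^{-1} \nabla J(w_t) </math>
which is exactly the Newton update that follows.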

In quasi-Newton methods, we use a Newton update to update the weights; in other words, we use the (approximated) Hessian and the gradient to come up with a direction in which to update:
<math> w_{t+1} = w_t - H_t^{-1} \nabla J(w_t) </math>
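
As a sketch of how the limited-memory variant realizes this update without ever storing <math>H_t</math> or its inverse, here is the standard L-BFGS "two-loop recursion" in Python/NumPy. It computes the product <math>H_t^{-1} \nabla J(w_t)</math> implicitly from the last <math>m</math> curvature pairs <math>s_k = w_{k+1} - w_k</math> and <math>y_k = \nabla J(w_{k+1}) - \nabla J(w_k)</math>; all names here are illustrative, not from a particular library:

<pre>
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: implicitly computes H_t^{-1} * grad
    from the m most recent curvature pairs (s_k, y_k),
    ordered oldest to newest in s_list and y_list."""
    q = grad.copy()
    rhos = [1.0 / y.dot(s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: walk the pairs from newest to oldest.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * s.dot(q)
        q -= alpha * y
        alphas.append(alpha)
    # Scale by a diagonal guess of the initial Hessian (a common heuristic).
    if s_list:
        gamma = s_list[-1].dot(y_list[-1]) / y_list[-1].dot(y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: walk the pairs from oldest to newest.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * y.dot(r)
        r += (alpha - beta) * s
    return r  # search direction: w_{t+1} = w_t - step_size * r
</pre>

In practice, the step size multiplying this direction is chosen by a line search (e.g., one satisfying the Wolfe conditions) rather than taken to be exactly 1.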

=== Related Work ===