Gibbs sampling
Latest revision as of 00:32, 3 November 2011
This method is used to sample from the joint distribution of two or more random variables when exact computation of the integral or of a marginal is intractable. Usually some of these random variables are the actual observables, so their values need not be resampled during the iterations. This form of approximate inference is generally used for posterior probability inference in probabilistic graphical models where computing the marginals is intractable.
Motivation
Gibbs sampling was introduced in the context of image processing by Geman and Geman[1]. The Gibbs sampler is a technique for generating random variables from a (marginal) distribution indirectly, without having to calculate the density[2]. Thus, if we are given the conditional densities <math>f(x_k \mid x_j,\; j \neq k)</math> for <math>k = 1, \ldots, K</math>, we can use Gibbs sampling to calculate the marginal distributions <math>f(x_k)</math> or any other function of <math>X_1, \ldots, X_K</math>.
Algorithm
1. Take some initial values <math>X_1^{(0)}, \ldots, X_K^{(0)}</math>.
2. Repeat for <math>t = 1, 2, \ldots</math>:
For <math>k = 1, \ldots, K</math>, sample <math>X_k^{(t)}</math> from <math>f(x_k \mid x_1^{(t)}, \ldots, x_{k-1}^{(t)}, x_{k+1}^{(t-1)}, \ldots, x_K^{(t-1)})</math>.
3. Continue step 2 until the joint distribution of <math>(X_1^{(t)}, \ldots, X_K^{(t)})</math> doesn't change.
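The steps above can be sketched in Python. The target here is a hypothetical choice, not one from the article: a standard bivariate normal with correlation rho, picked because its full conditionals are known in closed form (<math>X \mid Y = y \sim N(\rho y,\, 1 - \rho^2)</math>, and symmetrically for <math>Y</math>).

```python
import math
import random

def gibbs_bivariate_normal(rho, num_iters, seed=0):
    # Gibbs sampler for a standard bivariate normal with correlation rho.
    # Full conditionals: X | Y=y ~ N(rho*y, 1-rho^2), Y | X=x ~ N(rho*x, 1-rho^2).
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)   # conditional standard deviation
    x, y = 0.0, 0.0                   # step 1: initial values
    samples = []
    for _ in range(num_iters):        # step 2: sweep over the coordinates
        x = rng.gauss(rho * y, sd)    # draw X from f(x | y)
        y = rng.gauss(rho * x, sd)    # draw Y from f(y | x)
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.6, num_iters=20000)
xs = [x for x, _ in samples[1000:]]   # drop an initial stretch of draws (burn-in)
mean_x = sum(xs) / len(xs)
var_x = sum((v - mean_x) ** 2 for v in xs) / len(xs)
# mean_x should be near 0 and var_x near 1, the true marginal moments of X.
```

The retained draws approximate the marginal <math>f(X)</math>, even though the density of <math>X</math> was never computed directly.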
Under regularity conditions, it can be shown that this procedure eventually stabilizes, and the resulting random variables are indeed a sample from the joint distribution of <math>X_1, \ldots, X_K</math>.
Convergence
A simple proof for the bivariate case
Consider a bivariate system of Bernoulli random variables <math>X</math> and <math>Y</math>. Define two matrices <math>A_{x|y}</math> and <math>A_{y|x}</math> such that <math>A_{x|y}(i, j) = f(x=i \mid y=j)</math> and <math>A_{y|x}(i, j) = f(y=i \mid x=j)</math>. Then the transition probability from <math>X</math> to <math>X</math> (one full Gibbs sweep) is <math>A_{x|x} = A_{x|y} A_{y|x}</math>. If the initial distribution is <math>f_0 = [f_0(1), f_0(2)]^T</math>, then at the <math>k^{th}</math> iteration <math>f_k = A_{x|x}^k f_0</math>. It is well known that as <math>k \rightarrow \infty</math>, <math>f_k</math> approaches a stationary distribution, and this stationary distribution is the marginal <math>f(X)</math>.
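This matrix iteration can be checked numerically. The joint distribution below is an illustrative assumption (any strictly positive joint over <math>\{0,1\}^2</math> would do), not one taken from the text:

```python
# An assumed joint distribution f(x, y) over two Bernoulli variables.
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

def marginal_x(v):
    return joint[(v, 0)] + joint[(v, 1)]

def marginal_y(v):
    return joint[(0, v)] + joint[(1, v)]

# A_xy[i][j] = f(x=i | y=j), A_yx[i][j] = f(y=i | x=j)
A_xy = [[joint[(i, j)] / marginal_y(j) for j in (0, 1)] for i in (0, 1)]
A_yx = [[joint[(j, i)] / marginal_x(j) for j in (0, 1)] for i in (0, 1)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]

# One full Gibbs sweep, viewed as a Markov transition kernel on X alone.
A_xx = matmul(A_xy, A_yx)

f = [1.0, 0.0]            # an arbitrary (even degenerate) starting distribution f_0
for _ in range(50):       # f_k = A_xx^k f_0, column-vector convention
    f = matvec(A_xx, f)

# The iterates converge to the true marginal [f(x=0), f(x=1)] = [0.5, 0.5].
```

Starting from a degenerate <math>f_0</math>, the iterates still converge to the marginal of <math>X</math>, illustrating the claim of the proof.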
Burn-in
The Gibbs sampler requires a certain number of iterations before it approaches the stationary state and generates samples from the marginal distribution. To allow for this, the first few samples (typically on the order of 500-1000) are discarded. This is known as burn-in.
Detecting Convergence
To detect whether the sampler has approached the stationary state (i.e., to choose the burn-in length), various strategies have been proposed. Gelfand and Smith (1990) suggest monitoring density estimates from <math>m</math> independently generated chains and choosing the first point in time at which these densities appear to be the same. Another strategy is to monitor a sequence of weights that measure how much the sampled and desired distributions differ. However, none of these strategies applies to all kinds of distributions.
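A crude sketch of the first strategy follows. It is an assumption-laden stand-in, not the Gelfand and Smith procedure itself: the target is again a bivariate normal, the "density estimate" is compressed to the single summary <math>P(X \le 0)</math> per chain, and a window is declared converged when all chains' estimates agree within a fixed tolerance.

```python
import math
import random

def run_chain(rho, n, seed, start):
    # One Gibbs chain for a standard bivariate normal with correlation rho,
    # started from a deliberately overdispersed point (start, start).
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x, y = start, start
    xs = []
    for _ in range(n):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        xs.append(x)
    return xs

n, rho = 5000, 0.6
starts = [-10.0, -3.0, 3.0, 10.0]          # m = 4 overdispersed starting points
chains = [run_chain(rho, n, seed=s, start=st) for s, st in enumerate(starts)]

def frac_below_zero(xs):
    # Per-chain estimate of P(X <= 0), a one-number summary of the density.
    return sum(1 for v in xs if v <= 0) / len(xs)

window = 500
burn_in = None
for begin in range(0, n - window + 1, window):
    ests = [frac_below_zero(c[begin:begin + window]) for c in chains]
    if max(ests) - min(ests) < 0.1:        # the chains' estimates agree
        burn_in = begin + window
        break
```

In practice one would compare full density estimates (and the tolerance and window length are tuning choices), but the shape of the diagnostic, running several chains until their estimates coincide, is the same.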
Similarity with other methods
Metropolis-Hastings
Gibbs sampling is a special case of the Metropolis-Hastings algorithm in which the proposal distribution <math>Q</math> is the full conditional distribution, so every proposed sample is accepted.
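A short derivation makes the always-accept property concrete. Suppose the chain is at <math>x</math> and coordinate <math>k</math> is updated with proposal <math>Q(x' \mid x) = f(x_k' \mid x_{-k})</math>, leaving <math>x_{-k}</math> unchanged. Writing <math>f(x) = f(x_k \mid x_{-k})\, f(x_{-k})</math>, the Metropolis-Hastings acceptance probability is

<math>\alpha = \min\left(1, \frac{f(x')\, Q(x \mid x')}{f(x)\, Q(x' \mid x)}\right) = \min\left(1, \frac{f(x_k' \mid x_{-k})\, f(x_{-k})\, f(x_k \mid x_{-k})}{f(x_k \mid x_{-k})\, f(x_{-k})\, f(x_k' \mid x_{-k})}\right) = 1,</math>

since the numerator and denominator contain exactly the same factors.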
Expectation Maximization
Gibbs sampling can be viewed as a variant of the EM algorithm for exponential family models. The connection is to view the latent variables as the parameters of the Gibbs sampler and, instead of maximizing over them, to sample from the conditional distribution.
References
1. Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.
2. Casella, G. and George, E. I. (1992). Explaining the Gibbs sampler. The American Statistician. http://biostat.jhsph.edu/~mmccall/articles/casella_1992.pdf
3. Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning.