Belief Propagation

Latest revision as of 16:16, 13 October 2011
This method was proposed in Judea Pearl, 1982: Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach, AAAI 1982.
Belief Propagation (BP) is a message-passing inference method for statistical graphical models (e.g. Bayesian networks and Markov random fields). The basic idea is to compute the marginal distributions of the unobserved nodes, conditioned on the observed nodes. There are two major cases:

* When the graphical model is a tree (more generally, a cycle-free factor graph), BP yields exact marginals. In this setting it is equivalent to dynamic programming, and its max-product variant to the Viterbi algorithm.
* Otherwise, loopy Belief Propagation becomes an approximate inference algorithm.
== Motivation: Marginals vs. Joint Maximizer ==

To compute marginals, we need to find:
:<math> P(X_i = x) = \sum_{x_1,\dots,x_N :\, x_i = x} P(x_1,\dots,x_N) </math>
whereas to compute the joint maximizer, we need:
:<math> (x_1^*,\dots,x_N^*) = \arg\max_{x_1,\dots,x_N} P(x_1,\dots,x_N) </math>
Unfortunately, each random variable <math>X_i</math> may take <math>M</math> possible states, so naively enumerating all configurations costs <math>O(M^N)</math>, which is computationally intractable. As a result, we need better inference algorithms for these two problems.
== Problem Formulation ==

In a generalized Markov random field (MRF), the marginals of the model can be formalized as the following equation:
:<math> P(X_i=x) = \frac{1}{Z} \sum_{x_1,\dots,x_N} \Phi(x_i,x) \prod_{(i,j) \in E} \chi_{i,j}(x_i,x_j) </math>
here <math>\Phi(x_i,x)</math> is a binary indicator function (equal to 1 if <math>x_i = x</math> and 0 otherwise), whereas the partition function is:
:<math> Z = \sum_{x_1,\dots,x_N} \prod_{(i,j) \in E} \chi_{i,j}(x_i,x_j) </math>
Therefore, the two tasks here are: (1) compute the marginals <math>P(X_i=x)</math>; (2) compute the partition function <math>Z</math>.
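To make the two tasks concrete, here is a small brute-force sketch (the three-variable chain and the potential values are made up for illustration): it computes <math>Z</math> and one marginal by summing over all <math>M^N</math> configurations, which is exactly the exponential cost that BP avoids.

```python
import itertools

# Brute-force computation of Z and a marginal on a tiny pairwise MRF:
# a 3-variable chain x0 - x1 - x2 with M = 2 states each.
# The potential tables chi are made-up numbers, indexed [x_i][x_j].
M, N = 2, 3
edges = [(0, 1), (1, 2)]
chi = {(0, 1): [[2.0, 1.0], [1.0, 3.0]],
       (1, 2): [[1.0, 2.0], [2.0, 1.0]]}

def unnorm(assign):
    """Unnormalized probability: product of pairwise potentials."""
    p = 1.0
    for (i, j) in edges:
        p *= chi[(i, j)][assign[i]][assign[j]]
    return p

# Enumerate all M**N configurations -- the O(M^N) cost BP avoids.
configs = list(itertools.product(range(M), repeat=N))
Z = sum(unnorm(a) for a in configs)
p_x0 = [sum(unnorm(a) for a in configs if a[0] == v) / Z for v in range(M)]
```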
== The Belief Propagation Algorithm ==

In this example, we first show a simple case where the graph is a tree, then we give the general form of the BP messages. Assume we have the following tree-structured MRF ([[File:Binarytree.gif]]): <math>A</math> and <math>D</math> are children of <math>G</math>, <math>M</math> is a child of <math>P</math>, and we pick <math>Z</math>, the parent of <math>G</math> and <math>P</math>, as the root node (technically, any node can serve as the root).
(1) In the first step, we compute the partition function by sending messages from the leaves to the root of the tree. This message-passing process can be viewed as bottom-up dynamic programming. For example, the message from leaf node <math>A</math> to node <math>G</math>, as a function of <math>x_G</math>, is
:<math> m_{A,G}(x_G) = \sum_{x_A} \Phi_{A,G}(x_A,x_G) </math>
where <math>\Phi_{A,G}</math> denotes the pairwise potential on edge <math>(A,G)</math> (written <math>\chi_{A,G}</math> above).
Following the same method, we can also calculate <math>m_{D,G}(x_G)</math> and <math>m_{M,P}(x_P)</math>. Next, we can calculate
:<math> m_{G,Z}(x_Z) = \sum_{x_G} \Phi_{G,Z}(x_G,x_Z)\, m_{A,G}(x_G)\, m_{D,G}(x_G) </math>
:<math> m_{P,Z}(x_Z) = \sum_{x_P} \Phi_{P,Z}(x_P,x_Z)\, m_{M,P}(x_P) </math>
and the partition function becomes
:<math> Z = \sum_{x_Z} m_{G,Z}(x_Z)\, m_{P,Z}(x_Z) </math>
If we define <math>Neighbor(i)</math> to be the set of neighbors of node <math>i</math>, the general form of the message from <math>i</math> to <math>j</math> can be defined as
:<math> m_{i,j}(x_j) = \sum_{x_i} \Phi_{i,j}(x_i,x_j) \prod_{n \in Neighbor(i),\, n \ne j} m_{n,i}(x_i) </math>
When <math>Neighbor(i) = \{j\}</math> (i.e., <math>i</math> is a leaf), the product is empty and
:<math> m_{i,j}(x_j) = \sum_{x_i} \Phi_{i,j}(x_i,x_j) </math>
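The upward pass can be sketched as follows, a minimal Python illustration on the example tree (root <math>Z</math>, children <math>G</math> and <math>P</math>, leaves <math>A</math>, <math>D</math>, <math>M</math>); the potential tables are made-up numbers, and <code>upward_message</code> implements the general message rule recursively.

```python
# Upward (leaves-to-root) pass on the example tree; potentials are illustrative.
M_states = 2
children = {"Z": ["G", "P"], "G": ["A", "D"], "P": ["M"],
            "A": [], "D": [], "M": []}
phi = {  # phi[(child, parent)] indexed as [x_child][x_parent]
    ("G", "Z"): [[1.0, 2.0], [2.0, 1.0]],
    ("P", "Z"): [[3.0, 1.0], [1.0, 3.0]],
    ("A", "G"): [[1.0, 2.0], [2.0, 1.0]],
    ("D", "G"): [[2.0, 1.0], [1.0, 2.0]],
    ("M", "P"): [[1.0, 1.0], [1.0, 2.0]],
}

def upward_message(i, parent):
    """m_{i,parent}(x_parent): sum over x_i of the edge potential times
    the product of messages arriving from i's children (empty for a leaf)."""
    msgs = [upward_message(c, i) for c in children[i]]
    out = []
    for xp in range(M_states):
        total = 0.0
        for xi in range(M_states):
            term = phi[(i, parent)][xi][xp]
            for m in msgs:
                term *= m[xi]
            total += term
        out.append(total)
    return out

m_GZ = upward_message("G", "Z")
m_PZ = upward_message("P", "Z")
# Z = sum over the root's states of the product of incoming messages.
Z_partition = sum(m_GZ[xz] * m_PZ[xz] for xz in range(M_states))
```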
(2) In the second step, we compute marginals by sending top-down messages throughout the tree MRF, using the same message definition as above: the sender multiplies in the messages from all of its neighbors except the recipient. For example, the downward message from <math>Z</math> to <math>G</math> is
:<math> m_{Z,G}(x_G) = \sum_{x_Z} \Phi_{Z,G}(x_Z,x_G)\, m_{P,Z}(x_Z) </math>
We then do the same to compute all the downward messages. After we obtain all the bottom-up and top-down messages, we can easily compute the marginals. For example, if we want to compute <math>P(x_G = \epsilon)</math>, we can use
:<math> P(x_G = \epsilon) = \frac{1}{Z}\, m_{A,G}(\epsilon)\, m_{D,G}(\epsilon)\, m_{Z,G}(\epsilon) </math>
The general form of the above calculation is the normalized product of all incoming messages:
:<math> P(X_i = x) = \frac{1}{Z} \prod_{j \in Neighbor(i)} m_{j,i}(x) </math>
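Putting both directions together, here is a minimal sketch of BP marginals on the same example tree. The potential tables are made-up numbers; for clarity the code recomputes messages recursively rather than caching them in a two-pass schedule, which a real implementation would do.

```python
# Generic message passing on a tree-structured pairwise MRF (illustrative numbers).
M = 2
neighbors = {"Z": ["G", "P"], "G": ["Z", "A", "D"], "P": ["Z", "M"],
             "A": ["G"], "D": ["G"], "M": ["P"]}

phi = {}
def set_phi(i, j, table):
    """Store a pairwise potential and its transpose for the reverse direction."""
    phi[(i, j)] = table
    phi[(j, i)] = [list(col) for col in zip(*table)]

set_phi("A", "G", [[1.0, 2.0], [2.0, 1.0]])
set_phi("D", "G", [[2.0, 1.0], [1.0, 2.0]])
set_phi("M", "P", [[1.0, 1.0], [1.0, 2.0]])
set_phi("G", "Z", [[1.0, 2.0], [2.0, 1.0]])
set_phi("P", "Z", [[3.0, 1.0], [1.0, 3.0]])

def _prod(values):
    p = 1.0
    for v in values:
        p *= v
    return p

def message(i, j):
    """m_{i,j}(x_j): sum over x_i of phi times messages into i, excluding j."""
    incoming = [message(n, i) for n in neighbors[i] if n != j]
    return [sum(phi[(i, j)][xi][xj] * _prod(m[xi] for m in incoming)
                for xi in range(M)) for xj in range(M)]

def marginal(i):
    """P(X_i = x): normalized product of all incoming messages."""
    belief = [_prod(message(n, i)[x] for n in neighbors[i]) for x in range(M)]
    Z = sum(belief)
    return [b / Z for b in belief]

p_G = marginal("G")
```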
== Some Reflections ==
(1) If we use a Bayesian network to represent a hidden Markov model (HMM), then the BP algorithm becomes exactly the forward-backward algorithm.
(2) In terms of computational complexity, on a pairwise MRF the cost drops from <math>O(M^N)</math> for brute-force enumeration to <math>O(M^2)</math> per message, i.e. <math>O(NM^2)</math> overall on a tree.
(3) BP is often described as the '''Sum-Product''' algorithm for estimating marginals. There is a variant called '''Max-Product''' for estimating the maximum-probability configuration. The general definition of the message updates stays the same, except that <math>\sum_{x_i}</math> is replaced by <math>\max_{x_i}</math>.
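The Max-Product variant can be sketched on a hypothetical three-node chain <math>A - B - C</math> with made-up potentials: the recursion is identical to sum-product except that the sum over <math>x_i</math> becomes a max, and the unnormalized max-belief at any node equals the maximum unnormalized joint probability.

```python
# Max-product messages on a tiny chain A - B - C (illustrative potentials).
M = 2
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
phi = {("A", "B"): [[2.0, 1.0], [1.0, 3.0]],
       ("B", "C"): [[1.0, 2.0], [2.0, 1.0]]}
phi[("B", "A")] = [list(c) for c in zip(*phi[("A", "B")])]
phi[("C", "B")] = [list(c) for c in zip(*phi[("B", "C")])]

def max_message(i, j):
    """Same recursion as sum-product, with sum_{x_i} replaced by max_{x_i}."""
    incoming = [max_message(n, i) for n in neighbors[i] if n != j]
    out = []
    for xj in range(M):
        best = 0.0
        for xi in range(M):
            term = phi[(i, j)][xi][xj]
            for m in incoming:
                term *= m[xi]
            best = max(best, term)
        out.append(best)
    return out

# Unnormalized max-belief at B: its maximum over x_B is the max joint score.
belief_B = [max_message("A", "B")[x] * max_message("C", "B")[x] for x in range(M)]
best_score = max(belief_B)
```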
== Related Papers ==
* [[RelatedPaper::Smith and Eisner 2008: Dependency Parsing by Belief Propagation]] (A recent paper that uses the BP algorithm for dependency parsing.)
* [[RelatedPaper::T. Meltzer et al., 2005: Globally Optimal Solutions for Energy Minimization in Stereo Vision using Reweighted Belief Propagation, ICCV 2005]] (A modified BP algorithm for computer vision applications.)
* [[RelatedPaper::R. Szeliski et al., 2008: A Comparative Study of Energy Minimization Methods for Markov Random Fields with Smoothness-Based Priors, IEEE TPAMI 2008]] (Compares BP with other energy minimization methods.)
* [[RelatedPaper::Smith and Eisner 2005: Contrastive Estimation: Training Log-Linear Models on Unlabeled Data]] (A very different approach to parameter estimation in log-linear models.)
== Comments ==
For lots of really interesting theory about loopy belief propagation, check out Wainwright and Jordan 2008, http://www.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_FTML.pdf . --Brendan 21:12, 13 October 2011 (UTC)