Difference between revisions of "Belief Propagation"

Revision as of 11:16, 27 September 2011

This method was proposed by Judea Pearl: "Reverend Bayes on inference engines: A distributed hierarchical approach", AAAI 1982.

Belief Propagation (BP) is a message-passing inference method for statistical graphical models (e.g. Bayesian networks and Markov random fields). The basic idea is to compute the marginal distribution of each unobserved node, conditioned on the observed nodes. There are two major cases:

When the graphical model is a tree (no loops), for example a tree-structured factor graph, the exact marginals can be obtained. In this case BP is a form of dynamic programming; its max-product variant on a chain is the Viterbi algorithm.
Otherwise, loopy Belief Propagation is an approximate inference algorithm.

Motivation: Marginals vs. Joint Maximizer

To compute marginals, we need to find:

P(x_{1}),P(x_{2}),P(x_{3}),...,P(x_{N})

whereas to compute the joint maximizer (the single most probable joint assignment), we need:

{\underset {x_{1},x_{2},x_{3},...,x_{N}}{\operatorname {argmax} }}\ P(x_{1},x_{2},x_{3},...,x_{N}).

Unfortunately, each random variable $X_{i}$ may have M possible states, so an exhaustive search over all joint states has complexity $O(M^{N})$ , which is intractable for all but small N. As a result, we need better inference algorithms to solve the above problems.
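The two quantities above can disagree, which a tiny brute-force example makes visible. The sketch below is only an illustration: the joint table `JOINT` and the helper names are made up for this example and do not come from the source.

```python
from itertools import product

# Hypothetical joint distribution P(x1, x2) over two binary variables.
JOINT = {
    (0, 0): 0.35,
    (0, 1): 0.05,
    (1, 0): 0.30,
    (1, 1): 0.30,
}

def marginal(joint, i, num_states):
    """P(x_i): sum the joint over every variable except the i-th."""
    totals = [0.0] * num_states
    for assignment, probability in joint.items():
        totals[assignment[i]] += probability
    return totals

def joint_maximizer(joint):
    """The single most probable joint assignment."""
    return max(joint, key=joint.get)

def count_joint_states(num_states, num_variables):
    """Number of assignments an exhaustive search visits: M**N."""
    return sum(1 for _ in product(range(num_states), repeat=num_variables))

p_x1 = marginal(JOINT, 0, 2)
p_x2 = marginal(JOINT, 1, 2)
# Picking the best state of each variable separately from its marginal.
per_variable_best = (p_x1.index(max(p_x1)), p_x2.index(max(p_x2)))

print(joint_maximizer(JOINT))     # (0, 0)
print(per_variable_best)          # (1, 0)
print(count_joint_states(3, 10))  # 59049
```

Here the most probable joint assignment is (0, 0), while maximizing each marginal separately gives (1, 0), so the two problems need to be treated as distinct tasks.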

Problem Formulation

In a general pairwise Markov random field (MRF) with edge set $E$ , the marginal of a node $X_{i}$ can be written as:

P(X_{i}=x)={\frac {1}{Z}}\sum _{x_{1},\ldots ,x_{N}}\Phi (x_{i},x)\prod _{(j,k)\in E}\chi _{j,k}(x_{j},x_{k})

where $\chi _{j,k}$ is the potential on edge $(j,k)$ and, for the sum to yield the marginal, $\Phi (x_{i},x)$ has to act as an indicator that equals 1 when $x_{i}=x$ and 0 otherwise. The partition function is:

Z=\sum _{x_{1},\ldots ,x_{N}}\prod _{(j,k)\in E}\chi _{j,k}(x_{j},x_{k})

Therefore, the two tasks here are: (1) compute the marginals $P(X_{i}=x)$ ; (2) compute the partition function $Z$ .
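For a small model, both tasks can be solved by direct enumeration, which gives a reference answer to compare any faster method against. The sketch below is a hedged illustration: the six-node tree (its edges are inferred from the messages used in the next section), the binary state space, and every number in `POTENTIALS` are assumptions made for this example.

```python
from itertools import product

# Hypothetical tree-structured MRF with binary variables; the tables are made up.
NODES = ["A", "D", "G", "M", "P", "Z"]
NUM_STATES = 2
# POTENTIALS[(i, j)][x_i][x_j] plays the role of the edge potential on (i, j).
POTENTIALS = {
    ("A", "G"): [[2.0, 1.0], [1.0, 3.0]],
    ("D", "G"): [[1.0, 2.0], [2.0, 1.0]],
    ("G", "Z"): [[3.0, 1.0], [1.0, 2.0]],
    ("M", "P"): [[1.0, 4.0], [2.0, 1.0]],
    ("P", "Z"): [[2.0, 1.0], [1.0, 1.0]],
}

def unnormalized_weight(assignment):
    """Product of all edge potentials for one full assignment (dict node -> state)."""
    weight = 1.0
    for (i, j), table in POTENTIALS.items():
        weight *= table[assignment[i]][assignment[j]]
    return weight

def all_assignments():
    """Yield each of the M**N joint assignments."""
    for states in product(range(NUM_STATES), repeat=len(NODES)):
        yield dict(zip(NODES, states))

def partition_function():
    """Task (2): sum the unnormalized weight over every assignment."""
    return sum(unnormalized_weight(a) for a in all_assignments())

def marginal(node):
    """Task (1): accumulate weight per state of `node`, then divide by Z."""
    z = partition_function()
    totals = [0.0] * NUM_STATES
    for a in all_assignments():
        totals[a[node]] += unnormalized_weight(a)
    return [t / z for t in totals]

print(partition_function())  # 693.0 for these made-up tables
print(marginal("G"))
```

The enumeration visits 2**6 = 64 assignments here; the cost grows as M**N, which is what message passing avoids on a tree.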

The Belief Propagation Algorithm

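In this example, we first show a simple case where the graph is a tree, and then give the general form of the BP messages. Assume we have a tree-structured MRF and pick $\mathbf {Z}$ as the root node (any node can serve as the root; the node $\mathbf {Z}$ is distinct from the partition function $Z$ ). As the messages below indicate, G and P are the children of the root, A and D are leaves attached to G, and M is a leaf attached to P. In this section the pairwise potentials are written $\Phi _{i,j}$ (they are the $\chi _{i,j}$ of the problem formulation).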

(1) In the first step, we compute the partition function by sending messages from the leaves up to the root of the tree. This pass can also be viewed as bottom-up dynamic programming. For example, the message $m_{A,G}(G)$ from node A to node G is a function of the state $x_{G}$ of G; for each value of $x_{G}$ we calculate

m_{A,G}(G)=\sum _{x_{A}}\Phi _{A,G}(x_{A},x_{G})

Following the same method, we can also calculate $m_{D,G}(G)$ and $m_{M,P}(P)$ . Next, we can calculate

m_{G,Z}(Z)=\sum _{x_{G}}\Phi _{G,Z}(x_{G},x_{Z})m_{A,G}(G)m_{D,G}(G)

m_{P,Z}(Z)=\sum _{x_{P}}\Phi _{P,Z}(x_{P},x_{Z})m_{M,P}(P)

and the partition function will become

Z=\sum _{x_{Z}}m_{G,Z}(Z)m_{P,Z}(Z)
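The upward pass above can be traced step by step in code. This is a minimal sketch under stated assumptions: binary variables, and made-up numeric tables in `PHI`; the helper `upward_message` is a name introduced here for illustration.

```python
# Hypothetical binary potentials; PHI[(i, j)][x_i][x_j] stands for Phi_{i,j}(x_i, x_j).
PHI = {
    ("A", "G"): [[2.0, 1.0], [1.0, 3.0]],
    ("D", "G"): [[1.0, 2.0], [2.0, 1.0]],
    ("G", "Z"): [[3.0, 1.0], [1.0, 2.0]],
    ("M", "P"): [[1.0, 4.0], [2.0, 1.0]],
    ("P", "Z"): [[2.0, 1.0], [1.0, 1.0]],
}
STATES = range(2)

def upward_message(i, j, incoming=()):
    """m_{i,j} as a list indexed by x_j.

    `incoming` holds the messages already sent to i by its children;
    for a leaf it is empty and the message is just the sum over x_i.
    """
    message = []
    for xj in STATES:
        total = 0.0
        for xi in STATES:
            term = PHI[(i, j)][xi][xj]
            for child_message in incoming:
                term *= child_message[xi]
            total += term
        message.append(total)
    return message

# Leaves first, then inner nodes, exactly in the order used in the text.
m_AG = upward_message("A", "G")
m_DG = upward_message("D", "G")
m_MP = upward_message("M", "P")
m_GZ = upward_message("G", "Z", [m_AG, m_DG])
m_PZ = upward_message("P", "Z", [m_MP])

# Partition function: combine the two messages arriving at the root.
partition = sum(m_GZ[xz] * m_PZ[xz] for xz in STATES)
print(partition)  # 693.0 for these made-up tables
```

Each message costs on the order of M*M operations per edge, so the whole pass is linear in the number of edges rather than exponential in the number of nodes.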

If we define $Neighbor(i)$ to be the set of neighbors of $i$ , the general form of the message can be defined as

m_{i,j}(x_{j})=\sum _{x_{i}}\Phi _{i,j}(x_{i},x_{j})\prod _{n\in Neighbor(i),n\neq j}m_{n,i}(x_{i})

When $Neighbor(i)=\{j\}$ , i.e. node $i$ is a leaf, the product is empty and the message reduces to

m_{i,j}(x_{j})=\sum _{x_{i}}\Phi _{i,j}(x_{i},x_{j})
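The general message definition translates naturally into a recursive function. The following is a sketch rather than a reference implementation: the tables in `PHI` are made up, variables are assumed binary, and the recursion only terminates because the graph is a tree.

```python
from functools import lru_cache

# Hypothetical binary potentials; PHI[(i, j)][x_i][x_j] stands for Phi_{i,j}(x_i, x_j).
PHI = {
    ("A", "G"): [[2.0, 1.0], [1.0, 3.0]],
    ("D", "G"): [[1.0, 2.0], [2.0, 1.0]],
    ("G", "Z"): [[3.0, 1.0], [1.0, 2.0]],
    ("M", "P"): [[1.0, 4.0], [2.0, 1.0]],
    ("P", "Z"): [[2.0, 1.0], [1.0, 1.0]],
}
NUM_STATES = 2

# Neighbor(i), built from the undirected edge list.
NEIGHBORS = {}
for i, j in PHI:
    NEIGHBORS.setdefault(i, set()).add(j)
    NEIGHBORS.setdefault(j, set()).add(i)

def phi(i, j, xi, xj):
    """Look up Phi_{i,j}(x_i, x_j) whichever way round the edge was stored."""
    if (i, j) in PHI:
        return PHI[(i, j)][xi][xj]
    return PHI[(j, i)][xj][xi]

@lru_cache(maxsize=None)
def message(i, j):
    """m_{i,j} as a tuple indexed by x_j, following the general form above."""
    result = []
    for xj in range(NUM_STATES):
        total = 0.0
        for xi in range(NUM_STATES):
            term = phi(i, j, xi, xj)
            # Product over all neighbors of i except the recipient j.
            for n in NEIGHBORS[i]:
                if n != j:
                    term *= message(n, i)[xi]
            total += term
        result.append(total)
    return tuple(result)

print(message("A", "G"))  # leaf case: no incoming messages
print(message("G", "Z"))  # combines the messages from A and D
```

Caching with `lru_cache` means each directed message is computed once, so the same function serves both message directions without repeated work.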

(2) In the second step, we compute marginals by sending top-down messages throughout the tree MRF, using the same definition as above, so the product again excludes the recipient $j$ :

m_{i,j}(x_{j})=\sum _{x_{i}}\Phi _{i,j}(x_{i},x_{j})\prod _{n\in Neighbor(i),n\neq j}m_{n,i}(x_{i})

For example, when calculating the downward message from Z to G, we can compute:

m_{Z,G}(G)=\sum _{x_{Z}}\Phi _{Z,G}(x_{Z},x_{G})m_{P,Z}(Z)

We then do the same to compute all the downward messages. Once all the bottom-up and top-down messages are available, the marginals are easy to compute. For example, to compute $P(x_{G}=\epsilon )$ , we multiply all messages arriving at G, evaluated at $\epsilon $ , and normalize:

P(x_{G}=\epsilon )={\frac {1}{Z}}m_{A,G}(\epsilon )m_{D,G}(\epsilon )m_{Z,G}(\epsilon )

The general form of the above calculation can be expressed as

P(x_{i}=\epsilon )={\frac {1}{Z}}\prod _{n\in Neighbor(i)}m_{n,i}(\epsilon )
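Putting the two passes together, the marginals of every node can be read off from the stored messages. The sketch below assumes binary variables, made-up tables in `PHI`, and a hand-written schedule (`PARENT`, `UP_ORDER`) for the tree rooted at Z; these names are illustrative and not part of the source.

```python
# Hypothetical binary potentials; PHI[(i, j)][x_i][x_j] stands for Phi_{i,j}(x_i, x_j).
PHI = {
    ("A", "G"): [[2.0, 1.0], [1.0, 3.0]],
    ("D", "G"): [[1.0, 2.0], [2.0, 1.0]],
    ("G", "Z"): [[3.0, 1.0], [1.0, 2.0]],
    ("M", "P"): [[1.0, 4.0], [2.0, 1.0]],
    ("P", "Z"): [[2.0, 1.0], [1.0, 1.0]],
}
NUM_STATES = 2
PARENT = {"A": "G", "D": "G", "G": "Z", "M": "P", "P": "Z"}  # tree rooted at Z
UP_ORDER = ["A", "D", "M", "G", "P"]  # children always before their parents

NEIGHBORS = {}
for i, j in PHI:
    NEIGHBORS.setdefault(i, set()).add(j)
    NEIGHBORS.setdefault(j, set()).add(i)

def phi(i, j, xi, xj):
    """Look up Phi_{i,j}(x_i, x_j) whichever way round the edge was stored."""
    if (i, j) in PHI:
        return PHI[(i, j)][xi][xj]
    return PHI[(j, i)][xj][xi]

def send(i, j, messages):
    """m_{i,j} from the messages already stored for the other neighbors of i."""
    out = []
    for xj in range(NUM_STATES):
        total = 0.0
        for xi in range(NUM_STATES):
            term = phi(i, j, xi, xj)
            for n in NEIGHBORS[i]:
                if n != j:
                    term *= messages[(n, i)][xi]
            total += term
        out.append(total)
    return out

messages = {}
for child in UP_ORDER:            # step (1): leaves to root
    messages[(child, PARENT[child])] = send(child, PARENT[child], messages)
for child in reversed(UP_ORDER):  # step (2): root to leaves
    messages[(PARENT[child], child)] = send(PARENT[child], child, messages)

def marginal(i):
    """Return (P(x_i), Z): product of all incoming messages, then normalize."""
    unnormalized = []
    for x in range(NUM_STATES):
        value = 1.0
        for n in NEIGHBORS[i]:
            value *= messages[(n, i)][x]
        unnormalized.append(value)
    z = sum(unnormalized)
    return [u / z for u in unnormalized], z

for node in sorted(NEIGHBORS):
    print(node, marginal(node))
```

On a tree the normalizer obtained at each node is the same partition function, which is a convenient sanity check on the message schedule.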

