An introduction to the MLEM (maximum likelihood expectation maximization)
algorithm for medical image reconstruction in emission tomography (such as PET and SPECT). The MLEM algorithm for emission image
reconstruction was originally developed by
Shepp and Vardi in 1982.
This blog post explains the method under the expectation maximization framework
introduced in the previous post. - Posted on Feb 12, 2017
1. Problem Setup
In PET, patients are injected with compounds containing radioactive isotopes. These radioactive compounds form an unknown emitter density inside the patient's body. Detectors placed around the patient capture the signals (gamma rays) emitted by this density. The ultimate goal is to estimate the emitter density from the detector measurements. The problem can be stated as follows:
Unknown emitter density: $\lambda(x,y,z) \geq 0$
Radiation emissions occur according to a Poisson distribution.
The total number of coincidences counted in the $d$-th detector: $n^*(d),\ d \in \{1, \dots, D\}$
Goal: estimate $\lambda(x,y,z)$ from $n^*=\{n^*(1),\dots,n^*(D)\}$
2. MLEM Algorithm
2.1 Statistical Model
We discretize the density $\lambda(x,y,z)$ into voxels $b \in \{1, \dots, B\}$, so that voxel $b$ has density $\lambda(b)$. We further use $n(b)$ to denote the unknown emission count inside voxel $b$. Because of the Poisson property, we have:
\begin{equation}
P(n(b)=k;\lambda) = e^{-\lambda(b)}\frac{\lambda(b)^k}{k!}
\end{equation}
\begin{equation}
\lambda(b) = E[n(b)]
\end{equation}
We also suppose that the following conditional probability is known:
\begin{equation}
p(b,d) = P(\text{detected in detector } d \mid \text{emitted in voxel } b)
\end{equation}
So far we have set up the basic statistical model. A summary of the notation:
$n(b)$: unknown counts in voxel $b$
$n^*(d)$: measured counts in detector $d$
$n(b,d)$: unknown counts that are emitted by voxel $b$ and detected by detector $d$
$\lambda(b) = E[n(b)]$: unknown emitter density in voxel $b$
$\lambda^*(d) = E[n^*(d)]$: mean measured counts in detector $d$
$\lambda(b,d) = E[n(b,d)]$: mean unknown counts emitted by voxel $b$ and detected by detector $d$
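The model above can be simulated in a few lines of NumPy. The grid sizes, system matrix, and density below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

B, D = 4, 3  # assumed sizes: 4 voxels, 3 detectors

# System matrix p[b, d] = P(detected in d | emitted in b).
# Dirichlet rows sum to 1, i.e. every emission is detected somewhere.
p = rng.dirichlet(np.ones(D), size=B)

lam = np.array([5.0, 1.0, 3.0, 2.0])  # true emitter density lambda(b)

# n(b, d) ~ Poisson(lambda(b) * p(b, d)): independent thinned Poisson counts
n = rng.poisson(lam[:, None] * p)

# Observed detector counts n*(d) = sum_b n(b, d)
n_star = n.sum(axis=0)
print(n_star.shape)  # (3,)
```

Only `n_star` is observable in practice; the per-voxel counts `n` are the latent variables that EM will average over.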
During the experiment, we can only observe $n^*$. Many different configurations of $n(b,d)$ inside the volume of interest can produce the same $n^*$. Define one such configuration of $n(b,d)$ consistent with $n^*$ to be $a$:
\begin{equation}
a = \{n(b,d): \sum_{b=1}^B n(b,d) = n^*(d),\ d = 1, \dots, D\}
\end{equation}
As discussed above, there are many possible choices of $n(b,d)$ that can lead to $n^*$. Thus, we have many $a$'s, and we index them as $a_1, a_2, \dots$. Define $A$ as the set that contains all possible radiation emission situations:
\begin{equation}
A = \{a_1, a_2, \dots\}
\end{equation}
Note that $A$ might contain a very large number of $a$'s. Then we have the following log-likelihood:
\begin{eqnarray}
\log P(n^*;\lambda) = \log\left[\sum_{a \in A} \prod_{b,d}e^{-\lambda(b,d)}\frac{\lambda(b,d)^{n(b,d)}}{n(b,d)!}\right]
\end{eqnarray}
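As a sanity check, because a sum of independent Poisson variables is again Poisson, the same log-likelihood can also be written directly in terms of the detector means $\lambda^*(d) = \sum_{b=1}^B \lambda(b)p(b,d)$ (using $\lambda(b,d) = \lambda(b)p(b,d)$):

```latex
\begin{eqnarray}
&\log P(n^*;\lambda)\\
&= \sum_{d=1}^D \log\left(e^{-\lambda^*(d)}\frac{\lambda^*(d)^{n^*(d)}}{n^*(d)!}\right)\\
&= \sum_{d=1}^D \left[n^*(d)\log\lambda^*(d) - \lambda^*(d)\right] + \text{const}
\end{eqnarray}
```

This marginal form is what MLEM ultimately maximizes; the sum over $A$ simply marginalizes out the latent counts $n(b,d)$.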
Choosing an arbitrary probability distribution $q(a),\ a \in A$, over $A$, we have:
\begin{eqnarray}
&\log P(n^*;\lambda)\\
&= \log\left[\sum_{a \in A} q(a)\frac{ \prod_{b,d}e^{-\lambda(b,d)}\frac{\lambda(b,d)^{n(b,d)}}{n(b,d)!}}{q(a)}\right] \\
&=\log E_A\left[\frac{ \prod_{b,d}e^{-\lambda(b,d)}\frac{\lambda(b,d)^{n(b,d)}}{n(b,d)!}}{q(a)}\right]\\
&\geq E_A\left[\sum_{b,d} \log \left(e^{-\lambda(b,d)}\frac{\lambda(b,d)^{n(b,d)}}{n(b,d)!}\right)-\log q(a)\right]\\
&=\sum_{a \in A}\left[q(a)\sum_{b,d} \log \left(e^{-\lambda(b,d)}\frac{\lambda(b,d)^{n(b,d)}}{n(b,d)!}\right)-q(a)\log q(a)\right]
\end{eqnarray}
where the inequality follows from Jensen's inequality applied to the concave $\log$.
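As in the EM framework from the previous post, the inequality above becomes an equality when $q(a)$ is the posterior over the latent counts given the data and the current estimate $\lambda^{old}$:

```latex
\begin{equation}
q(a) = P(a \mid n^*; \lambda^{old})
\end{equation}
```

Under this choice, since independent Poisson counts conditioned on their sum are multinomial, each latent count has a closed-form conditional expectation (using $\lambda^{old}(b,d) = \lambda^{old}(b)p(b,d)$):

```latex
\begin{equation}
E[n(b,d) \mid n^*; \lambda^{old}] = \frac{n^*(d)\,\lambda^{old}(b)p(b,d)}{\sum_{b'=1}^B \lambda^{old}(b')p(b',d)}
\end{equation}
```

This ratio is exactly the correction factor that appears in the final updating rule.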
We want to find the $\lambda$ that maximizes $\log P(n^*;\lambda)$. As discussed in the previous post, this can be achieved by maximizing the lower bound above.
Since the entropy term $-q(a)\log q(a)$ does not depend on $\lambda$, and each emission from voxel $b$ is detected in detector $d$ with probability $p(b,d)$ so that $\lambda(b,d) = \lambda(b)p(b,d)$, the final objective function to maximize is $l(\lambda)$:
\begin{eqnarray}
&l(\lambda)\\
&=\sum_{a \in A}\left[q(a)\sum_{b,d} \log \left(e^{-\lambda(b,d)}\frac{\lambda(b,d)^{n(b,d)}}{n(b,d)!}\right)\right]\\
&=\sum_{a \in A}\left[q(a)\sum_{b,d} \log \left(e^{-\lambda(b)p(b,d)}\frac{\lambda(b)^{n(b,d)}p(b,d)^{n(b,d)}}{n(b,d)!}\right)\right]
\end{eqnarray}
To find the $\lambda$ that maximizes this objective, we use gradient ascent. In each iteration, the gradient of $l(\lambda)$ at the current $\lambda$ is computed, and the new $\lambda$ is obtained from the current $\lambda$ and the gradient, with a per-voxel step size of $\lambda^{old}(b_0)$:
\begin{equation}
\lambda^{new}(b_0) = \lambda^{old}(b_0) + \lambda^{old}(b_0) \frac{\partial l(\lambda)}{\partial \lambda^{old}(b_0)}
\end{equation}
Assuming every emission is detected, i.e. $\sum_{d=1}^D p(b_0,d) = 1$, the gradient of $l(\lambda)$ is:
\begin{eqnarray}
&\frac{\partial l(\lambda)}{\partial \lambda(b_0)}= -1 + \sum_{d=1}^D\frac{n^*(d)p(b_0, d)}{\sum_{b'=1}^B\lambda(b')p(b',d)}
\end{eqnarray}
And we have the final updating rule:
\begin{equation}
\label{updating-rule}
\lambda^{new}(b_0) = \lambda^{old}(b_0) \sum_{d=1}^D\frac{n^*(d)p(b_0, d)}{\sum_{b'=1}^B\lambda^{old}(b')p(b',d)}
\end{equation}
This updating rule is consistent with equation (2.13) of Shepp and Vardi (1982).
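The whole algorithm fits in a short NumPy function. The system matrix and phantom in the usage example are illustrative assumptions, not data from the paper. Note that the update is multiplicative, so a positive initialization stays positive, and (when the detection probabilities for each voxel sum to 1) the total reconstructed activity matches the total measured counts after the first iteration:

```python
import numpy as np

def mlem(n_star, p, n_iter=100):
    """MLEM: lambda(b) <- lambda(b) * sum_d n*(d) p(b,d) / (sum_b' lambda(b') p(b',d)).

    n_star : (D,) measured detector counts
    p      : (B, D) system matrix, p[b, d] = P(detected in d | emitted in b),
             assumed here to have rows summing to 1 (every emission is detected)
    """
    B, D = p.shape
    lam = np.ones(B)  # any positive initialization works
    for _ in range(n_iter):
        proj = lam @ p                    # forward projection: expected detector means
        ratio = np.where(proj > 0, n_star / proj, 0.0)
        lam = lam * (p @ ratio)           # multiplicative back-projected correction
    return lam

# Illustrative example (assumed data): simulate counts from a known density,
# then reconstruct it.
rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(8), size=6)     # 6 voxels, 8 detectors
lam_true = np.array([400.0, 50.0, 200.0, 600.0, 100.0, 300.0])
n_star = rng.poisson(lam_true @ p)        # measured detector counts
lam_hat = mlem(n_star, p, n_iter=500)
print(lam_hat.round(1))
```

With the per-voxel step size $\lambda^{old}(b_0)$, one gradient-ascent step reduces exactly to this multiplicative update, which is why no separate learning rate appears in the code.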