# Partitioned mixture distributions

## Stephen P Luttrell

Adaptive Systems Theory Section

Defence Research Agency

St Andrews Rd

Malvern

Worcestershire

WR14 3PS

United Kingdom

`luttrell@signal.dra.hmg.gb`

### Abstract

A mixture distribution is defined as **P(x) = \sum
P(x|c)P(c)**, where **x** is a state vector,
**c** is a class label, **P(c)** is a prior,
and **P(x|c)** is a class conditional PDF. There are many
ways of optimising the **P(c)** and the
**P(x|c)** to produce a **P(x)** with some
desired properties. A popular scheme is the maximum likelihood
prescription, in which **\langle \log P(x) \rangle** is maximised with
respect to the free parameters in **P(c)** and
**P(x|c)**. When the **P(x|c)** are Gaussian
PDFs this optimisation prescription is particularly easy to implement
either as a gradient ascent or as an expectation-maximisation (EM)
algorithm. However, the flexibility of the standard mixture
distribution approach is limited, and it does not lead to useful
results in spaces where dim **x** is large.
In this paper a layered network is introduced that allows dim
**x** to be
large, and which "tiles" **x**-space with many mixture
distributions, each of which deals with a low-dimensional
subspace. Thus **x** is partitioned into overlapping
regions **(x_1, x_2, \ldots, x_n)**, and a corresponding
set of mixture distributions **P_i(x_i)** is defined
whose parameter spaces are forced to overlap --- this type of network
is called a "partitioned mixture distribution" (PMD).
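
The maximum-likelihood EM optimisation mentioned above can be sketched as follows (a minimal illustrative sketch for 1-D Gaussian class-conditional PDFs; the function and variable names are our own, not from the paper):

```python
import numpy as np

def em_step(x, pi, mu, var):
    """One EM step for a 1-D Gaussian mixture P(x) = sum_c P(x|c) P(c).

    x: data of shape (N,); pi, mu, var: per-class parameters of shape (C,).
    """
    # E-step: posterior P(c|x) via Bayes' theorem.
    lik = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    joint = lik * pi                                   # P(x|c) P(c), shape (N, C)
    post = joint / joint.sum(axis=1, keepdims=True)    # P(c|x)

    # M-step: re-estimate priors, means and variances from the posteriors.
    nk = post.sum(axis=0)
    pi_new = nk / len(x)
    mu_new = (post * x[:, None]).sum(axis=0) / nk
    var_new = (post * (x[:, None] - mu_new) ** 2).sum(axis=0) / nk
    return pi_new, mu_new, var_new
```

Iterating this step increases the likelihood of the data under the mixture; the same E-step posterior is the quantity the PMD construction replaces below.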

For a standard mixture distribution posterior probabilities are
computed from the **P(x|c)** and the
**P(c)** by using Bayes' theorem in the form
**P(c|x) = P(x|c) P(c) / \sum_{c'} P(x|c') P(c')**,
but for a PMD the corresponding quantity is the average posterior
probability, which can be written in the form **P(x|c) P(c) \sum_i (1 /
\sum_{c'} P(x|c') P(c'))**, where the inner sum runs over only the
classes in the **i**-th partition; in a PMD each mixture
uses only a small partition of **c**-space, so it cannot
construct a full posterior probability over all of
**c**-space.
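
The two posteriors above can be contrasted in a few lines (an illustrative sketch; the partition structure and names are assumptions for the example, and each partition is taken to be a list of class indices in **c**-space):

```python
import numpy as np

def posterior(joint):
    """Standard Bayes posterior P(c|x) from the joint terms P(x|c) P(c)."""
    return joint / joint.sum()

def pmd_average_posterior(joint, partitions):
    """Average posterior for a PMD.

    Each class c is normalised only within the partitions of c-space that
    contain it, and the results are averaged over those partitions:
    Q(c|x) proportional to P(x|c) P(c) sum_i 1 / sum_{c' in part i} P(x|c') P(c').
    """
    weight = np.zeros(len(joint))
    counts = np.zeros(len(joint))
    for part in partitions:            # each partition: list of class indices
        weight[part] += 1.0 / joint[part].sum()   # local normaliser
        counts[part] += 1
    return joint * weight / counts     # average over partitions containing c
```

When a single partition covers all of **c**-space, the average posterior reduces to the standard Bayes posterior.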

The EM algorithm for optimising a standard mixture distribution may be
modified to a version that is suitable for a PMD by simply replacing
the standard posterior probability with the *average* posterior
probability. As a PMD is optimised, it adjusts its parameters so that
each of its embedded mixture distributions is optimised as far as is
possible, consistent with the fact that it shares parameters with its
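
The modified EM step described above can be sketched as (a self-contained illustrative sketch for 1-D Gaussian components with overlapping partitions of **c**-space; all names are our own):

```python
import numpy as np

def pmd_em_step(x, pi, mu, var, partitions):
    """One EM step for a PMD of 1-D Gaussian mixtures.

    Identical to standard mixture EM except that the posterior P(c|x) is
    replaced by the average posterior: normalise the joint terms within
    each partition, then average over the partitions containing each class.
    """
    lik = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    joint = lik * pi                                   # P(x|c) P(c), shape (N, C)

    # E-step with the average posterior instead of the full Bayes posterior.
    post = np.zeros_like(joint)
    counts = np.zeros(len(pi))
    for part in partitions:                            # list of class indices
        post[:, part] += joint[:, part] / joint[:, part].sum(axis=1, keepdims=True)
        counts[part] += 1
    post /= counts                                     # average over partitions

    # M-step: same re-estimation formulae as a standard Gaussian mixture.
    nk = post.sum(axis=0)
    pi_new = nk / nk.sum()
    mu_new = (post * x[:, None]).sum(axis=0) / nk
    var_new = (post * (x[:, None] - mu_new) ** 2).sum(axis=0) / nk
    return pi_new, mu_new, var_new
```

Classes shared between partitions receive posterior mass from each of their partitions, which is how the overlapping mixtures constrain one another's parameters.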

A mixture distribution can be converted into a hidden Markov model
(HMM) by allowing its current prior **P(c;t)** to be
determined by its *previous* posterior probability
**P(c|x;t-1)**. Similarly, a PMD can also be generalised
to become multiple overlapping HMMs.
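
The prior-update rule that turns a mixture into an HMM can be sketched as (a minimal sketch of the simplest case, with the previous posterior reused directly as the new prior and no explicit transition matrix; names are illustrative):

```python
import numpy as np

def run_mixture_hmm(xs, mu, var, pi0):
    """Track a sequence with a mixture converted into a simple HMM.

    At each step t the prior P(c;t) is set to the previous posterior
    P(c|x;t-1), for 1-D Gaussian class-conditional PDFs.
    """
    pi = pi0.copy()
    posts = []
    for x in xs:
        lik = np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        post = lik * pi / (lik * pi).sum()   # Bayes' theorem with current prior
        posts.append(post)
        pi = post                            # P(c;t) = P(c|x;t-1)
    return np.array(posts)
```

Repeated observations consistent with one class progressively sharpen the prior onto that class.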

MaxEnt 94 Abstracts / mas@mrao.cam.ac.uk