Partitioned mixture distributions

Stephen P Luttrell
Adaptive Systems Theory Section
Defence Research Agency
St Andrews Rd
WR14 3PS
United Kingdom


A mixture distribution is defined as P(x) = \sum P(x|c)P(c), where x is a state vector, c is a class label, P(c) is a prior, and P(x|c) is a class conditional PDF. There are many ways of optimising the P(c) and the P(x|c) to produce a P(x) with some desired properties. A popular scheme is the maximum likelihood prescription, in which <\log P(x)> is maximised with respect to the free parameters in P(c) and P(x|c). When the P(x|c) are Gaussian PDFs this optimisation prescription is particularly easy to implement either as a gradient ascent or as an expectation-maximisation (EM) algorithm. However, the flexibility of the standard mixture distribution approach is limited, and it does not lead to useful results in spaces where dim x is large.

In this paper a layered network is introduced that allows dim x to be large, and which ``tiles'' x-space with many mixture distributions, each of which deals with a low-dimensional subspace. Thus x is partitioned into overlapping regions (x_1, x_2, \ldots , x_n), and a corresponding set of mixture distributions P_i(x_i) is defined whose parameter spaces are forced to overlap --- this type of network is called a ``partitioned mixture distribution'' (PMD).

For a standard mixture distribution posterior probabilities are computed from the P(x|c) and the P(c) by using Bayes' theorem in the form P(c|x) = P(x|c) P(c) / \sum P(x|c') P(c'), but for a PMD the corresponding quantity is the average posterior probability which can be written in the form P(x|c) P(c) \sum (1 / \sum (x|c') P(c')); in a PMD each mixture uses only a small partition of c-space so it cannot construct a full posterior probability over all c-space.

The EM algorithm for optimising a standard mixture distribution may be modified to a version that is suitable for a PMD by simply replacing the standard posterior probability with the average posterior probability. As a PMD is optimised it adjusts its parameters to ensure that each of its embedded mixture distributions is optimised as far as possible consistent with the fact that it must share parameters with its neighbouring mixture distributions.

A mixture distribution can be converted into a hidden Markov model (HMM) by allowing its current prior P(c;t) to be determined by its previous posterior probability P(c|x;t-1). Similarly, a PMD can also be generalised to become multiple overlapping HMMs.

MaxEnt 94 Abstracts /