## Stephen P Luttrell Adaptive Systems Theory Section Defence Research Agency St Andrews Rd Malvern Worcestershire WR14 3PS United Kingdom luttrell@signal.dra.hmg.gb

### Abstract

This paper presents, from a maximum entropy perspective, the theory of a novel way of building adaptive joint density models, called the adaptive cluster expansion.

Conventional density models use hidden variables (e.g. hyperparameters) to model variables that are not directly observable. This approach has been used successfully to model the probability density of data that has complicated joint statistics, especially in image processing applications. In physics applications the use of hidden variables is mandatory because they are real physical quantities which happen not to be observed directly. However their use in image processing applications is usually empirical, because they serve to augment the parameter space of what is after all only an empirical model. There is room for alternative approaches in image processing (and other) applications.

In this paper it will be assumed that the data pixels have been passed through a multi-layer network of non-linear processors (e.g. a neural network), and that the marginal PDFs between the outputs of some of the non-linear processors have been measured. The goal is to construct an optimal model of the joint density of the data pixels, which requires that the following two problems be addressed:

1. What is the optimal way to combine the marginal PDFs to produce the required density model?
2. What is the optimal choice of non-linear processors to use in the multi-layer network in the first place?
The solutions to the above problems are as follows:
1. The maximum entropy principle will be used to construct a model of the joint density of the data pixels that is consistent with the supplied marginal PDFs. In general, this maximum entropy problem cannot be solved in closed-form, but in the case of a tree-like network topology it yields a closed-form expression that allows direct computation of the density (i.e. there are no hidden variables that need to be summed over).
2. There is no uniquely best choice for the non-linear processors that should be used in the network, but the consequences of using an objective function that maximises the relative entropy between the density model and the real-world density (which is equivalent to maximising the data likelihood with respect to the choice of non-linear processors) are derived. This objective function has a pleasing interpretation as the sum of the mutual informations between various outputs of processors in different parts of the network. During this optimisation process control signals flow backwards between the network layers to co-ordinate their optimisation; this effect is called self-supervision.
The tree-like network assumption that was needed to solve problem 1 in closed-form can be circumvented. If the network topology is the union of a number of {\em overlapping} tree-like subnetworks, then a number of closed-form maximum entropy solutions can be constructed --- one for each tree-like subnetwork --- but these cannot be combined to give the maximum entropy solution for the whole network (unlike the case of {\em separate} tree-like networks). As before, there is no unique solution to problem 2, but a practically useful one is to select that choice of non-linear processors that maximises the {\em average} of the relative entropies for each tree-like subnetwork. Because of their mutual overlap, the various tree-like subnetworks in the union compete with each other, and the global optimum for the choice of non-linear processors is forced to be a compromise.

In conclusion, it is possible to construct density models that have the ability to capture complicated joint statistics without the need to introduce hidden variables that subsequently have to be summed over. The most general model of this class can be represented as a multi-layer network of non-linear processors, whose connection topology is the union of a number of overlapping tree-like subnetworks.

MaxEnt 94 Abstracts / mas@mrao.cam.ac.uk