Function estimation: MaxEnt and beyond

John Skilling
University of Cambridge, Cavendish Laboratory
Madingley Road, Cambridge CB3 0HE, England

Abstract

A quantity of interest is distributed as \phi(x), measured by data D which constrain its integral properties: what is \phi? Although the context shows \phi to be a non-negative density, it is in general nonparametric (or freeform) in the sense that it may not be assumed to admit a reasonable parametrization in terms of a limited number of parameters. One can either seek to assign a single optimal \widehat\phi, or to construct the posterior probability \Pr(\phi|D), which defines the range of plausible results. The prior and posterior probabilities \Pr(\phi) and \Pr(\phi|D) have to be defined over the very large space of all non-negative functions \phi rather than some parameter space of limited dimension.
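As a concrete sketch of this setup (the linear-constraint data model, the kernels R_k and their number N are assumptions for illustration, not fixed by the abstract), the data may be taken as noisy integral functionals of \phi, and Bayes' theorem then supplies the posterior:
\[
D_k = \int R_k(x)\,\phi(x)\,dx + \mathrm{noise}, \qquad k = 1,\dots,N,
\]
\[
\Pr(\phi \mid D) = \frac{\Pr(D \mid \phi)\,\Pr(\phi)}{\Pr(D)} .
\]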

Maximum entropy is the proper way of assigning a single density consistently with integral constraints.
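As a sketch of that assignment (assuming linear constraints D_k = \int R_k(x)\,\phi(x)\,dx and a default model m(x), neither of which is specified above), maximizing the entropy
\[
S(\phi) = \int \Big[ \phi(x) - m(x) - \phi(x)\log\frac{\phi(x)}{m(x)} \Big]\, dx
\]
subject to the constraints, with Lagrange multipliers \lambda_k, gives
\[
\widehat\phi(x) = m(x)\,\exp\!\Big( \sum_k \lambda_k R_k(x) \Big),
\]
the \lambda_k being fixed by requiring the constraints themselves to hold.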

Bayesian analysis, though, repeatedly requires integration over \phi-space, thus demanding that the space be endowed with an integration measure. For analytic and computational tractability we use a finite discretization, partitioning the relevant domain of x into M cells. Since the underlying problem is continuous, the measure on M cells must be set up in a manner consistent with passage to the continuum limit (M \rightarrow \infty) of continuous x. We derive the form of this condition, which leads naturally to an explicit power-law form for the integration measure per unit volume. The specification of this measure logically precedes the assignment of any probability functions, which appear as pointwise weighting factors in integrals over \phi-measure.
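A sketch of the resulting form (the parametrization of the exponents is an assumption here, chosen to match the gamma-process statement below): on the M cells the measure factorizes as
\[
d\mu(\phi_1,\dots,\phi_M) \;\propto\; \prod_{i=1}^{M} \phi_i^{\,\alpha_i - 1}\, d\phi_i ,
\]
with \alpha_i proportional to the prior measure attached to cell i, so that the exponents simply add when cells are merged.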

Quantified maximum entropy used the entropy S(\phi) to assign the prior probability required by Bayesian analysis as \Pr(\phi) \propto \exp(\alpha S), but this choice turns out to be incompatible with the continuum limit.
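For reference, with S(\phi) as in the maximum-entropy sketch above, evaluated on the M-cell discretization, the quantified-MaxEnt prior reads
\[
\Pr(\phi \mid \alpha, m) \;\propto\; \exp\big( \alpha\, S(\phi) \big),
\qquad
S(\phi) = \sum_{i=1}^{M} \Big[ \phi_i - m_i - \phi_i \log\frac{\phi_i}{m_i} \Big],
\]
with \alpha a regularization constant.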

Instead, the prior \exp(-\beta \phi) relative to the power-law measure is the natural choice, having several desirable properties. The product of this prior and measure is commonly known as the gamma process. The further constraint of unit normalization on \phi immediately leads to the Dirichlet process. The use of such processes in Bayesian function estimation is effectively demanded by a consistency argument, regardless of the form of data to be analyzed. Specific applications are discussed by Sibusiso Sibisi in these proceedings.
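In the M-cell discretization these two statements take a simple explicit form (\alpha_i as in the power-law measure sketched above, \beta the exponential parameter):
\[
\Pr(\phi_1,\dots,\phi_M)\,\prod_i d\phi_i \;\propto\; \prod_{i=1}^{M} \phi_i^{\,\alpha_i - 1}\, e^{-\beta \phi_i}\, d\phi_i ,
\]
i.e. independent gamma distributions for the cell contents, the finite-dimensional form of the gamma process; this is consistent under cell merging because gamma variables sharing the same \beta add, with their shape parameters \alpha_i adding. Conditioning on the normalization \sum_i \phi_i = 1 removes the \beta-dependence and leaves
\[
\Pr\Big(\phi_1,\dots,\phi_M \,\Big|\, \textstyle\sum_i \phi_i = 1\Big) \;\propto\; \prod_{i=1}^{M} \phi_i^{\,\alpha_i - 1},
\]
the Dirichlet distribution, the finite-dimensional form of the Dirichlet process.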

