# AutoClass: a Bayesian approach to Classification

## John Stutz, Peter Cheeseman, Robin Hanson and Will Taylor

MS 269-2

NASA Ames Research Center

Moffett Field

CA 94035

USA

### Abstract

We describe a Bayesian approach to the untutored discovery of classes
in a set of cases, sometimes called finite mixture separation or
clustering. The main difference between clustering and our approach
is that we search for the "best" set of class descriptions rather than
grouping the cases themselves. We describe each class by a
probability distribution or density function, together with the
parameter values that locally maximize the posterior probability. We rate our
classifications with an approximate joint probability of the data and
functional form, marginalizing over the parameters. Approximation is
necessitated by the computational complexity of the joint probability.
Thus we marginalize with respect to local maxima in the parameter space.
We discuss the rationale behind our approach to classification. We
give the mathematical development for the basic mixture model and
describe the approximations needed for computational tractability. We
instantiate the basic model with the discrete Dirichlet distribution
and multivariate Gaussian density likelihoods. Then we show some
results for both constructed and actual data.
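
To illustrate what it means to describe classes by densities rather than to group the cases directly, here is a minimal sketch. It is not AutoClass itself: the function names, the two one-dimensional Gaussian classes, and their weights and parameters are all hypothetical choices made for this example. Given fixed class descriptions, each case receives a probability of membership in every class instead of a single hard assignment.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def class_memberships(x, weights, means, stds):
    """Probability that case x belongs to each class of a finite mixture.

    Each class j is described by a weight and a Gaussian density; the
    membership probability is weight_j * density_j(x), normalized over classes.
    """
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical class descriptions: two equally weighted unit-variance classes.
weights = [0.5, 0.5]
means = [0.0, 4.0]
stds = [1.0, 1.0]

# A case midway between the class means, and one at the first class mean.
midway = class_memberships(2.0, weights, means, stds)
near_first = class_memberships(0.0, weights, means, stds)
```

A case midway between the two class means is shared equally between the classes, while a case at the first class mean belongs to that class almost entirely. The sketch treats the class parameters as given; searching for them and rating the resulting classification are the subject of the paper.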

MaxEnt 94 Abstracts / mas@mrao.cam.ac.uk