AutoClass --- a Bayesian approach to Classification

John Stutz, Peter Cheeseman, Robin Hanson and Will Taylor
MS 269--2
NASA Ames Research Center
Moffett Field
CA 94035


We describe a Bayesian approach to the untutored discovery of classes in a set of cases, sometimes called finite mixture separation or clustering. The main difference between clustering and our approach is that we search for the ``best'' set of class descriptions rather than grouping the cases themselves. We describe each class by a probability distribution or density function, together with the parameter values that locally maximize the posterior probability. We rate a classification by an approximation to the joint probability of the data and the functional form, marginalized over the parameters. The approximation is necessary because the exact joint probability is computationally intractable; we therefore marginalize with respect to local maxima in the parameter space.
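
The distinction between describing classes and grouping cases can be illustrated with a minimal sketch. It assumes a one-dimensional mixture of Gaussian classes with known weights, means and standard deviations; the function names and the numbers in the usage note are hypothetical and are not taken from AutoClass. Given class descriptions, each case receives a probability of membership in every class rather than a hard assignment.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, weights, means, stds):
    """Probability that case x belongs to each class.

    Each class is described by a weight, a mean and a standard
    deviation (an illustrative assumption). The result is the
    weighted class density at x, normalized over all classes.
    """
    joint = [w * gaussian_pdf(x, m, s)
             for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]
```

For example, `membership_probabilities(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])` splits a case lying midway between two equally weighted classes evenly between them, while a case near one class mean is assigned mostly, but not exclusively, to that class.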

We discuss the rationale behind our approach to classification. We give the mathematical development for the basic mixture model and describe the approximations needed for computational tractability. We instantiate the basic model with the discrete Dirichlet distribution and multivariate Gaussian density likelihoods. Then we show some results for both constructed and actual data.
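
As a sketch of the quantities involved, the basic finite mixture model and the marginal joint probability can be written in the generic form below. The notation is assumed here for illustration and does not appear in the abstract: $X = \{X_1, \dots, X_I\}$ are the cases, $C_1, \dots, C_J$ the classes, $\pi_j$ the class weights, $V$ the full parameter set (with $V_j$ the parameters of class $j$), and $T$ the functional form (with $T_j$ the form of class $j$).

```latex
\begin{align}
P(X \mid V, T) &= \prod_{i=1}^{I} \sum_{j=1}^{J}
    \pi_j \, P(X_i \mid X_i \in C_j, V_j, T_j), \\
P(X, T) &= P(T) \int P(X \mid V, T) \, P(V \mid T) \, dV .
\end{align}
```

The first line is the likelihood of the data under a mixture of $J$ classes. The second is the joint probability of the data and the functional form, obtained by marginalizing over the parameters; it is this integral that generally has to be approximated.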

MaxEnt 94 Abstracts