Department of Automatic Control and Systems Engineering

University of Sheffield

Sheffield

S1 4DU

U.K.

The aim is to identify the unknown nonlinear mapping

$$y = f(x)$$

based on the input-output observations $\{(x_n, y_n) \mid n = 1, \dots, N\}$.

Neural networks, radial basis functions and Volterra polynomials construct the underlying mapping $f(x)$ as

$$f(x) = \sum_{k=1}^{K} \alpha_k \, \phi_k(x),$$

where $\phi_k(x)$ are the basis functions constructed at the hidden layer and $\alpha_k$ are the coefficients. These basis functions are parametrised, and their parameters are typically estimated during training. Because these parameters enter the model nonlinearly, estimating them generally increases training time. Furthermore, the number of basis functions used is critical to obtaining a good approximation of the underlying system, a problem similar to 'overfitting' in interpolation.
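As a minimal sketch of the model $f(x) = \sum_k \alpha_k \phi_k(x)$, the Python below assumes Gaussian radial basis functions $\phi_k(x) = \exp(-\lVert x - c_k \rVert^2 / 2\sigma^2)$. The Gaussian form, the centres $c_k$ and the width $\sigma$ are illustrative choices, not taken from the abstract, and the helper names `gaussian_rbf` and `model_output` are hypothetical. The centre and width are the nonlinearly appearing parameters; $\alpha_k$ enters linearly.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]

def gaussian_rbf(centre: Vector, width: float) -> Callable[[Vector], float]:
    """Return phi(x) = exp(-||x - centre||^2 / (2 * width^2)) with centre and width held fixed."""
    def phi(x: Vector) -> float:
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
        return math.exp(-sq_dist / (2.0 * width ** 2))
    return phi

def model_output(x: Vector,
                 basis: Sequence[Callable[[Vector], float]],
                 alpha: Sequence[float]) -> float:
    """f(x) = sum_k alpha_k * phi_k(x): linear in the coefficients, nonlinear in x."""
    return sum(a_k * phi_k(x) for a_k, phi_k in zip(alpha, basis))

# Two hypothetical basis functions for a scalar input, centred at 0 and 1.
basis = [gaussian_rbf([0.0], 1.0), gaussian_rbf([1.0], 1.0)]
alpha = [2.0, -1.0]
print(model_output([0.0], basis, alpha))  # approximately 1.3935 (= 2 - exp(-0.5))
```

Once the centres and width are fixed, fitting the model reduces to estimating the vector `alpha`, which is a linear problem.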

The Bayesian framework for model comparison developed by Gull & Skilling was used by MacKay to show how different neural networks and models can be compared at the second level of inference, with the most probable model chosen as the approximation to the underlying system. In MacKay's examples, the number of basis functions in each network was fixed *a priori*. Here MacKay's procedure is adopted and its scope extended by a method that searches over a wider class of models.
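For a model that is linear in its coefficients, the second-level evidence has a closed form. One standard version is given below. It assumes a Gaussian prior $\boldsymbol{\alpha} \sim \mathcal{N}(0, \lambda^{-1} I)$ on the $K$ coefficients, Gaussian output noise of precision $\beta$, and fixed hyperparameters $\lambda$ and $\beta$. These symbols are introduced here for illustration, and in MacKay's procedure the hyperparameters are themselves re-estimated.

$$
\begin{aligned}
A &= \lambda I_K + \beta\,\Phi^{\top}\Phi, \qquad \mathbf{m} = \beta\,A^{-1}\Phi^{\top}\mathbf{y},\\
E(\mathbf{m}) &= \frac{\beta}{2}\,\lVert \mathbf{y} - \Phi\mathbf{m} \rVert^{2} + \frac{\lambda}{2}\,\mathbf{m}^{\top}\mathbf{m},\\
\ln p(\mathbf{y} \mid \lambda, \beta, \mathcal{H}) &= \frac{K}{2}\ln\lambda + \frac{N}{2}\ln\beta - E(\mathbf{m}) - \frac{1}{2}\ln\lvert A\rvert - \frac{N}{2}\ln(2\pi).
\end{aligned}
$$

Here $\Phi$ is the $N \times K$ design matrix with $\Phi_{nk} = \phi_k(x_n)$, $\mathcal{H}$ denotes the model, and $\mathbf{m}$ is the most probable coefficient vector from the first level of inference. Together, the $\tfrac{K}{2}\ln\lambda$ and $-\tfrac{1}{2}\ln\lvert A\rvert$ terms act as an Occam factor that penalises models with more basis functions.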

In this work, a pool of basis functions is first selected, and each model consists of a subset of this pool. The parameters of these basis functions are fixed and are not estimated during training. At the first level of inference, the coefficients of a model are estimated; at the second level, its evidence is computed. A genetic algorithm then searches the space of possible subsets of the pool for the model with the largest evidence.
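The two levels of inference and the genetic search can be sketched in plain Python. This is a hedged illustration, not the authors' implementation. It makes several assumptions. The evidence is the closed form for a model that is linear in its coefficients, with fixed prior precision `lam` and noise precision `beta`, whose default values are hypothetical. Each model is encoded as a bit mask over the $P$ candidate columns of a precomputed design matrix. The search is a simple genetic algorithm with truncation selection, one-point crossover, bit-flip mutation and elitism. All function names are hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def log_evidence(Phi, y, subset, lam=1.0, beta=100.0):
    """Log evidence of the model built from the columns `subset` of the N x P matrix Phi."""
    N, K = len(y), len(subset)
    cols = [[row[j] for j in subset] for row in Phi]            # N x K design matrix
    A = [[lam * (i == j) + beta * sum(r[i] * r[j] for r in cols)
          for j in range(K)] for i in range(K)]                 # lam*I + beta*Phi^T Phi
    rhs = [beta * sum(r[i] * t for r, t in zip(cols, y)) for i in range(K)]
    L = cholesky(A)
    m = chol_solve(L, rhs)            # first level: most probable coefficients
    resid = [t - sum(rk * mk for rk, mk in zip(r, m)) for r, t in zip(cols, y)]
    E = beta / 2 * sum(e * e for e in resid) + lam / 2 * sum(mk * mk for mk in m)
    log_det_A = 2 * sum(math.log(L[i][i]) for i in range(K))
    return (K / 2 * math.log(lam) + N / 2 * math.log(beta) - E
            - log_det_A / 2 - N / 2 * math.log(2 * math.pi))

def ga_search(Phi, y, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Search bit masks over the P (>= 2) candidate columns for the largest log evidence."""
    rng = random.Random(seed)
    P = len(Phi[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = log_evidence(Phi, y, [j for j in range(P) if mask[j]])
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in range(P)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # truncation selection
        children = [pop[0][:]]                    # elitism: keep the best mask
        while len(children) < pop_size:
            pa, pb = rng.sample(parents, 2)
            cut = rng.randrange(1, P)             # one-point crossover
            child = [(not g) if rng.random() < p_mut else g   # bit-flip mutation
                     for g in pa[:cut] + pb[cut:]]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return [j for j in range(P) if best[j]], fitness(best)

# Hypothetical pool: columns x, 1, x^2, x^3 at 21 points; noise-free data from y = 3x.
xs = [i / 10 for i in range(-10, 11)]
Phi = [[x, 1.0, x * x, x ** 3] for x in xs]
y = [3 * x for x in xs]
subset, score = ga_search(Phi, y)
print(subset, round(score, 2))
```

With $P$ candidates there are $2^P$ possible models, so for a pool this small exhaustive enumeration would also work; the genetic algorithm pays off only when the pool is large. Carrying the best mask into every generation (elitism) ensures that the best evidence in the population never decreases from one generation to the next.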

The procedure yields the model with the optimal number of basis functions, but it is limited to the basis functions available in the pool. This is less of a constraint for Volterra polynomials, where terms up to second order are placed in the pool, and for radial basis function networks, whose centres can be chosen from the observations. Furthermore, the computation at the first level of inference is simplified, since only the linear coefficients need to be estimated.
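The candidate pools described above can be generated mechanically. The sketch below uses hypothetical function names and assumes that the input $x$ is a vector of $d$ real values and that the radial basis functions are Gaussian with a common fixed width; neither assumption comes from the abstract. A second-order Volterra pool contains the constant, every $x_i$, and every product $x_i x_j$ with $i \le j$; a radial basis function pool places one centre at each observed input.

```python
import math
from itertools import combinations_with_replacement
from typing import Callable, List, Sequence

Vector = Sequence[float]
Basis = Callable[[Vector], float]

def volterra_pool(d: int) -> List[Basis]:
    """Second-order Volterra candidates in d inputs: 1, x_i, and x_i * x_j for i <= j."""
    pool: List[Basis] = [lambda x: 1.0]                      # constant term
    for i in range(d):
        pool.append(lambda x, i=i: x[i])                     # linear terms
    for i, j in combinations_with_replacement(range(d), 2):
        pool.append(lambda x, i=i, j=j: x[i] * x[j])         # quadratic terms
    return pool

def rbf_pool(observations: Sequence[Vector], width: float) -> List[Basis]:
    """One Gaussian basis function centred on each observed input x_n (width held fixed)."""
    def centred_at(c: Vector) -> Basis:
        return lambda x: math.exp(
            -sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * width ** 2))
    return [centred_at(c) for c in observations]

terms = volterra_pool(2)
print([t([2.0, 3.0]) for t in terms])  # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```

For $d = 2$ the Volterra pool has $1 + 2 + 3 = 6$ terms, giving $2^6 = 64$ candidate models. In general there are $1 + d + d(d+1)/2$ terms, so the number of subsets grows rapidly with $d$, which is one motivation for a genetic search rather than enumeration.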

The procedure will be demonstrated on data generated by a large pilot-scale nonlinear liquid-level system, using both a Volterra polynomial model and a radial basis function network.

MaxEnt 94 Abstracts / mas@mrao.cam.ac.uk