Basis Function Selection using Bayesian Inference for Nonlinear System Identification

Department of Automatic Control and Systems Engineering
University of Sheffield
S1 4DU


Many real physical processes exhibit complex nonlinear behaviour that is difficult to model. Typical examples are weather patterns, economic indices and some chemical processes. Modelling or identifying the underlying process is, in some cases, approached from a nonparametric estimation viewpoint, leading to the use of neural networks. The nonlinear system identification problem is posed as that of estimating an underlying nonlinear function f(.) which maps the input x onto an output y, i.e.,

y = f(x)

based on the input--output observations {(x_n, y_n) | n = 1, ..., N}.

Neural networks, radial basis function networks and Volterra polynomials construct the underlying mapping f(x) as

f(x) = \sum_{k=1}^{K} \alpha_k \phi_k(x)

where \phi_k(x) are the basis functions constructed at the hidden layer and \alpha_k are the coefficients. These basis functions are parametrised, and their parameters are typically estimated during training. Estimating these nonlinearly appearing parameters generally increases training time. Furthermore, the number of basis functions used is critical to obtaining a good approximation of the underlying system, the problem being similar to the `overfitting' problem in interpolation.
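When the basis functions \phi_k are held fixed, the expansion is linear in the coefficients \alpha_k, which can then be obtained by ordinary least squares. A minimal sketch; the particular basis functions, data and noise level below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def design_matrix(x, bases):
    """Stack the basis-function values phi_k(x_n) into an N-by-K matrix."""
    return np.column_stack([phi(x) for phi in bases])

# An assumed pool of fixed basis functions: no nonlinear parameters to train.
bases = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]

# Synthetic data generated from the same expansion plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 0.5 + 2.0 * x - 1.5 * x**2 + 0.01 * rng.standard_normal(50)

# With phi_k fixed, only the linear coefficients alpha_k remain to estimate.
Phi = design_matrix(x, bases)
alpha, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(alpha)  # close to [0.5, 2.0, -1.5]
```

This is why fixing the basis-function parameters (as done below) simplifies training: the hard nonlinear estimation problem reduces to a linear one.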

The Bayesian framework developed for model comparison by Gull & Skilling was used by MacKay to demonstrate how different neural networks and models can be compared (at the second level of inference) and how the most probable model can be chosen as the approximation to the underlying system. MacKay's examples included networks with different numbers of basis functions chosen a priori. Here, MacKay's procedure is adopted and its scope extended by developing a method for exhaustive search over a wider class of models.

In this work, a pool of basis functions is first selected, and each candidate model consists of a subset of this pool. These basis functions have fixed parameters which are not estimated during training. At the first level of inference, the coefficients of a model are estimated; at the second level, its evidence is computed. A search over the space of possible subsets of the pool is then carried out, using genetic algorithms, to determine the model with the largest evidence.
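The two levels can be sketched for a linear-in-parameters model with Gaussian prior and noise, for which the evidence is analytic, together with a minimal genetic algorithm over bitmasks selecting pool columns. The hyperparameters (prior and noise variances, GA settings) and the toy data are assumptions for illustration, not the authors' exact choices:

```python
import numpy as np

def log_evidence(Phi, y, noise_var=0.01, prior_var=1.0):
    """Marginal likelihood of y under y = Phi a + e, a and e Gaussian."""
    N = len(y)
    C = noise_var * np.eye(N) + prior_var * Phi @ Phi.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

def ga_search(pool_matrix, y, pop=30, gens=40, pmut=0.05, seed=0):
    """Evolve bitmasks over pool columns to maximise the evidence."""
    rng = np.random.default_rng(seed)
    K = pool_matrix.shape[1]
    population = rng.integers(0, 2, size=(pop, K))

    def fitness(mask):
        if not mask.any():
            return -np.inf
        return log_evidence(pool_matrix[:, mask.astype(bool)], y)

    for _ in range(gens):
        scores = np.array([fitness(m) for m in population])
        parents = population[np.argsort(scores)[::-1][: pop // 2]]
        children = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, K)                      # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(K) < pmut                   # bitwise mutation
            children.append(np.where(flip, 1 - child, child))
        population = np.vstack([parents, children])
    scores = np.array([fitness(m) for m in population])
    return population[np.argmax(scores)]

# Toy demonstration: data generated from a subset of the pool.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 60)
pool = np.column_stack([np.ones_like(x), x, x**2, x**3, np.sin(3 * x)])
y = 1.0 + 2.0 * x**2 + 0.05 * rng.standard_normal(60)
best = ga_search(pool, y)
print(best)  # the constant (column 0) and x**2 (column 2) should be selected
```

The evidence automatically embodies an Occam penalty (through the determinant term), so the search does not simply favour the largest subset.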

The procedure yields the model with the optimal number of basis functions, but is limited by the basis functions available in the pool. This is not a severe constraint for Volterra polynomials, where terms up to order 2 are selected for the pool, or for radial basis functions, whose centres can be chosen from the observations. Furthermore, the computation at the first level of inference is simplified, since only the linear coefficients need be estimated.
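A minimal sketch of constructing such a pool: Volterra terms up to order 2 in the inputs, and Gaussian radial basis functions with centres taken from the observations. The Gaussian form, width and centre selection here are illustrative assumptions:

```python
import numpy as np
from itertools import combinations_with_replacement

def volterra_pool(X):
    """Columns: constant, linear terms x_i, and quadratic terms x_i * x_j."""
    N, d = X.shape
    cols = [np.ones(N)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j]
             for i, j in combinations_with_replacement(range(d), 2)]
    return np.column_stack(cols)

def rbf_pool(X, centres, width=0.5):
    """Gaussian radial basis functions centred on selected observations."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * width**2))

# For a 2-dimensional input: 1 constant + 2 linear + 3 quadratic terms.
X = np.random.default_rng(2).uniform(-1, 1, size=(20, 2))
print(volterra_pool(X).shape)   # (20, 6)
print(rbf_pool(X, X[:5]).shape) # (20, 5): centres taken from the data
```

Because both pools are built without any trained nonlinear parameters, the subset search above applies to either model class unchanged.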

The application of the above procedure to data generated by a large pilot-scale liquid-level nonlinear system will be demonstrated with the Volterra polynomial model and the radial basis function network.

MaxEnt 94 Abstracts