This section addresses the capabilities and limitations of multilayer feedforward networks (MLFFNs). The second part considers networks trained with the Backpropagation algorithm in more detail.
Multilayer perceptrons (MLPs) are known to be universal approximators. Kolmogorov's theorem states that any continuous function defined on a closed n-dimensional cube can be represented as a finite sum of compositions of continuous functions of one variable [kolm57].
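For every continuous function $f:[0,1]^n \to \mathbb{R}$ with $n \ge 2$ there are continuous functions $\chi_q$ and $\psi_{pq}$ of one variable such that:

$$ f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \chi_q\left( \sum_{p=1}^{n} \psi_{pq}(x_p) \right) $$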
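Kolmogorov's representation has the form f(x_1, ..., x_n) = sum over q = 1..2n+1 of chi_q(sum over p = 1..n of psi_pq(x_p)). To make that shape concrete, the toy sketch below evaluates such a superposition for n = 2. It is not a construction of Kolmogorov's functions: the product f(x_1, x_2) = x_1 x_2 happens to equal ((x_1 + x_2)^2 - (x_1 - x_2)^2) / 4, so two outer functions suffice and the remaining 2n + 1 - 2 = 3 terms are set to zero. All names (`superpose`, `chi`, `psi` and the helper functions) are illustrative assumptions, and unlike in the theorem the inner functions here were picked with knowledge of f.

```python
from typing import Callable, Sequence

Fn = Callable[[float], float]


def superpose(chi: Sequence[Fn], psi: Sequence[Sequence[Fn]], x: Sequence[float]) -> float:
    """Evaluate sum_q chi[q]( sum_p psi[p][q](x[p]) )."""
    total = 0.0
    for q, outer in enumerate(chi):
        inner_sum = sum(psi[p][q](x[p]) for p in range(len(x)))
        total += outer(inner_sum)
    return total


# Hand-picked one-variable functions (hypothetical, chosen for f(x1, x2) = x1 * x2)
def quarter_square(s: float) -> float:
    return s * s / 4.0


def neg_quarter_square(s: float) -> float:
    return -s * s / 4.0


def identity(t: float) -> float:
    return t


def negate(t: float) -> float:
    return -t


def zero(_: float) -> float:
    return 0.0


n = 2
num_terms = 2 * n + 1  # number of outer functions in Kolmogorov's theorem
padding = [zero] * (num_terms - 2)

chi = [quarter_square, neg_quarter_square] + padding
psi = [
    [identity, identity] + padding,  # inner functions applied to x1
    [identity, negate] + padding,    # inner functions applied to x2
]

print(superpose(chi, psi, [0.3, 0.5]))  # approximately 0.15 = 0.3 * 0.5
```

Only the structure matters here: every term is an outer function of one variable applied to a sum of inner functions of one variable each.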
However, this is only an existence theorem. The outer functions $\chi_q$ depend on the mapping function $f$ (the inner functions $\psi_{pq}$ can be chosen independently of $f$), and the theorem gives no help for finding these functions [patt96, p. 182] and [faus94, p. 328].
The Hecht-Nielsen theorem, which is based on Kolmogorov's theorem, shows that any continuous function $f:[0,1]^n \to \mathbb{R}^m$ can be represented exactly by a three-layer feedforward network with $n$ input nodes, $2n+1$ hidden neurons and $m$ output nodes [hech87] and [hech89].
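The sketch below only illustrates the topology named by the Hecht-Nielsen theorem: n input nodes, 2n + 1 hidden neurons and m output nodes. The random weights, the logistic sigmoid in the hidden layer and the linear output layer are assumptions for illustration; in the theorem the hidden and output units use specially constructed transfer functions, so this network does not implement any particular f. The names `make_kolmogorov_topology` and `forward` are hypothetical.

```python
import numpy as np


def make_kolmogorov_topology(n: int, m: int, seed: int = 0):
    """Return random weights for an n -> (2n + 1) -> m feedforward network."""
    rng = np.random.default_rng(seed)
    hidden = 2 * n + 1
    W1 = rng.normal(scale=0.5, size=(n, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, m))
    b2 = np.zeros(m)
    return W1, b1, W2, b2


def forward(params, X):
    """Forward pass; returns (outputs, hidden activations)."""
    W1, b1, W2, b2 = params
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))  # hidden layer: logistic sigmoid (assumed)
    return H @ W2 + b2, H                     # output layer: linear (assumed)


params = make_kolmogorov_topology(n=3, m=2)
X = np.random.default_rng(1).uniform(0.0, 1.0, size=(5, 3))  # 5 points in the unit cube [0,1]^3
Y, H = forward(params, X)
print(H.shape, Y.shape)  # (5, 7) (5, 2)
```

For n = 3 inputs the hidden layer has 2 * 3 + 1 = 7 neurons, which is what the printed shapes show.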
These theorems guarantee that an appropriate network exists for such a problem, but they give no assistance in finding it; in particular, they give no method for determining the appropriate weights. The Backpropagation algorithm is one such method, although it searches for suitable weights iteratively and is not guaranteed to find them.
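As a minimal sketch of how Backpropagation searches for weights, the code below trains a network with one hidden layer by full-batch gradient descent on the mean squared error. The tanh hidden layer, the linear output unit, the learning rate and the target function are assumptions chosen for illustration; the names `train_backprop`, `hidden`, `lr` and `epochs` are hypothetical. As the text notes, nothing guarantees that such a run reaches the best weights; it only moves downhill on the error surface.

```python
import numpy as np


def train_backprop(X, y, hidden=8, lr=0.05, epochs=3000, seed=0):
    """Train an n -> hidden -> 1 network (tanh hidden layer, linear output)
    by full-batch gradient descent on the mean squared error.
    Returns the weights and the loss recorded before each update."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    W1 = rng.normal(scale=1.0, size=(n, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(hidden, 1))
    b2 = np.zeros(1)
    Y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward pass
        H = np.tanh(X @ W1 + b1)      # hidden activations, shape (N, hidden)
        out = H @ W2 + b2             # network output, shape (N, 1)
        err = out - Y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate dL/d(out) back through the layers
        d_out = 2.0 * err / N
        dW2 = H.T @ d_out
        db2 = d_out.sum(axis=0)
        d_hid = (d_out @ W2.T) * (1.0 - H ** 2)  # tanh'(a) = 1 - tanh(a)^2
        dW1 = X.T @ d_hid
        db1 = d_hid.sum(axis=0)
        # Gradient-descent update of all weights
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


# Usage: learn y = sin(2*pi*x) from 50 examples on [0, 1]
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
weights, losses = train_backprop(X, y)
print(f"MSE before training {losses[0]:.3f}, after {losses[-1]:.3f}")
```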
Usually the problem given to the neural network is described by a set of examples rather than by a function, so only a finite number of points in the problem space is known. Properly trained neural networks interpolate well between these points but extrapolate very poorly beyond them. Unfortunately, there is no general rule for deciding whether a new pattern lies within the interpolation space [hell95].
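The interpolation/extrapolation gap can be observed with a small experiment, sketched here under stated assumptions: the target x^2, the training interval [-1, 1], the test intervals and the network size are all arbitrary choices, and `fit_mlp` and `target` are hypothetical names. The exact numbers depend on the random initialisation, but the qualitative picture is typical: the output of a tanh network with a linear output unit is bounded by the sum of the absolute output weights plus the bias, so it cannot follow x^2 far outside the region covered by the examples.

```python
import numpy as np


def fit_mlp(x, y, hidden=10, lr=0.05, epochs=5000, seed=0):
    """Fit y ~ f(x) with a 1 -> hidden -> 1 tanh network trained by backpropagation
    (full-batch gradient descent on the mean squared error); returns a predictor."""
    rng = np.random.default_rng(seed)
    X, Y, N = x.reshape(-1, 1), y.reshape(-1, 1), len(x)
    W1 = rng.normal(size=(1, hidden))
    b1 = rng.normal(size=hidden)
    W2 = rng.normal(scale=0.1, size=(hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        d_out = 2.0 * (H @ W2 + b2 - Y) / N
        d_hid = (d_out @ W2.T) * (1.0 - H ** 2)  # computed before W2 is updated
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)

    def predict(t):
        return (np.tanh(t.reshape(-1, 1) @ W1 + b1) @ W2 + b2).ravel()

    return predict


def target(t):
    return t ** 2


x_train = np.linspace(-1.0, 1.0, 41)     # the known examples only cover [-1, 1]
net = fit_mlp(x_train, target(x_train))

x_inside = np.linspace(-0.95, 0.95, 20)  # new patterns between the examples
x_outside = np.linspace(2.0, 3.0, 20)    # new patterns outside the examples
mse_inside = float(np.mean((net(x_inside) - target(x_inside)) ** 2))
mse_outside = float(np.mean((net(x_outside) - target(x_outside)) ** 2))
print(f"interpolation MSE {mse_inside:.4f}, extrapolation MSE {mse_outside:.2f}")
```

Here it is easy to tell which test patterns lie outside the examples because the input is one-dimensional; in high-dimensional input spaces no such simple check is available.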
Defining a measure of generalization ability is very difficult. Good generalization always depends on the data set: what is desirable for one application may be useless for another. Some theoretical approaches to this issue exist (see [hold95] and the references therein).
In [mraz95] a theoretical analysis of the robustness of BP networks is given. The paper proposes a definition of a separation characteristic that can be used to evaluate and compare BP networks. As a criterion for a robust network, it requires that the network respond with the output `no decision possible' to all new input patterns that are close to the separating hyperplane.
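The separation characteristic of [mraz95] is not reproduced here. As a hypothetical illustration of the `no decision possible' requirement only, the sketch below classifies a pattern with a single separating hyperplane w.x + b = 0 and refuses to decide when the pattern's Euclidean distance to the hyperplane is below a threshold. The function `decide`, the parameter `min_distance` and the example hyperplane are assumptions.

```python
import math
from typing import Optional, Sequence


def decide(w: Sequence[float], b: float, x: Sequence[float],
           min_distance: float = 0.1) -> Optional[int]:
    """Classify x by the hyperplane w.x + b = 0.
    Returns 1 or 0 for the two half-spaces, or None ('no decision possible')
    if x lies closer than min_distance to the hyperplane."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    distance = abs(activation) / math.sqrt(sum(wi * wi for wi in w))
    if distance < min_distance:
        return None
    return 1 if activation > 0 else 0


# Hypothetical hyperplane x1 + x2 = 1
w, b = (1.0, 1.0), -1.0
print(decide(w, b, (0.2, 0.2)))   # 0: clearly on the negative side
print(decide(w, b, (0.5, 0.55)))  # None: too close to the hyperplane
print(decide(w, b, (0.9, 0.8)))   # 1: clearly on the positive side
```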
In [ston95] J. V. Stone and C. J. Thornton ask the question: ``Can Artificial Neural Networks Discover Useful Regularities?''
In this paper they put forward the hypothesis that artificial neural networks trained with Backpropagation depend on correlations between input and output variables. They show that BP-trained MLPs generalize very poorly on statistically neutral problems. Statistically neutral in this context means that knowing the value of a single input variable gives no information about the expected output value; only the relations between different input variables determine the output. The parity (XOR) problem is a typical example.
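Statistical neutrality can be checked directly for the n-bit parity problem (XOR for n = 2), a standard statistically neutral example. The sketch below enumerates all 2^n input patterns and computes the correlation between each single input bit and the parity output; every correlation is zero, so no single input variable carries information about the output. The helper `correlation` and the choice n = 3 are illustrative assumptions.

```python
import math
from itertools import product


def correlation(a, b):
    """Pearson correlation of two equally long sequences of numbers."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / math.sqrt(va * vb)


n = 3
patterns = list(product([0, 1], repeat=n))  # all 2^n binary input patterns
parity = [sum(p) % 2 for p in patterns]     # output 1 iff an odd number of inputs is 1

corrs = [correlation([p[i] for p in patterns], parity) for i in range(n)]
for i, c in enumerate(corrs):
    print(f"corr(x{i + 1}, parity) = {c:+.3f}")  # +0.000 for every input
```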
They suggest using a sparse coding of the problem to supply the BP algorithm with input variables that are correlated with the output, and show that this approach can improve the generalization ability of BP-trained ANNs on statistically neutral problems.
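The effect of a sparse recoding can be sketched on XOR. The one-hot code below (one input unit per input pattern, exactly one unit active) is an assumed, maximally sparse recoding chosen for illustration, not necessarily the coding used in [ston95]; it also grows with the number of patterns, so it only demonstrates the principle. In the original two-bit coding each input is uncorrelated with the output, while in the one-hot coding every input unit is correlated with it (|r| = 1/sqrt(3), about 0.577).

```python
import math
from itertools import product


def correlation(a, b):
    """Pearson correlation of two equally long sequences of numbers."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / math.sqrt(va * vb)


patterns = list(product([0, 1], repeat=2))  # (0,0), (0,1), (1,0), (1,1)
xor = [p[0] ^ p[1] for p in patterns]       # [0, 1, 1, 0]

# Original binary coding: each single input is uncorrelated with the output.
binary_corrs = [correlation([p[i] for p in patterns], xor) for i in range(2)]

# Sparse (one-hot) recoding: unit k is active only for pattern k.
k = len(patterns)
one_hot = [[1 if j == idx else 0 for j in range(k)] for idx in range(k)]
sparse_corrs = [correlation([code[j] for code in one_hot], xor) for j in range(k)]

print("binary:", [f"{c:+.3f}" for c in binary_corrs])
print("one-hot:", [f"{c:+.3f}" for c in sparse_corrs])
```

The recoded problem is also higher-dimensional (four inputs instead of two), which is the price paid for turning the relation between inputs into correlations with single inputs.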
However, real-world problems are rarely completely statistically neutral. A more appropriate question is therefore: ``What kind of regularities can be discovered by an ANN?''
The idea of improving the generalization ability of a network by recoding the problem into a sparse or binary (and therefore higher-dimensional) code seems very interesting.