This section addresses the capabilities and limitations of MLFFNs. The second part considers networks trained by the Backpropagation algorithm in more detail.

It is known that MLPs are universal approximators. The *Kolmogorov
Theorem* states that any continuous function defined
on a closed n-dimensional cube can be written as a
finite sum of compositions of continuous functions of one variable
[kolm57].

For every continuous function $f(x_{1}, \ldots, x_{n})$ with $x_{i} \in [0,1]$ there are
continuous functions $\psi_{j}$ and $\phi_{ij}$ of one variable such that:

$$f(x_{1}, \ldots, x_{n}) = \sum_{j=1}^{2n+1} \psi_{j} \left( \sum_{i=1}^{n} \phi_{ij}(x_{i}) \right)$$

However, this is an existence theorem only. The functions $\psi_{j}$ and $\phi_{ij}$
depend on the function $f$ being represented, and the theorem gives no help in
finding them [patt96, p. 182], [faus94, p. 328].

The *Hecht-Nielsen Theorem*, which is based on the
Kolmogorov theorem, shows
that any continuous function $f: I^{n} \rightarrow R^{m}$, with $I = [0,1]$, can be approximated by a
feedforward network with $n$ inputs, $2n+1$ hidden neurons, and $m$ output
nodes [hech87], [hech89].
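
To make the layer sizes concrete, the following minimal sketch builds a network with $n$ inputs, $2n+1$ hidden neurons, and $m$ outputs and runs one forward pass. It only illustrates the shape of such a network. The `tanh` activation, the random weights, and the names `make_network` and `forward` are assumptions made for this example; the theorem itself says nothing about how to choose the hidden functions or the weights.

```python
import math
import random


def make_network(n_inputs, n_outputs, seed=0):
    """Random weights for an n -> (2n+1) -> m feedforward network (illustrative only)."""
    rng = random.Random(seed)
    n_hidden = 2 * n_inputs + 1  # hidden layer size named in the theorem
    # One row per unit; the last entry of each row is the bias.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                for _ in range(n_hidden)]
    w_output = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
                for _ in range(n_outputs)]
    return w_hidden, w_output


def forward(network, x):
    """One forward pass: tanh hidden units (an assumption), linear output units."""
    w_hidden, w_output = network
    hidden = [math.tanh(sum(w * v for w, v in zip(row[:-1], x)) + row[-1])
              for row in w_hidden]
    return [sum(w * h for w, h in zip(row[:-1], hidden)) + row[-1]
            for row in w_output]


net = make_network(n_inputs=3, n_outputs=2)
y = forward(net, [0.2, 0.5, 0.9])
print(len(net[0]), len(y))  # prints "7 2": 2*3+1 hidden units, 2 outputs
```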

These theorems state that there is an appropriate network for such a
problem, but give no assistance in how to find it. In particular they give
no indication of a method to find the appropriate weights.
The Backpropagation algorithm is *one* such method.

Usually the problem given to the neural network is described as a set of examples rather than as a function, so the number of known points in the problem space is finite. Properly trained neural networks interpolate well between these points but extrapolate very poorly. Unfortunately there is no general rule for deciding whether a new pattern lies within the interpolation space [hell95].
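
The difference between interpolation and extrapolation can be illustrated with a small experiment. The sketch below trains a tiny network with plain Backpropagation on samples of $f(x) = x^2$ taken from $[0,1]$ and then compares the mean squared error between the training points with the error on $[2,3]$. The target function, the network size, the learning rate, and all function names are assumptions chosen for this illustration; it is not an experiment from [hell95].

```python
import math
import random


def train_mlp(samples, n_hidden=5, epochs=2000, lr=0.1, seed=0):
    """Train a 1 -> n_hidden -> 1 network (tanh hidden, linear output) by online BP."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, target in samples:
            hidden = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
            output = sum(w2[j] * hidden[j] for j in range(n_hidden)) + b2
            err = output - target
            for j in range(n_hidden):
                # Error signal of hidden unit j, computed before w2[j] changes.
                grad_h = err * w2[j] * (1.0 - hidden[j] ** 2)
                w2[j] -= lr * err * hidden[j]
                w1[j] -= lr * grad_h * x
                b1[j] -= lr * grad_h
            b2 -= lr * err

    def predict(x):
        hidden = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
        return sum(w2[j] * hidden[j] for j in range(n_hidden)) + b2

    return predict


def mse(predict, xs):
    """Mean squared error against the assumed target f(x) = x^2."""
    return sum((predict(x) - x * x) ** 2 for x in xs) / len(xs)


train = [(i / 10, (i / 10) ** 2) for i in range(11)]   # samples from [0, 1]
predict = train_mlp(train)

inside = [(i + 0.5) / 10 for i in range(10)]           # between training points
outside = [2.0 + i / 10 for i in range(11)]            # far outside [0, 1]
interpolation_mse = mse(predict, inside)
extrapolation_mse = mse(predict, outside)
print(interpolation_mse, extrapolation_mse)
```

Because the hidden units saturate, the output of such a network stays bounded outside the training range, while the target keeps growing. This is one simple way to see why the extrapolation error is so much larger.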

Defining a measure of generalization ability is very difficult. What counts as good generalization always depends on the data set: what is desirable for one application may be useless for another. There are some theoretical approaches to this issue (see [hold95] and the references therein).

In [mraz95] a theoretical analysis of the robustness of BP networks is given. The paper proposes a definition of a separation characteristic which can be used to evaluate and compare BP networks. As a criterion for a robust network, it requires that the network respond with the output value "no decision possible" for all new input patterns which are close to the separating hyperplane.
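
The robustness criterion can be pictured with a single linear unit that refuses to classify patterns lying too close to its separating hyperplane. The following is a minimal sketch of that idea only. The function name `classify_with_reject`, the `margin` threshold, and the use of the Euclidean distance to the hyperplane are assumptions made for this example and are not the separation characteristic defined in [mraz95].

```python
import math


def classify_with_reject(x, weights, bias, margin=0.1):
    """Return class 0 or 1, or "no decision possible" near the hyperplane.

    The hyperplane is given by sum(w_i * x_i) + bias = 0; `margin` is a
    hypothetical threshold on the Euclidean distance to that hyperplane.
    """
    activation = sum(w * v for w, v in zip(weights, x)) + bias
    distance = abs(activation) / math.sqrt(sum(w * w for w in weights))
    if distance < margin:
        return "no decision possible"
    return 1 if activation > 0 else 0


# Hyperplane x1 + x2 - 1 = 0 in the unit square.
weights, bias = [1.0, 1.0], -1.0
print(classify_with_reject([1.0, 1.0], weights, bias))    # 1
print(classify_with_reject([0.0, 0.0], weights, bias))    # 0
print(classify_with_reject([0.5, 0.52], weights, bias))   # no decision possible
```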

In [ston95] J. V. Stone and C. J. Thornton ask the question: "Can Artificial Neural Networks Discover Useful Regularities?"

In this paper they state the hypothesis that artificial neural networks trained with Backpropagation depend on correlations between input and output variables. They show that BP-MLPs have a very poor generalization ability for statistically neutral problems. Statistically neutral in this context means that no knowledge of the expected output value can be drawn from the knowledge of a single input variable. Only the relation between different input variables determines the output.
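
The parity function is a standard example of a statistically neutral problem in this sense. The sketch below enumerates all input patterns of an $n$-bit parity task and computes, for each input variable, the probability of output 1 given that this variable is 0 and given that it is 1. The choice of parity and the function names are assumptions made for illustration; whether this exact task is used in [ston95] is not stated here.

```python
from itertools import product


def parity(bits):
    """Output 1 if an odd number of input bits are set, else 0."""
    return sum(bits) % 2


def conditional_output_rates(n_bits):
    """For each input i, return (P(out=1 | x_i=0), P(out=1 | x_i=1))."""
    patterns = list(product([0, 1], repeat=n_bits))
    rates = []
    for i in range(n_bits):
        per_value = []
        for value in (0, 1):
            outputs = [parity(p) for p in patterns if p[i] == value]
            per_value.append(sum(outputs) / len(outputs))
        rates.append(tuple(per_value))
    return rates


rates = conditional_output_rates(3)
print(rates)  # [(0.5, 0.5), (0.5, 0.5), (0.5, 0.5)]
```

Knowing any single input leaves the output probability at 0.5, so no individual input variable is correlated with the output. Only the combination of all inputs determines the result.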

They suggest using a sparse coding of the problem to supply the BP algorithm with correlated input variables, and show that this approach can improve the generalization ability of BP-trained ANNs for statistically neutral problems.

However, real-world problems are rarely completely statistically neutral. A more appropriate question is therefore: "What kind of regularities can be discovered by an ANN?"

The idea of improving the generalization ability of a network by recoding the problem in a sparse or binary (and therefore higher-dimensional) code seems very interesting.

Wed Oct 4 16:45:34 CEST 2000