
## Learning Problems

Splitting up the training set into subsets can also cause problems. The number of identical input vectors with different desired output values may increase, especially for modules with a small number of input variables. Consider the worst case, a 4-bit parity problem:

Original set (columns 1–5), resulting training set for MLP1 (columns 6–8) and for MLP2 (columns 9–11):

| $x_1$ | $x_2$ | $x_3$ | $x_4$ | $y$ | $x_1$ | $x_2$ | $y$ | $x_3$ | $x_4$ | $y$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 |
| 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 |

Assume two input modules: MLP1 connected to $x_1$ and $x_2$, and MLP2 connected to $x_3$ and $x_4$. The resulting training set for each of the two input modules contains sixteen 2-bit vectors; each of the four distinct vectors appears twice with the desired output `1' and twice with the desired output `0'. It is impossible for the individual modules to distinguish these cases, so after training each module responds with `0.5' for any input pattern. All the information has been lost in the input layer, and no decision is possible.
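This loss of information can be verified by enumeration. The following minimal Python sketch projects the 4-bit parity training set onto the inputs seen by MLP1 and counts, for each 2-bit pattern, how often the target is `0' and how often it is `1':

```python
from itertools import product

# Enumerate the 4-bit parity problem and project onto MLP1's inputs (x1, x2).
# For each 2-bit projection, count how often the target is 0 vs. 1.
counts = {}
for x1, x2, x3, x4 in product([0, 1], repeat=4):
    y = x1 ^ x2 ^ x3 ^ x4          # 4-bit parity
    key = (x1, x2)                 # what MLP1 actually sees
    c = counts.setdefault(key, [0, 0])
    c[y] += 1

for key, (zeros, ones) in sorted(counts.items()):
    print(key, "-> y=0:", zeros, " y=1:", ones)
# Every 2-bit pattern occurs twice with y=0 and twice with y=1,
# so the best (error-minimizing) response of MLP1 is 0.5 everywhere.
```

The same holds symmetrically for MLP2's projection onto $(x_3, x_4)$.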

To discuss this problem more generally, the following definition is necessary. $P(y=a)$ denotes the probability that the output variable $y$ has the value $a$, and $P(y=a \mid x=b)$ denotes the conditional probability of $y=a$ given $x=b$.

Definition: A Set of Statistically Neutral Variables
Consider the function $f(x_1, \dots, x_n, \dots, x_m) = y$. A subset of the input variables $\{x_1, \dots, x_n\}$ with $n < m$ is called statistically neutral if knowledge of the values of these variables does not increase the knowledge of the result $y$.

Formally: the set of input variables $\{x_1, \dots, x_n\}$ is statistically neutral if, for all values $a, b_1, \dots, b_n$:

$P(y=a) = P(y=a \mid x_1 = b_1 \wedge \dots \wedge x_n = b_n)$

If the whole set of input variables and all possible subsets for all modules are statistically neutral, the network cannot learn the task. If only some of the modules are supplied with a statistically neutral set of input variables, the network may still perform satisfactorily.
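The definition can be checked directly by exhaustive enumeration for small Boolean functions. The following sketch (the function name and interface are illustrative, not from the text) tests whether a given subset of input variables is statistically neutral:

```python
from itertools import product

def is_statistically_neutral(f, m, subset):
    """Check whether the input variables indexed by `subset` are
    statistically neutral for the Boolean function f of m inputs,
    i.e. P(y=a) == P(y=a | fixed subset values) for all assignments."""
    patterns = list(product([0, 1], repeat=m))
    outputs = [f(*p) for p in patterns]
    p_global = sum(outputs) / len(outputs)        # P(y = 1)
    for values in product([0, 1], repeat=len(subset)):
        matching = [y for p, y in zip(patterns, outputs)
                    if all(p[i] == v for i, v in zip(subset, values))]
        if sum(matching) / len(matching) != p_global:
            return False
    return True

parity4 = lambda x1, x2, x3, x4: x1 ^ x2 ^ x3 ^ x4
print(is_statistically_neutral(parity4, 4, [0, 1]))     # {x1, x2} is neutral
print(is_statistically_neutral(parity4, 4, [0, 1, 2]))  # even {x1, x2, x3} is

and4 = lambda x1, x2, x3, x4: x1 & x2 & x3 & x4
print(is_statistically_neutral(and4, 4, [0, 1]))        # AND: not neutral
```

For 4-bit parity, every proper subset of the inputs is statistically neutral, which is why any split into input modules destroys the information; for AND, already two inputs bias the output.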

In [ston95] it is shown that monolithic multilayer feedforward networks can learn statistically neutral problems, but they cannot generalize on them.

Such a situation is very unlikely in real-world data, and this expectation was confirmed during experimentation.

Especially in tasks with a large input space, such as picture recognition, this problem can be ignored. In tasks with a small number of input attributes, one way to reduce the possibility of statistically neutral input sets is to recode the data in a sparse code.
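As a small illustration of the effect of recoding (the example is illustrative, not from the text): for 2-bit parity with the dense binary coding, each single input bit is statistically neutral, but after recoding the input value as a one-hot (sparse) vector, observing a single one-hot bit already determines the output:

```python
# 2-bit parity, y = b1 XOR b0, over the values x = 0..3.
xs = [0, 1, 2, 3]
y = {x: (x >> 1) ^ (x & 1) for x in xs}   # parity of the two bits

# Dense binary coding: condition on bit b0 alone.
for v in (0, 1):
    ones = [y[x] for x in xs if (x & 1) == v]
    print("P(y=1 | b0=%d) =" % v, sum(ones) / len(ones))   # 0.5 both times

# Sparse one-hot coding o0..o3 with oi = (x == i): condition on o3 alone.
ones = [y[x] for x in xs if x == 3]
print("P(y=1 | o3=1) =", sum(ones) / len(ones))            # 0.0, not 0.5
```

With the sparse code, no single input bit is statistically neutral, so even small modules receive usable information.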

Albrecht Schmidt
Wed Oct 4 16:45:34 CEST 2000