
## On Generalization

The ability to generalize is the main property of neural networks: it allows a network to handle inputs that were not part of the training set but are similar to inputs seen during the training phase. The proposed architecture combines two methods of generalization.

The first method is built into the MLP: each of the networks can generalize over its own input space. This type of generalization is common to connectionist systems.

The second method of generalization is due to the architecture of the proposed network. It generalizes according to the similarity of input patterns. This method is found in logical neural networks [patt96, p. 172ff.].

To explain this behaviour more concretely, consider the following simplified example of a recognition system.

Figure 5.4:  An Example Architecture.

A 3x3 input retina with the architecture shown in Figure 5.4 is assumed. Each of the nine inputs reads a continuous value between zero and one, according to the recorded gray level (black=1; white=0).

The network needs to be trained to recognize the simplified letters 'H' and 'L', using the training set shown in Figure 5.5(a). The desired output of the input networks is '0' for the letter 'H' and '1' for the letter 'L'.

Figure 5.5: (a) The Training Set. (b) The Test Set.

The training subsets for the networks MLP0, MLP1, and MLP2 are:

| MLP0 | MLP1 | MLP2 |
|---|---|---|
| (1,0,1;0) | (1,1,1;0) | (1,0,1;0) |
| (1,0,0;1) | (1,0,0;1) | (1,1,1;1) |
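The way these subsets arise from the two training characters can be sketched in Python. This is only an illustration: it assumes that each input network reads one row of the 3x3 retina, which is inferred from the subsets listed above rather than stated in the text, and the names `split_rows` and `training_subsets` are hypothetical.

```python
# Training patterns from the text: nine gray levels per character, read row by row.
PATTERNS = {
    "H": (1, 0, 1, 1, 1, 1, 1, 0, 1),
    "L": (1, 0, 0, 1, 0, 0, 1, 1, 1),
}
# Desired output of every input network: 0 for 'H', 1 for 'L'.
TARGETS = {"H": 0, "L": 1}


def split_rows(pattern, width=3):
    """Cut a flat retina vector into consecutive sub-patterns of `width` inputs."""
    return [tuple(pattern[i:i + width]) for i in range(0, len(pattern), width)]


def training_subsets(patterns, targets, n_modules=3):
    """Return one list of (sub_pattern, target) pairs per input network."""
    subsets = [[] for _ in range(n_modules)]
    for label, pattern in patterns.items():
        # Assumption: module k is fed with row k of the retina.
        for module, sub in enumerate(split_rows(pattern)):
            subsets[module].append((sub, targets[label]))
    return subsets


for module, subset in enumerate(training_subsets(PATTERNS, TARGETS)):
    print(f"MLP{module}:", subset)
```

Under this assumption the three printed lists reproduce the columns MLP0, MLP1, and MLP2 given above.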

After the first layer of networks has been trained, it is assumed that the calculated output is equivalent to the desired output. The resulting training set for the decision network is:

$(Φ(1,0,1,1,1,1,1,0,1);1,0) = (0,0,0;1,0)$

$(Φ(1,0,0,1,0,0,1,1,1);0,1) = (1,1,1;0,1)$
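The construction of this training set can be written down as a short Python sketch. It relies on the idealization made in the text, namely that every input network reproduces its desired output exactly; the constant and function names are hypothetical and only serve the illustration.

```python
# Idealized first layer: every input network returns its desired output exactly,
# so the intermediate vector is (0, 0, 0) for 'H' and (1, 1, 1) for 'L'.
FIRST_LAYER_TARGET = {"H": 0, "L": 1}
# Desired output of the decision network, one component per class.
DECISION_TARGET = {"H": (1, 0), "L": (0, 1)}
N_MODULES = 3  # MLP0, MLP1, MLP2


def decision_training_set():
    """Pair the idealized intermediate vector of each character with its class target."""
    return [
        ((FIRST_LAYER_TARGET[label],) * N_MODULES, DECISION_TARGET[label])
        for label in ("H", "L")
    ]


for intermediate, target in decision_training_set():
    print(intermediate, "->", target)
```

In a real system the intermediate vectors would be the actual outputs of the trained input networks, which only approximate 0 and 1.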

After the decision network has been trained, the assumed response of the system to the training set is:

$r_H = Ψ(Φ(1,0,1,1,1,1,1,0,1)) = Ψ(0,0,0) = (1,0)$

$r_L = Ψ(Φ(1,0,0,1,0,0,1,1,1)) = Ψ(1,1,1) = (0,1)$

To show different effects of generalization, three distorted characters, shown in Figure 5.5(b), are used as the test set.

The first character tests generalization within the input modules, the second tests generalization on the number of correct sub-patterns, and the third character is a combination of both. (The values in the input vectors correspond to the gray levels in the pattern; the outputs are taken from a typical neural network.)

$r_1 = Ψ(Φ(0.9,0.2,0.1,0.7,0.2,0.1,0.5,0.5,0.5)) = Ψ(0.95,0.86,0.70) = (0.04,0.96)$ ⇒ 'L'

$r_2 = Ψ(Φ(1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0)) = Ψ(0,0.49,0) = (0.91,0.09)$ ⇒ 'H'

$r_3 = Ψ(Φ(0.9,0.2,0.2,0.9,0.5,0.2,0.9,0.2,0.9)) = Ψ(0.92,0.65,0.09) = (0.15,0.89)$ ⇒ 'L'
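The step from an output vector to a letter is not spelled out in the text. The following Python sketch assumes a simple winner-takes-all rule, with the first output component standing for 'H' and the second for 'L' (as in the decision network targets); the function name `classify` is hypothetical.

```python
# Decision-network outputs quoted in the text for the three test characters.
TEST_OUTPUTS = {
    "r1": (0.04, 0.96),
    "r2": (0.91, 0.09),
    "r3": (0.15, 0.89),
}
CLASSES = ("H", "L")  # component 0 stands for 'H', component 1 for 'L'


def classify(output):
    """Pick the class whose output component is largest (assumed winner-takes-all rule)."""
    winner = max(range(len(output)), key=lambda i: output[i])
    return CLASSES[winner]


for name, output in TEST_OUTPUTS.items():
    print(name, output, "->", classify(output))
```

With this rule the three responses above map to 'L', 'H', and 'L', matching the classifications given in the text.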


Albrecht Schmidt
Wed Oct 4 16:45:34 CEST 2000