A very common approach in MNNs is to construct an architecture that supports dividing the complex task into simpler subtasks.

Basit Hussain and M. R. Kabuka present a new architecture in ``A Novel Feature Recognition Neural Network and its Application to Character Recognition'' [huss94]. They implement the idea of decomposing the recognition task: a complex object (a character) can be detected by recognizing its sub-patterns.

The network is based on a two-level detection scheme. In the first level, sub-patterns are recognized; multiple detectors are used for each segment so that shifted sub-patterns are recognized as well. The second level determines the class from the information provided by the first level.
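[huss94] does not spell out the detectors at this level of detail; a toy Python sketch of such a two-level scheme, with template matching over a few shifts standing in for the sub-pattern detectors (the templates, shifts, and scoring are all assumptions), could look like this:

```python
import numpy as np

def detect_subpatterns(image, templates, shifts):
    """First level (sketch): each sub-pattern template is matched at
    several shifted positions; the detector's response is the best
    match over all shifts, so shifted sub-patterns are still found."""
    responses = []
    for t in templates:
        best = max(np.sum(np.roll(image, s, axis=(0, 1)) * t)
                   for s in shifts)
        responses.append(best)
    return np.array(responses)

def classify(responses, class_weights):
    """Second level (sketch): choose the class whose weight vector
    best matches the vector of sub-pattern responses."""
    scores = class_weights @ responses
    return int(np.argmax(scores))
```

The point of the two levels is visible even in this toy: shift invariance is handled entirely in the first level, so the second level only has to combine position-free evidence.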

The training algorithm, which calculates the weights directly from the input patterns, is very simple; training is therefore very fast.

The tests showed that the network is capable of recognizing distorted, noisy, scaled, and shifted patterns. The complexity of the proposed architecture is much lower than that of a comparable Neocognitron.

A Divide-and-Conquer Methodology is proposed by Chiang and Fu [chia94]. It works on an architecture consisting of two general components: a Divide-and-Conquer Engine, built out of subnetworks each capable of managing a subset of the data, and an Integration Engine, which determines which of the subnetwork outputs will be used as the final output of the system.

The training set is divided according to the training error value. Each subnetwork learns its partition of the training set. This process is repeated until the modules in the Divide-and-Conquer Engine can learn their subsets with a sufficiently small overall error. The Integration Engine is trained to determine which partition an input vector belongs to.
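The article is cited here only at this level of detail; a minimal Python sketch of such an error-based split, with a hypothetical `make_learner` factory standing in for the subnetworks and a termination fallback that is an assumption of this sketch, could look like this:

```python
import numpy as np

def partition_and_train(X, y, make_learner, err_tol=0.1, max_modules=10):
    """Divide-and-conquer loop (sketch): train a learner on the data
    that is still unassigned, keep the samples it fits within err_tol,
    and hand the badly-fit rest to the next subnetwork."""
    modules = []
    assignment = np.full(len(X), -1)       # which module owns each sample
    remaining = np.arange(len(X))
    while len(remaining) and len(modules) < max_modules:
        net = make_learner()
        net.fit(X[remaining], y[remaining])
        errs = np.abs(net.predict(X[remaining]) - y[remaining])
        ok = errs <= err_tol
        if not ok.any():
            ok[:] = True                   # assumption: assign the rest
                                           # to guarantee termination
        assignment[remaining[ok]] = len(modules)
        modules.append(net)
        remaining = remaining[~ok]
    return modules, assignment
```

The returned `assignment` vector is exactly the target an Integration Engine would then be trained on: a classification of input vectors into partitions.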

The article ``A Parallel and Modular Multi-Sieving Neural Network Architecture for Constructive Learning'' by Lu et al. introduces an approach to modular neural networks which segments the training data automatically and then employs different modules to handle different subsets [lu95].

**Figure 4.6:** The Basic Idea of a Sieving Algorithm.

The basic idea of this network is based on a multi-sieving learning
algorithm, depicted in Figure 4.6.
The patterns are classified by this algorithm on different
levels. In the first level - *a very rough sieve* - some patterns may
be recognized correctly while others are not. The correctly classified
samples are taken out of the training set. The next level - *a less
rough sieve* - is trained only on the remaining ones. After the training
of this level the correctly recognized patterns are removed from the training
set; the remaining patterns form the training set for the next level. This
process is repeated until all patterns are classified correctly.
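The sieving loop itself can be sketched in a few lines of Python; `make_classifier` is a hypothetical factory producing one sieving module, and everything beyond the remove-and-retrain loop is an assumption of this sketch, not a detail from [lu95]:

```python
import numpy as np

def multi_sieve(X, y, make_classifier, max_sieves=10):
    """Multi-sieving loop (sketch): each sieve is trained on the
    patterns the previous sieves misclassified, and the correctly
    classified patterns are removed, until every pattern is covered."""
    sieves = []
    remaining = np.arange(len(X))
    while len(remaining) and len(sieves) < max_sieves:
        clf = make_classifier()
        clf.fit(X[remaining], y[remaining])
        correct = clf.predict(X[remaining]) == y[remaining]
        if not correct.any():
            break                          # this sieve learned nothing new
        sieves.append(clf)
        remaining = remaining[~correct]    # finer sieves get the rest
    return sieves
```

Each entry of `sieves` corresponds to one level in Figure 4.6: a module responsible for the subset of patterns that fell through the coarser sieves before it.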

Each level of learning - *each sieve* - generates a neural network with
the ability to recognize a subset of the original training set. These
networks, called *sieving modules*, face a simpler recognition
task than the whole problem.

The modules are considered in a hierarchical order. The output of the first module classifying the pattern is taken as the response of the system; only if the current module cannot classify the pattern is the network on the next lower level asked. To determine whether the output for an unknown input is accepted as a classification, a 1-out-of-N coding is used.
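A minimal sketch of this hierarchical query, with assumed acceptance thresholds for the 1-out-of-N check (the article does not specify concrete values), might look like this:

```python
import numpy as np

def is_valid_1_of_n(output, hi=0.8, lo=0.2):
    """1-out-of-N acceptance (thresholds hi/lo are assumptions): the
    response counts as a classification only if exactly one output
    unit is clearly on and all the others are clearly off."""
    on = output >= hi
    off = output <= lo
    return on.sum() == 1 and off.sum() == len(output) - 1

def hierarchical_classify(modules, x, hi=0.8, lo=0.2):
    """Ask the sieving modules in hierarchical order; the first
    module producing a valid 1-out-of-N response determines the class."""
    for m in modules:
        out = m(x)
        if is_valid_1_of_n(out, hi, lo):
            return int(np.argmax(out))
    return None                            # rejected by every module
```

Note that the sequential loop is only for clarity: as the article points out, the modules can all be evaluated in parallel, with the hierarchy applied afterwards to pick the highest-priority valid response.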

All modules work in parallel; therefore the speed of the system is nearly independent of the number of modules involved in the recognition process.

The authors state that the retraining of the system with a slightly modified training set is very easy.

One of the main advantages of this system is that the decomposition of the problem and the generation of an appropriate network are performed by the algorithm rather than by the user.

Furthermore, the idea of sieving seems very intuitive to human understanding and thinking. The concept of a self-growing, massively parallel system is very promising.

The problem of deciding whether the classification of an unknown input pattern on a certain level is correct is not easy. The 1-out-of-N coding is a way to decrease the probability that outputs are accepted as valid by chance; it remains to be shown whether this is sufficient for real-world data sets.

The idea of reducing the input dimension of single modules within a network is introduced in [kim94]. For an $N$-dimensional input vector, the system consists of $N$ competitive modules. Each module is connected to only ($N-1$) of the inputs, and each input is connected to ($N-1$) modules. The competitive learning in each module is thus based on slightly different ($N-1$)-dimensional input vectors. In a second stage, a recognition layer decides based on the outputs of all submodules.

It is shown that this parallel architecture is superior to a single $N$-dimensional competitive network.
