For the analysis the following assumptions are made:
Training the new network architecture is faster than training a monolithic modular network on the same problem for three reasons:
Consider a modular network with ten input modules, each with inputs, hidden layer neurons, and outputs. The decision module has inputs, hidden layer neurons, and outputs. This is denoted by: , where .
A monolithic network with the same number of inputs and outputs, and with two hidden layers, each with neurons can be denoted by:
For the number of neurons to be equal in the two networks:
The number of weights in each network is:
If the input is sufficiently large so that:
then
and hence
since monotonously increasing.
Assuming the example from above ( and ) the speed-up is significant. The number of inputs per module is assumed to be larger than eight ().
The number of weights to consider for training in each network is:
The ratio between the numbers of weights to train ():
The number of weights to consider for the time need to train the network is at least 27 times less than in a monolithic MLP.
| Original Set | Set MLP | Set MLP | |||
| 0 0 0 0 0 1 | 0 | 0 0 0 | 0 | 0 0 1 | 0 |
| 0 0 0 0 1 0 | 0 | 0 0 0 | 0 | 0 1 0 | 0 |
| 0 0 0 1 0 0 | 0 | 0 0 0 | 0 | 1 0 0 | 0 |
| 1 1 0 1 0 1 | 1 | 1 1 0 | 1 | 1 0 1 | 1 |
| 1 0 1 1 0 1 | 1 | 1 0 1 | 1 | 1 0 1 | 1 |
| 0 1 1 1 0 1 | 1 | 0 1 1 | 1 | 1 0 1 | 1 |
Class `0' is determined by , , and ; which will be learned very quickly by MLP, which sees this tuple `0 0 0 : 0' three times during one training cycle.
Similarly class `1' is determined by the , , and , which will be quickly learned by MLP.
It is unlikely the that a real world data set has the same structure as the example. However, it can be conceivably produce significant improvements, particularly with large input dimensions.