Many other ideas for improving the training of MLPs have been proposed. The focus here is on approaches that use a layer-by-layer learning method.
The algorithm proposed in [leng96] optimizes an objective function for the internal representation of each layer separately; the weight updates in the hidden layers are therefore independent of the final output of the network.
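The general idea can be illustrated with a minimal sketch. The local objective used here (pulling each sample's representation toward its class mean) is an assumption chosen for illustration only; [leng96] defines its own objective for the internal representation, which is not reproduced here. The function name `train_layer` and all hyperparameters are likewise hypothetical.

```python
import numpy as np

def train_layer(X, y, n_hidden, lr=0.1, epochs=100, seed=0):
    """Sketch of one layer-wise training step: the layer's weights are
    adjusted to optimize a LOCAL objective on its internal representation,
    independent of the network's final output.

    Assumed local objective (illustrative only): distance of each sample's
    representation to its class mean. Classes are assumed to be 0..K-1."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    for _ in range(epochs):
        H = np.tanh(X @ W)                      # internal representation
        # per-sample target: the mean representation of the sample's class
        targets = np.array([H[y == c].mean(axis=0) for c in np.unique(y)])[y]
        grad_H = H - targets                    # d(objective)/dH
        grad_W = X.T @ (grad_H * (1 - H ** 2))  # chain rule through tanh
        W -= lr * grad_W / len(X)
    return W, np.tanh(X @ W)
```

Once a layer is trained this way, its output representation can serve as the input for training the next layer with the same local procedure.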
Another method, based on information theory, is described in [bich89]. The performance of a hidden unit is measured by its ability to transmit class information. To apply information-theoretic concepts, the neural network is viewed as a multistage encoder, and the training process searches for a set of weights that minimizes the conditional class entropy in each layer. The minimum is found using simulated annealing.
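A rough sketch of this idea, under stated assumptions: binary threshold units without bias, a linear cooling schedule, and Gaussian perturbations of the weights. These choices are illustrative; [bich89]'s encoding and annealing schedule may differ.

```python
import numpy as np

def conditional_class_entropy(codes, y):
    """H(class | code) for discrete hidden codes, in bits: low values mean
    the layer's binary representation transmits most of the class information."""
    H = 0.0
    for code in set(map(tuple, codes)):
        mask = (codes == code).all(axis=1)
        p_code = mask.mean()
        _, counts = np.unique(y[mask], return_counts=True)
        p = counts / counts.sum()
        H += p_code * -(p * np.log2(p)).sum()
    return H

def anneal_layer(X, y, n_hidden=3, steps=2000, T0=1.0, seed=0):
    """Simulated annealing over one layer's weights to minimize the
    conditional class entropy of the layer's binary output code (sketch)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    code = lambda W: (X @ W > 0).astype(int)    # threshold units, no bias
    E = conditional_class_entropy(code(W), y)
    for t in range(steps):
        T = T0 * (1 - t / steps) + 1e-9         # linear cooling schedule
        W_new = W + rng.normal(scale=0.2, size=W.shape)
        E_new = conditional_class_entropy(code(W_new), y)
        # Metropolis criterion: always accept improvements, sometimes accept
        # worse states early on to escape local minima
        if E_new < E or rng.random() < np.exp(-(E_new - E) / T):
            W, E = W_new, E_new
    return W, E
```

On well-separated data the accepted weights typically drive the conditional class entropy close to zero, i.e. the layer's code determines the class almost completely.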
An accelerated learning algorithm for MLPs is given in [ergz95]. Only the neurons in the hidden layers use a sigmoidal transfer function; the neurons in the output layer use a linear function. The algorithm works iteratively, and due to the linearization of the activation function the task of finding the weights of a hidden layer is reduced to a linear problem. The weights within each layer are updated dependent on each other and on a layer-specific cost function, but independently of the other layers.
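The benefit of a linear output layer can be made concrete: given the hidden-layer outputs, the output weights solve an ordinary linear least-squares problem in closed form. The sketch below shows only this final step; the small ridge term is an added assumption for numerical stability, and [ergz95]'s full iteration over the hidden layers is not reproduced here.

```python
import numpy as np

def fit_output_layer(H, Y, ridge=1e-6):
    """Closed-form least-squares fit of the linear output layer.

    H: hidden-layer outputs, shape (n_samples, n_hidden)
    Y: desired network outputs, shape (n_samples, n_outputs)
    Returns the output weight matrix, shape (n_hidden, n_outputs).
    The ridge term (an assumption, not from [ergz95]) regularizes
    the normal equations when H.T @ H is ill-conditioned."""
    return np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ Y)
```

For example, with `H = np.tanh(X @ W_hidden)` computed from fixed hidden weights, `fit_output_layer(H, Y)` recovers the optimal output weights in one step, with no gradient iterations for that layer.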
All of these methods have shown superior performance on a number of applications, but most of the improved algorithms are only useful for problems with certain properties.