A vital property of neural networks is that they can learn the desired response from a set of examples in the domain. This contrasts with most other approaches in computing, where an algorithm or a set of rules is used to store the knowledge.
The advantage of learning from examples is that there is no need to explicitly formulate a rule system for the task. Extracting rules from the knowledge in the domain requires some expert interpretation, a process that is often difficult, especially if the experts have different opinions on the problem. From an abstract point of view, training a NN can be seen as an automatic process of extracting rules from a data set.
There are two basic paradigms of learning, supervised and unsupervised, both of which have their counterparts in biology.
Supervised learning at its most general is a process where both information about the environment (e.g. the sensory stimuli) and the desired reaction of the system (e.g. the motor response) are given. It is analogous to human learning with a teacher who knows all the answers.
In an ANN context, supervised learning is a process of memorizing vector pairs: each input vector is presented together with the desired output vector. This method is often described as memorizing a mapping from the input space to the output space.
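To make this concrete, the following sketch trains a single-layer network on (input, target) pairs with the delta rule; the network, data, and learning rate are illustrative assumptions, not taken from the text.

    import numpy as np

    # Hypothetical training pairs: each input vector is mapped to a target vector.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # input space
    T = np.array([[0.], [1.], [1.], [1.]])                   # desired outputs (logical OR)

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(2, 1))                   # weights
    b = np.zeros(1)                                          # bias
    eta = 0.5                                                # learning rate (assumed)

    for epoch in range(100):
        for x, t in zip(X, T):
            y = 1.0 / (1.0 + np.exp(-(x @ W + b)))           # calculated response
            err = t - y                                      # comparison with the target
            W += eta * np.outer(x, err * y * (1 - y))        # delta rule update
            b += eta * err * y * (1 - y)

After training, the weight matrix encodes the mapping: presenting any of the stored input vectors reproduces the corresponding output vector.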
A special case is the autoassociative mapping, where the input pattern and the output pattern are equal (often written as a single vector). Autoassociative memories are often used to retrieve the original pattern from a distorted input.
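A minimal sketch of such retrieval, assuming a Hopfield-style autoassociative memory over bipolar (+1/-1) vectors; the stored patterns, outer-product storage rule, and update scheme are illustrative assumptions:

    import numpy as np

    # Store patterns by the outer-product (Hebbian) rule; recall repairs a distorted input.
    patterns = np.array([[1, -1, 1, -1, 1, -1],
                         [1, 1, 1, -1, -1, -1]])            # assumed bipolar patterns
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0)                                  # no self-connections

    distorted = np.array([1, -1, 1, -1, 1, 1])              # first pattern, last element flipped
    state = distorted.copy()
    for _ in range(5):                                      # synchronous updates until stable
        state = np.sign(W @ state)
    print(state)                                            # recovers the first stored pattern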
Many algorithms for supervised learning work by comparing the response calculated by the network during training with the target values. There are also learning techniques where the input-output pairs are used directly to calculate the weights of the network (e.g. the bidirectional associative memory [simp90, p. 61]).
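For the direct-calculation case, the sketch below illustrates the bidirectional associative memory weight rule: the weight matrix is the sum of the outer products of the training pairs, with no iterative error-driven phase at all. The concrete bipolar pairs are assumptions for illustration.

    import numpy as np

    # BAM: the weights are computed directly from the input-output pairs.
    pairs = [(np.array([1, -1, 1, -1]), np.array([1, -1, -1])),
             (np.array([-1, -1, 1, 1]), np.array([-1, 1, -1]))]   # assumed bipolar pairs

    W = sum(np.outer(x, y) for x, y in pairs)               # W = sum of x y^T

    x = pairs[0][0]
    y = np.sign(x @ W)                                      # forward recall of the output
    x_back = np.sign(W @ y)                                 # backward recall of the input
    print(y, x_back)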
A variant of supervised learning is called reinforcement learning. In this method the required output is not provided; the teacher's response is only whether the calculated result is `right' or `wrong' [patt96, p. 28].
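The sketch below illustrates learning from such a binary judgement alone, using simple random perturbation of the weights; this toy scheme is an assumption for illustration, not a specific published algorithm. The learner never sees the target values, only how often the teacher judges its responses right.

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    T = np.array([0, 1, 1, 1])                              # known only to the teacher

    def fraction_right(W, b):
        out = (X @ W + b > 0).astype(int)                   # network's binary response
        return np.mean(out == T)                            # teacher's right/wrong verdicts

    W, b = rng.normal(size=2), 0.0
    score = fraction_right(W, b)
    for _ in range(200):
        dW, db = rng.normal(size=2) * 0.3, rng.normal() * 0.3
        new = fraction_right(W + dW, b + db)
        if new >= score:                                    # keep a perturbation only if the
            W, b, score = W + dW, b + db, new               # judged performance improves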
Figure 1.2: Learning and the Problem of Overfitting.
In supervised learning it is often difficult to determine when the learning process should be terminated. A network with a small error (the overall difference between the calculated and the desired output) does not necessarily perform well on new data from the same domain. This problem is called overfitting. If the training process goes on for too long, the network becomes biased towards the training set and its generalization ability decreases. If the process is stopped too early, the decision is very rough. Figure 1.2 illustrates this for the separation of two sets.
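A common remedy is early stopping: hold out part of the data as a validation set and stop training when the validation error no longer improves. The sketch below assumes synthetic two-class data and a single sigmoid unit; it is a generic illustration, not a procedure from the text.

    import numpy as np

    rng = np.random.default_rng(2)
    # Assumed synthetic data: two noisy classes in the plane.
    X = rng.normal(size=(200, 2)) + np.repeat([[0, 0], [2, 2]], 100, axis=0)
    T = np.repeat([0., 1.], 100)
    X_train, T_train = X[::2], T[::2]                       # half for training
    X_val, T_val = X[1::2], T[1::2]                         # half for validation

    W, b, eta = rng.normal(size=2) * 0.1, 0.0, 0.1
    best_err, best_Wb, patience = np.inf, None, 0
    for epoch in range(500):
        y = 1 / (1 + np.exp(-(X_train @ W + b)))
        W += eta * X_train.T @ ((T_train - y) * y * (1 - y))
        b += eta * np.sum((T_train - y) * y * (1 - y))
        y_val = 1 / (1 + np.exp(-(X_val @ W + b)))
        val_err = np.mean((T_val - y_val) ** 2)             # error on unseen data
        if val_err < best_err:
            best_err, best_Wb, patience = val_err, (W.copy(), b), 0
        else:
            patience += 1
            if patience >= 20:                              # validation error stopped improving
                break
    W, b = best_Wb                                          # keep the best network found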
Unsupervised learning works only on the input vectors; the desired output is not specified. This learning method can be compared to the processes of categorization, the discovery of regularities, or adaptation to specific features.
The following example explains the basic concept: given an original set of symbols, the task is to split the set into two groups. One solution is a categorization according to the opening of the symbols. There are certainly other possible groupings, but this one is the most obvious. Considering another set, the difficulties of unsupervised learning appear: if there are several different ways to split the set, the process is no longer straightforward, because two or more categorizations can make equal sense.
In many unsupervised models the categorization occurs according to the distance between the input vectors; an example of such a measure is the Hamming distance for binary vectors. Generalization based on this approach groups the input vectors so that, for every category, the distance between the members of the category is minimized.
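A minimal sketch of such distance-based grouping for binary vectors follows; the data, the choice of two categories, and the prototype update are assumptions. Each vector is assigned to the closer of two prototypes under the Hamming distance, and each prototype is then moved to the component-wise majority of its members, reducing the within-category distances.

    import numpy as np

    def hamming(a, b):
        return np.sum(a != b)                               # number of differing components

    data = np.array([[0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 1],
                     [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0]])  # assumed binary vectors

    protos = data[[0, 3]].copy()                            # initial category prototypes
    for _ in range(10):
        # Assign each vector to the nearest prototype (Hamming distance).
        labels = np.array([np.argmin([hamming(x, p) for p in protos]) for x in data])
        # Move each prototype to the component-wise majority of its members.
        for k in range(2):
            members = data[labels == k]
            if len(members):
                protos[k] = (members.mean(axis=0) >= 0.5).astype(int)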
When using an unsupervised model it is very important to analyze whether the clustering performed by the network is an appropriate way of grouping the data for the given problem.