next up previous contents
Next: Approaches to Neural computing Up: Introduction Previous: Memorization and Generalization

Acquiring Knowledge by Learning

A vital property of neural networks is that they can learn the desired response from a set of examples in the domain. This contrasts with most other approaches in computing where an algorithm or rules are used to store the knowledge.

The advantage of learning from examples is that there is no need to explicitly form a rule system for the task. To extract rules from the knowledge in the domain implies that there is some expert interpretation. This process is often difficult, especially if the experts have different opinions on the problem. From an abstract point of view training a NN can be seen as an automatic process of extracting rules from a data set.

There are two basic paradigms of learning, supervised and unsupervised, both of which have their models in biology.

Supervised learning at its most general is a process where both information about the environment (e.g. the sensory stimuli) and the desired reaction of the system (e.g. the motor response) is given. It is analogous to human learning with a teacher who knows all the answers.

In an ANN context supervised learning is a process of memorizing vector pairs. The input vector together with the desired output vector is known. This method is often referred to as memorizing a mapping from the input space to the output space.

A special case is autoassociative mapping where the input pattern and the output pattern are equal (often written as a single vector). Autoassociative memories are often used to retrieve the original pattern for a distorted input.

Many algorithms for supervised learning work on a comparison between the calculated response of the network during training and the target values. There are also learning techniques where the input-output pair is directly used to calculate the weights in the network (e.g. the bidirectional associative memory [simp90, p61,]).

A variant of supervised learning is called reinforcement learning. In this method the required output is not provided; the response by the teacher is only whether the calculated result is `right' or `wrong' [patt96, p28,].

 figure140
Figure 1.2:  Learning and the Problem of Overfitting.

In supervised learning it is often difficult to determine when the learning process should be terminated. A network with a small error (the overall difference between the calculated and desired output) does not necessarily show a good performance on new data from the same domain. The problem is called overfitting. If the training process went on too long the network is biased towards the training set and the generalization ability of the network is decreasedgif. If the process is stopped too early the decision is very rough. In Figure 1.2 this is illustrated for the separation of two sets.

Unsupervised learning works only on the input vectors, and the desired output is not specified. This learning method can be compared to the process of categorization, discovering regularities, or adaptation to specific features.

The following sets explain the basic concept; S is the original set, the task is to split the set into two groups (here S1 and S2):

S={ , ∨, , , ∧, , }

S1 = { , ∧, }

S2 = { ∨, , }

In this case the solution found, is a categorization according to the opening of the symbol. There are certainly other possible groupings, but the chosen one is very obvious.

Considering another set, difficulties with the unsupervised learning appear. If there are different ways to split the set, the process is not straight forward anymore. Assuming S={ a,A,b,B}, the following categories make equal sense: S1 = {a,b} and S2= {A,B} or S1 = {a,A} and S2= {b,B}.

In many unsupervised models the categorization occurs according to the distancegif between the input vectors. An example for a measure used is the Hamming distance for binary vectors. Generalization based on this approach groups input vectors in a way to minimize the distance between the members of a category for all categories.

If using an unsupervised model it is very important to analyze whether the clustering done by the network is the right way of grouping the data for the given problem.


next up previous contents
Next: Approaches to Neural computing Up: Introduction Previous: Memorization and Generalization

Albrecht Schmidt
Mit Okt 4 16:45:34 CEST 2000