
Prediction on Diabetes Data

This data set is the `Pima Indians Diabetes Database' [sigi96]. The original set has 768 instances, each with eight input attributes normalized to the interval [-1,1]. Each instance is assigned to one of two classes: 500 records belong to class `1' and 268 to class `2'.

Testing with a single MLP on the eight continuous input variables gave a generalization performance of about 76%, similar to the results reported in the information supplied with the data set.

To allow a comparison between all three types of networks, the eight continuous-valued attributes were transformed into 80 binary input attributes.
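The text does not say which binary coding was used. The following is a minimal sketch of one possibility, assuming each attribute is given 10 bits (8 x 10 = 80) and a thermometer code; the function names, the bit width per attribute, and the coding scheme are all assumptions made for illustration, not the method used in the experiments.

```python
def thermometer_encode(value, bits=10, low=-1.0, high=1.0):
    """Encode one value from [low, high] as `bits` binary digits.

    Assumption: a thermometer code, where larger values switch on
    more leading bits. The original coding scheme is not stated.
    """
    # Clamp to the expected range, then map to a level in 0..bits.
    clamped = min(max(value, low), high)
    level = round((clamped - low) / (high - low) * bits)
    return [1 if i < level else 0 for i in range(bits)]


def encode_instance(attributes, bits=10):
    """Concatenate the codes of all attributes into one binary vector."""
    encoded = []
    for value in attributes:
        encoded.extend(thermometer_encode(value, bits))
    return encoded


# Hypothetical instance with eight attributes in [-1, 1].
example = [-1.0, -0.5, 0.0, 0.2, 0.5, 0.9, 1.0, -0.1]
binary_inputs = encode_instance(example)
print(len(binary_inputs))  # 8 attributes x 10 bits = 80
```

A thermometer code keeps neighbouring values close in Hamming distance, which tends to suit RAM-based networks; a one-of-n binning would give the same 80 inputs with different properties.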

The logical network was not at all useful for this data set. For every configuration tested (RAM sizes: 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576), the generalization performance was about 50%. For a two-class problem, this is exactly the accuracy expected when input vectors are classified at random. This can be explained by the large number of data tuples in the training set: the RAM network was overfilled.
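To put the 50% figure in context, the short calculation below derives two reference accuracies from the class counts given above. It assumes the test data follows the same 500/268 class split as the full set, which the text does not state; the variable names are illustrative.

```python
# Class counts of the full data set, as given in the text.
class_counts = {"1": 500, "2": 268}
total = sum(class_counts.values())

# Uniform random guessing: each class is picked with probability 1/2,
# so the expected accuracy is 1/2 whatever the class distribution.
priors = {c: n / total for c, n in class_counts.items()}
random_guess_accuracy = sum(p * 0.5 for p in priors.values())

# Always predicting the most frequent class.
majority_accuracy = max(class_counts.values()) / total

print(f"random guessing: {random_guess_accuracy:.3f}")
print(f"majority class:  {majority_accuracy:.3f}")
```

Under this assumption, a classifier that always answers class `1' would already score about 65%, so a result near 50% means the network extracted no usable information from the inputs.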

The results achieved with a single multilayer feedforward network were similar to those achieved with the proposed architecture. The best modular neural network, M=(80,2,4,small,π,(20,2,[2]),(4,2,[2])), achieved a generalization performance of 79.7%; the best MLP found by manually selecting the parameters had PG=78.3%; the optimal solution found with the program opti achieved 80.1%.
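The configuration tuple can be read as 80 inputs distributed over input modules of 20 inputs each, which implies four modules. That reading is an assumption here, since the notation is defined elsewhere. Under it, the sketch below shows how a permutation of the input indices could assign inputs to modules; `partition_inputs` and its `seed` parameter are hypothetical names, and the shuffle merely stands in for the permutation π.

```python
import random


def partition_inputs(n_inputs, inputs_per_module, seed=0):
    """Split input indices into equally sized groups after a permutation."""
    if n_inputs % inputs_per_module != 0:
        raise ValueError("n_inputs must be divisible by inputs_per_module")
    indices = list(range(n_inputs))
    # Assumption: a seeded random shuffle stands in for the permutation.
    random.Random(seed).shuffle(indices)
    return [indices[i:i + inputs_per_module]
            for i in range(0, n_inputs, inputs_per_module)]


modules = partition_inputs(80, 20)
print(len(modules), [len(m) for m in modules])
```

Each group would then feed one small MLP, and the module outputs would be combined by the decision network; how that is wired in detail is not covered by this sketch.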

Figure 7.4: Generalization and Training Time for the Diabetes Data.

Figure 7.4 shows the generalization performance as a function of the number of inputs to the modules, together with the time needed to train the networks.



Albrecht Schmidt
Wed Oct 4 16:45:34 CEST 2000