Towards the Evolution of Multi-Layered Neural Networks: A Dynamic Structured Grammatical Evolution Approach

Current grammar-based NeuroEvolution approaches have several shortcomings. On the one hand, they do not allow the generation of Artificial Neural Networks (ANNs) composed of more than one hidden-layer. On the other hand, there is no way to evolve networks with more than one output neuron. To properly evolve ANNs with more than one hidden-layer and multiple output nodes, one needs to know the number of neurons available in previous layers. In this paper we introduce Dynamic Structured Grammatical Evolution (DSGE): a new genotypic representation that overcomes the aforementioned limitations. By enabling the creation of dynamic rules that specify the connection possibilities of each neuron, the methodology enables the evolution of multi-layered ANNs with more than one output neuron. Results on different classification problems show that DSGE evolves effective single and multi-layered ANNs with a varying number of output neurons.

 

Dynamic Structured Grammatical Evolution (DSGE)

 

With the proposed methodology the gain is twofold: (i) the entire genotype is used, i.e., while in Grammatical Evolution (GE) and Structured Grammatical Evolution (SGE) the genotype encodes the largest allowed sequence, in DSGE the genotype grows as needed; and (ii) there is no need to pre-process the grammar to compute the maximum tree-size of each non-terminal symbol, nor to create intermediate grammar derivation rules. In the next sections we describe in detail the components that make these gains possible.

 

Each candidate solution encodes an ordered sequence of the derivation steps of the used grammar that are needed to generate a specific solution for the problem at hand. The representation is similar to the one used in SGE, with one main difference: instead of computing and generating the maximum number of derivations for each of the grammar's non-terminal symbols, a variable-length representation is used, where only the needed derivations are encoded. Consequently, there is no need to create intermediate symbols to deal with recursive rules. To limit the genotype size, a maximum depth value is defined for each non-terminal symbol. Allowing different limits for each non-terminal symbol provides an intuitive and flexible way of constraining the search space by limiting, for instance, the maximum number of hidden-layers, neurons, or connections.
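As a minimal sketch of this representation (our illustration, not the authors' implementation; the toy grammar and the depth limits are assumptions), the genotype can be pictured as one variable-length list of derivation choices per non-terminal symbol, sampled on demand during the genotype-to-phenotype mapping:

```python
import random

# Toy grammar used only to illustrate the representation: production rules
# per non-terminal symbol (this is not the ANN grammar of Figure 1).
GRAMMAR = {
    "<expr>": [["<expr>", "+", "<term>"], ["<term>"]],
    "<term>": [["x1"], ["x2"], ["x3"]],
}

# Hypothetical per-non-terminal depth limits that bound recursion and,
# therefore, the genotype size and the search space.
MAX_DEPTH = {"<expr>": 4, "<term>": 1}

def expand(symbol, genotype, read_idx, depth=0):
    """Expand `symbol` by consuming the next stored choice for it; when no
    choice exists yet, one is sampled on demand, so the genotype grows only
    as needed instead of encoding the largest allowed sequence."""
    if symbol not in GRAMMAR:                  # terminal symbol
        return symbol
    choices = genotype.setdefault(symbol, [])
    i = read_idx.setdefault(symbol, 0)
    if i == len(choices):                      # grow the genotype as needed
        options = GRAMMAR[symbol]
        if depth >= MAX_DEPTH[symbol]:         # past the limit, forbid recursion
            options = [p for p in options if symbol not in p]
        choices.append(GRAMMAR[symbol].index(random.choice(options)))
    read_idx[symbol] = i + 1
    production = GRAMMAR[symbol][choices[i]]
    return " ".join(expand(s, genotype, read_idx,
                           depth + 1 if s == symbol else 0)
                    for s in production)

genotype = {}
print(expand("<expr>", genotype, {}))  # e.g. x1 + x3
print(genotype)                        # e.g. {'<expr>': [0, 1], '<term>': [2, 0]}
```

Note how the per-symbol depth counter removes the need for intermediate symbols: once the limit is reached, only non-recursive productions remain available.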

 

Evolution of Multi-Layered ANNs

 

Three major drawbacks can be pointed out in previous grammatical formulations used to encode ANNs: (i) they only allow the generation of networks with one hidden-layer; (ii) the output neuron is always connected to all neurons in the hidden-layer; and (iii) there is no way to define multiple output nodes, at least none that reuse the hidden-nodes instead of creating new ones.

 

Figure 1: Grammar used for evolving multi-layered ANNs. n represents the number of features of the problem; – and — are used as neuron and layer separators, respectively.

 

These drawbacks stem from limitations of the evolutionary engines and are overcome by the approach presented in this paper. Figure 1 shows a grammar capable of representing multi-layered ANNs with a variable number of neurons in each layer. However, on its own this grammar offers no way of knowing how many neurons exist in the previous layer, which is required for the established connections to be valid.

 

To know which neurons are available in each hidden-layer, we create new production rules at run-time. More precisely, considering the grammar of Figure 1, for each i-th <layer> non-terminal symbol we create a <features-i> production rule encoding the features that layer can use as input.

 

For the first hidden-layer, the created rule has the available features as expansion possibilities, i.e., the ones that are initially defined in the grammar (x1, . . . , xn, where n represents the number of features). Then, for the next hidden-layers we let the connections be established to the neurons in previous layers, except input neurons. For example, using the grammar of Figure 1, when generating an ANN with two hidden-layers, with 4 and 3 nodes respectively, the following production rules are created and added to the grammar:

<features-1> ::= x1 | … | xn
<features-2> ::= h1,1 | h1,2 | h1,3 | h1,4
<features-3> ::= h2,1 | h2,2 | h2,3

where x represents the input features, n the total number of input features, and hi,j the output of the j-th neuron of the i-th layer.
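A minimal sketch of how such rules could be created at run-time follows (the function name and rule encoding are our illustration; the rule format mirrors the example above, connecting each layer to the immediately preceding one):

```python
def make_feature_rules(num_inputs, layer_sizes):
    """Build the <features-i> production rules for a network being generated.
    num_inputs  -- n, the number of input features x1..xn
    layer_sizes -- number of neurons in each hidden-layer, e.g. [4, 3]"""
    # the first rule exposes the input features defined in the grammar
    rules = {"<features-1>": [f"x{k}" for k in range(1, num_inputs + 1)]}
    for i, size in enumerate(layer_sizes, start=1):
        # the neurons of the i-th hidden-layer feed the (i+1)-th rule
        rules[f"<features-{i + 1}>"] = [f"h{i},{j}" for j in range(1, size + 1)]
    return rules

# Two hidden-layers with 4 and 3 neurons (and, say, n = 3 input features)
# reproduce the rules shown above:
for lhs, rhs in make_feature_rules(3, [4, 3]).items():
    print(lhs, "::=", " | ".join(rhs))
# <features-1> ::= x1 | x2 | x3
# <features-2> ::= h1,1 | h1,2 | h1,3 | h1,4
# <features-3> ::= h2,1 | h2,2 | h2,3
```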

 

Figure 2: DSGE evolution of multi-layered ANNs. The + and ∼ symbols represent the results of statistical tests, where ∼ means no statistically significant difference and +, ++, +++ represent small, medium, and large effect sizes, respectively.

 

Results

 

The experimental results are presented in the table of Figure 2. For each metric, the first two rows present the averages of the tests conducted with grammars that encode networks with just one hidden-layer (1 H-L) or that allow the generation of networks with multiple hidden-layers (≥ 1 H-Ls). Additionally, we analyse the ANNs that have more than one hidden-layer (last row, > 1 H-Ls), discarding the evolutionary runs that resulted in networks with just one hidden-layer.

 

The experimental results show that by using DSGE it is possible to evolve effective multi-layered ANNs. Focusing on the comparison between one-hidden-layer ANNs and those that have more than one hidden-layer (> 1 H-Ls), a small but perceptible difference exists. Almost no improvement is observable on the flame dataset, which is expected, as the problem only has two features and thus an ANN with one hidden-layer already performs close to optimally. In a nutshell, although there are no statistically significant differences between the performance of the 1 H-L and ≥ 1 H-Ls runs, when the evolutionary process results in multi-layered ANNs the differences begin to emerge. This result shows that DSGE is able to cope with the higher dimensionality of the search space associated with the evolution of multi-layered topologies, and indicates that on complex datasets, where there is a clear advantage in using deeper topologies, DSGE is likely to outperform other grammar-based approaches in both training and testing.

 

Preliminary experiments concerning the evolution of ANNs with more than one output neuron have also been conducted: instead of using just one output neuron with the sigmoid activation function, we used two output nodes (one per class) with the softmax activation function. The obtained performances are similar.
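To make the difference concrete, here is a minimal sketch of a forward pass in which the single sigmoid output is replaced by two softmax output neurons, one per class (the weights and topology are arbitrary placeholders, not evolved values from the paper):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

# Arbitrary weights for a 2-feature network with one 2-neuron hidden-layer
# and two output neurons, one per class.
W_h = np.array([[0.5, -1.2],
                [0.8,  0.3]])        # one row per hidden neuron
b_h = np.array([0.1, -0.4])
W_o = np.array([[1.0, -0.7],
                [-0.9, 0.6]])        # one row per output neuron
b_o = np.array([0.0, 0.2])

x = np.array([0.3, 0.9])                     # one input sample
h = 1.0 / (1.0 + np.exp(-(W_h @ x + b_h)))   # sigmoid hidden activations
probs = softmax(W_o @ h + b_o)               # one probability per class
print(probs, "-> predicted class:", probs.argmax())
```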

 

In Proceedings

  • F. Assunção, N. Lourenço, P. Machado, and B. Ribeiro, "Towards the Evolution of Multi-Layered Neural Networks: A Dynamic Structured Grammatical Evolution Approach," in Proceedings of the 19th Annual Conference on Genetic and Evolutionary Computation, 2017.

Authors

Filipe Assunção

Nuno Lourenço

Penousal Machado

Bernardete Ribeiro