TGPGAN: An Evolutionary Approach to Generative Adversarial Networks

2014 saw the introduction of Generative Adversarial Networks (GANs), proposed by Ian Goodfellow et al. GANs are a type of Generative Model consisting of two main components: a generator and a discriminator, which are trained jointly by optimizing opposing objectives within a zero-sum game. Conventionally, these adversarial models use Deep Convolutional Neural Networks, which are well suited to GPU computing and thus more efficient. Before the advent of GANs, Evolutionary Computation (EC) approaches made up the majority of the state of the art for image generation.
However, recent developments in accelerating domain evaluation in expression-based evolution using GPUs make the case for reconsidering EC as a viable option for improving upon current GAN models without excessive computational burden. Namely, this project focuses on improving image-generation GANs through Evolutionary Machine Learning (EML), an emerging field that attempts to combine the flexibility provided by EC with the discriminative power of Machine Learning.
The Framework
Instead of using a standard convolutional neural network for the generator component of a GAN, the model in this work employs a Genetic Programming (GP) run to evolve a set of symbolic expressions. This is where the flexibility of EC comes into play: each symbolic expression encodes the value of each pixel of an image as a function of its coordinates, allowing images to be generated at an arbitrary resolution. Moreover, although each individual must be evaluated for all fitness cases (one per pixel), the expression being evaluated remains constant across cases, making GP highly parallelizable.
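To make this concrete, the sketch below renders a hypothetical evolved expression over a coordinate grid. The expression, helper name, and normalization are illustrative assumptions, not TensorGP's actual API; the point is that the same genotype can be rendered at any resolution because every pixel is just another fitness case of the same expression.

```python
import numpy as np

def render(expr, resolution):
    """Evaluate a symbolic expression over a coordinate grid.

    The expression maps (x, y) coordinates in [-1, 1] to an intensity,
    so the same individual can be rendered at any resolution.
    """
    coords = np.linspace(-1.0, 1.0, resolution)
    x, y = np.meshgrid(coords, coords)   # one fitness case per pixel
    values = expr(x, y)                  # vectorized over all pixels at once
    # Normalize to an 8-bit grayscale image.
    values = (values - values.min()) / (np.ptp(values) + 1e-9)
    return (values * 255).astype(np.uint8)

# Example individual: a hypothetical evolved expression.
individual = lambda x, y: np.sin(3 * x * y) + np.cos(x - y)

low_res = render(individual, 28)     # MNIST-sized phenotype
high_res = render(individual, 512)   # same genotype, higher resolution
```

Because the expression is applied uniformly to every pixel, the whole evaluation reduces to vectorized tensor operations, which is precisely what makes GPU acceleration effective here.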
To realize this approach, the TGPGAN framework was created using TensorGP for the generator module. In TGPGAN, the discriminator component remains a Convolutional Neural Network (CNN), which is trained by standard backpropagation using both real and generated images. For the current iteration of the model, each training step can be defined by the block of pseudocode below.
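The following pseudocode is a sketch of one training step as described in this section; all names are illustrative rather than TGPGAN's actual identifiers.

```
procedure TRAINING_STEP(discriminator, population, dataset):
    # 1. Evolve the generator population for n generations with TensorGP,
    #    using a forward pass of the discriminator as the fitness function.
    for g in 1 .. n:
        population <- evolve(population, fitness = discriminator.forward)
    fake_batch <- phenotypes(best_of(population))

    # 2. Retrieve a batch of real images from the dataset.
    real_batch <- sample(dataset)

    # 3. Update the discriminator weights by backpropagation (Adam)
    #    on both real and generated samples.
    discriminator.update(real_batch, label = real)
    discriminator.update(fake_batch, label = fake)

    # 4. Seed the next step with a portion of the best-fitted
    #    individuals ("meta-elitism" across training steps).
    population <- top_fraction(population)
```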
The algorithm can be explained as follows. To generate the batch of fake images, a GP run consisting of n generations is performed using TensorGP. After these n generations, the resulting individuals are converted to their respective image phenotypes. In this evolutionary process, candidate solutions are assessed by a forward pass on the discriminator network. For the first training step, the initial GP population is randomly generated, while in subsequent steps a given portion of the best-fitted individuals is taken from the last population. Similarly to standard generational elitism, this mechanism implements a certain degree of elitism for the generator across training steps, a kind of "meta-elitism".
Next, another batch of images is retrieved from the original dataset. After this, the weights of the discriminator network are updated by passing both the real and generated data samples through standard backpropagation training with the Adam optimizer. This algorithm is then repeated for each training epoch.
Unlike standard GANs, the approach described here does not aggregate generated solutions into an organized vector space, commonly referred to as a latent space. Such a space is a desirable feature in generative modelling, as it allows for the addition, subtraction and interpolation of different solutions. As a first step towards addressing this shortcoming, an archive of solutions was added to TGPGAN to collect the best-fitted individuals found throughout the evolutionary process. The archive is updated on each training step: only the fittest individuals amongst the newly generated ones and those already in the archive are kept. This ensures that individuals with lower fitness values are evicted whenever better ones are produced by the generator.
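The archive update amounts to a merge-and-truncate by fitness. The sketch below is a minimal illustration of that mechanism, assuming individuals are (fitness, expression) pairs and higher fitness is better; the representation and helper names are hypothetical.

```python
from heapq import nlargest

def update_archive(archive, new_individuals, capacity):
    """Merge newly generated individuals into the archive, keeping only
    the best-fitted `capacity` individuals overall."""
    merged = archive + new_individuals
    # nlargest sorts by the key in descending order and truncates.
    return nlargest(capacity, merged, key=lambda ind: ind[0])

# Toy example: a lower-fitness archive member is evicted by a newcomer.
archive = [(0.9, "sin(x*y)"), (0.4, "x+y")]
generated = [(0.7, "cos(x)-y"), (0.2, "x*x")]
archive = update_archive(archive, generated, capacity=3)
```

After the update, the archive holds the three fittest individuals seen so far, regardless of whether they came from the current step or an earlier one.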
Preliminary Experimentation
Our first set of experiments consisted of training TGPGAN on the MNIST dataset and comparing the generated artifacts with those of a standard Deep Convolutional GAN (DCGAN) approach after 5 training epochs.
The results for the best-fitted populations after training are shown in Figure 1. As shown, the batches clearly start resembling digits after only 5 training epochs. It is worth noting that the individuals manage to mimic the handwritten style of the digits in the MNIST dataset by mixing various GP operators, instead of producing simple geometric shapes.
Overall, the TGPGAN model generates digits of better quality, as demonstrated by the more complex digits such as 4's, 5's and 9's. Note also the slightly more diverse array of solutions in TGPGAN, with the strike on some 7's and the feet on one of the 1's.
The quality of generated solutions can be quantified by feeding the output into an external classifier. Table 1 shows the percentage of correct classifications over the best-fitted populations of both models. The classifier used for this task is a CNN pretrained on MNIST digits, reaching a test accuracy of 99.25%.
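The metric itself is straightforward: the fraction of generated images the external classifier assigns to the intended digit. A minimal sketch, assuming a `classifier` callable that returns a predicted digit per image (the stand-in classifier below is purely illustrative):

```python
import numpy as np

def percent_correct(classifier, images, labels):
    """Percentage of images the external classifier labels as intended."""
    preds = classifier(images)            # predicted digit per image
    return 100.0 * np.mean(preds == labels)

# Toy stand-in "classifier" for illustration: argmax over a one-hot input
# recovers the intended label exactly, so the score here is 100%.
classifier = lambda imgs: np.array([img.argmax() for img in imgs])
images = np.eye(10)
labels = np.arange(10)
acc = percent_correct(classifier, images, labels)
```

In the actual experiments, `classifier` would be the pretrained MNIST CNN and `images` the best-fitted population of each model.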
Interestingly, some digits such as 3’s and 7’s seem to be easier to generate (or at least to be classified as such), while others such as 8’s and 9’s proved significantly more challenging in both models. As shown, the average percentage of correct classifications across all digits for TGPGAN is 57% versus 47% for the DCGAN model.