Evolutionary Computation for Classifier Assessment and Improvement

Object detection systems, face detection in particular, have become a hot topic of research, and applications that employ them are becoming widespread: they can be found, for instance, in search engines, in social networks, in cameras, and in smartphone applications. As with other example-based learning techniques, the datasets employed are vital, not only for attaining competitive performance but also for correctly assessing the strengths and shortcomings of the classifiers. As such, developing adequate datasets for training, testing, and validation is a crucial and complex process.
Our working hypothesis is that Evolutionary Computation (EC) can be used to assess and improve the performance of classifiers by evolving new training instances. To test this hypothesis, we propose, develop, and explore a framework that uses EC to assess and improve classifier performance.

The Framework

The framework performs multiple parallel evolutionary runs to generate a large number of potentially misclassified instances. A Supervisor module determines which of the generated instances have been misclassified and which should be added to the training set. A new classifier is trained based on the original training set augmented by the selected evolved instances, thus potentially overcoming some of the shortcomings of the original classifier.

Figure 1: Overview of the system

The application of this approach involves the following steps:

  1. Define a training set with positive and negative examples;
  2. Train the face detector;
  3. N EC runs are started, each with a different random seed;
  4. The face detector classifies the generated individuals;
  5. Fitness depends on the internal values of the classification task;
  6. Each EC run stops when a termination criterion is met;
  7. The individuals generated throughout all EC runs go through a supervisor module that selects and filters instances;
  8. The resulting images are added to the training set;
  9. Repeat from step 2 until the boosting criterion is met.
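
The steps above can be sketched as a toy loop that shows the data flow only, not the authors' implementation. Everything in it is a hypothetical stand-in: each "image" is a single number in [0, 1], the face detector is a 1-nearest-neighbour classifier whose margin plays the role of the internal classification values used as fitness, the supervisor is a ground-truth oracle, filtering drops near-duplicates, and the boosting criterion is "no new instances, or a maximum number of iterations".

```python
import random

# Hypothetical toy setting: an "image" is one float in [0, 1]; the ground
# truth, known only to the supervisor, calls it a face inside TRUE_FACE_RANGE.
TRUE_FACE_RANGE = (0.6, 0.8)


def train(positives, negatives):
    """Stand-in detector: 1-nearest-neighbour over the stored labelled examples."""
    return [(x, True) for x in positives] + [(x, False) for x in negatives]


def margin(model, x):
    """Internal classification value: > 0 means the detector says 'face'."""
    d_pos = min(abs(x - v) for v, is_face in model if is_face)
    d_neg = min(abs(x - v) for v, is_face in model if not is_face)
    return d_neg - d_pos


def detect(model, x):
    return margin(model, x) > 0


def supervisor(x):
    lo, hi = TRUE_FACE_RANGE
    return lo <= x <= hi


def ec_run(model, seed, pop_size=20, generations=30):
    """One evolutionary run guided by the detector; returns every individual generated."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    generated = list(pop)
    for _ in range(generations):
        # Gaussian mutation, clipped to [0, 1].
        children = [min(1.0, max(0.0, x + rng.gauss(0.0, 0.05))) for x in pop]
        generated.extend(children)
        # Fitness is the detector's margin; the best pop_size individuals survive.
        pop = sorted(pop + children, key=lambda x: margin(model, x), reverse=True)[:pop_size]
    return generated


def select_and_filter(model, generated, tol=0.01):
    """Keep detector/supervisor disagreements, dropping near-duplicates of kept ones."""
    kept = []
    for x in generated:
        if detect(model, x) and not supervisor(x):
            if all(abs(x - k) >= tol for k in kept):
                kept.append(x)
    return kept


def boost(positives, negatives, n_runs=5, max_iterations=3):
    """Train, evolve, select, add the selections as negatives, and retrain."""
    negatives = list(negatives)
    model = train(positives, negatives)
    for iteration in range(max_iterations):
        generated = []
        for r in range(n_runs):  # N runs, each with its own random seed
            generated.extend(ec_run(model, seed=iteration * n_runs + r))
        new_negatives = select_and_filter(model, generated)
        if not new_negatives:  # hypothetical boosting criterion
            break
        negatives.extend(new_negatives)
        model = train(positives, negatives)
    return model, negatives
```

For example, `boost([0.65, 0.7, 0.75], [0.1, 0.2, 0.3])` starts from a detector that accepts everything above 0.475, so evolved individuals outside the true face range are exactly the false positives that get moved into the negative set before retraining.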


The Evolutionary Computation Engine


Figure 2: Examples of images generated by the evolutionary engine using interactive evolution

The EC engine used in these experiments, NEvAr, is inspired by the work of Karl Sims. It is a general-purpose, expression-based Genetic Programming (GP) image generation engine that allows the evolution of populations of images.
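
To illustrate what "expression-based" means in Karl Sims-style systems: each genotype is a symbolic expression over the pixel coordinates, and the image is obtained by evaluating that expression at every pixel. The sketch below assumes a small hypothetical function set (`add`, `mul`, `sin`, `abs`), trees stored as nested tuples, and a clamp-and-scale mapping to grey levels; NEvAr's actual primitives, representation, and colour mapping are not described here and may well differ. Variation operators (mutation, crossover) are omitted.

```python
import math
import random

# Hypothetical function and terminal sets; NEvAr's actual primitives may differ.
FUNCTIONS = {
    "add": (2, lambda a, b: a + b),
    "mul": (2, lambda a, b: a * b),
    "sin": (1, lambda a: math.sin(math.pi * a)),
    "abs": (1, abs),
}
TERMINALS = ["x", "y"]


def random_tree(rng, depth):
    """Grow a random tree: a tuple (name, *children), a coordinate, or a constant."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS) if rng.random() < 0.8 else rng.uniform(-1.0, 1.0)
    name = rng.choice(sorted(FUNCTIONS))
    arity, _ = FUNCTIONS[name]
    return (name,) + tuple(random_tree(rng, depth - 1) for _ in range(arity))


def evaluate(tree, x, y):
    """Recursively evaluate an expression tree at pixel coordinates (x, y)."""
    if isinstance(tree, tuple):
        name, *children = tree
        _, fn = FUNCTIONS[name]
        return fn(*(evaluate(c, x, y) for c in children))
    if tree == "x":
        return x
    if tree == "y":
        return y
    return tree  # numeric constant


def render(tree, width=8, height=8):
    """Phenotype: evaluate the expression at every pixel and map it to 0..255."""
    image = []
    for row in range(height):
        y = 2.0 * row / (height - 1) - 1.0  # pixel coordinates scaled to [-1, 1]
        line = []
        for col in range(width):
            x = 2.0 * col / (width - 1) - 1.0
            v = max(-1.0, min(1.0, evaluate(tree, x, y)))  # clamp to [-1, 1]
            line.append(int(round((v + 1.0) * 127.5)))
        image.append(line)
    return image
```

For instance, `render(("add", "x", "y"), 3, 3)` produces a diagonal grey-level ramp, while `render(random_tree(random.Random(1), 4))` produces one random individual of the kind an initial population would contain.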

The Classifiers

For face detection, we use Haar cascade classifiers, trained with the “opencv_haartraining” tool of OpenCV. The quality of the positive and negative datasets used in training strongly influences the performance of a classifier. We used two well-known datasets: “The Yale Face Database B” and the “BioID Face Database”.
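
A Haar cascade rejects a window at the first stage whose summed weak-classifier responses fall below that stage's threshold, and the framework's fitness depends on such internal values of the classification. The sketch below is a simplified, hypothetical illustration of that structure: `stump`, `Stage`, and `cascade_fitness` are invented names, windows are represented by precomputed feature values instead of pixels, and the fitness formula (stages passed, plus partial credit for how close the first failed stage came to its threshold) is an assumption, not necessarily the one used in this work.

```python
from dataclasses import dataclass, field


def stump(feature_index, threshold, left, right):
    """Hypothetical weak classifier: a decision stump over one precomputed feature value."""
    return lambda window: left if window[feature_index] < threshold else right


@dataclass
class Stage:
    weak_classifiers: list = field(default_factory=list)
    threshold: float = 0.0

    def score(self, window):
        # Stage response: the sum of its weak classifiers' outputs.
        return sum(h(window) for h in self.weak_classifiers)


def is_face(cascade, window):
    """A window is accepted only if every stage's score reaches its threshold."""
    return all(stage.score(window) >= stage.threshold for stage in cascade)


def cascade_fitness(cascade, window):
    """Assumed fitness: stages passed, plus partial credit in (0, 1) for the first failed stage."""
    for i, stage in enumerate(cascade):
        gap = stage.threshold - stage.score(window)
        if gap > 0:
            return i + 1.0 / (1.0 + gap)
        # gap <= 0 means this stage passed; continue with the next one.
    return float(len(cascade))
```

Because a window that fails at stage i scores below i + 1, while one that passes every stage scores the number of stages, this kind of fitness rewards individuals that get deeper into the cascade, which is what drives evolution toward images the detector accepts.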

Figure 3: Application of Haar features
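
Figure 3 illustrates how Haar features are applied to an image window. In the standard Viola-Jones formulation, a Haar feature is the difference between pixel sums over adjacent rectangles, and an integral image turns each rectangle sum into four lookups. The pure-Python sketch below illustrates that standard technique with a two-rectangle edge feature; the function names are hypothetical and are not OpenCV's API, and the sign convention (left half minus right half) is one common choice.

```python
def integral_image(img):
    """ii[r][c] holds the sum of img[0..r-1][0..c-1]; row 0 and column 0 are zero padding."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of any axis-aligned rectangle from four integral-image lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Two-rectangle (edge) feature: left half minus right half."""
    if width % 2:
        raise ValueError("width must be even")
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```

Since the integral image is computed once per image, every feature evaluation costs the same regardless of rectangle size, which is what makes evaluating thousands of features across many windows affordable.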


Experimental Results

We performed 30 independent EC runs of 100 generations each, with a population size of 50. Once the runs were complete, all images evolved across the 30 runs that the initial face detector classified as faces were gathered and submitted to selection and filtering, which proceed as follows:

  1. If the supervisor module also recognizes the image as a face, the image is discarded. Thus, we are only interested in images on which the face detector used to guide the run and the supervisor disagree; the reasoning is that such disagreement tends to indicate the presence of a false positive.
  2. During filtering, if the image is similar to a previous one, it is discarded.
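
The selection and filtering steps above can be sketched as follows. The similarity test is an assumption: the text does not say how similarity is measured, so this hypothetical version compares flattened grey-level images by mean absolute pixel difference against the images already kept. `supervisor_is_face` is a placeholder callable standing in for the supervisor classifier, and `similarity_threshold` is an invented parameter.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized, flattened grey-level images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def supervise_and_filter(candidates, supervisor_is_face, similarity_threshold=10.0):
    """candidates: evolved images that the guiding detector classified as faces."""
    selected = []
    for img in candidates:
        # Selection: discard images the supervisor also sees as faces, keeping
        # only disagreements, which tend to indicate false positives.
        if supervisor_is_face(img):
            continue
        # Filtering (assumed criterion): discard images too similar to one already kept.
        if any(mean_abs_diff(img, kept) < similarity_threshold for kept in selected):
            continue
        selected.append(img)
    return selected
```

Applying selection before filtering means the comparatively costly similarity checks only run on images that are actually candidates for the negative set.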

The remaining images were added to the negative dataset, and the classifier was retrained.
To assess the performance of the classifiers on test data, we considered three independent datasets:

  • Flickr – 2166 negative images;
  • FERET – 902 positive images from the Facial Recognition Technology (FERET) Database;
  • CMU-MIT – 130 images containing both positive and negative examples.

The following table summarizes the results:

Figure 4: Table of experimental results

The first row presents the performance of the initial classifier in terms of Hits, Misses, False Alarms, and the percentage of correctly classified instances. FDLib describes the performance of the classifier used as supervisor. IEC Average presents the results obtained by adding images generated in individual evolutionary runs to the negative training set. Manual refers to the results obtained by hand-picking the images to be added. The last four rows show the results obtained by enriching the training dataset with images from multiple evolutionary runs; the four combinations result from using or omitting the supervisor module and from applying or omitting filtering.
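
How the percentage column relates to Hits, Misses, and False Alarms depends on how correct classifications are counted, which is not spelled out here. Under one simple, hypothetical assumption, that every test image yields exactly one decision (a positive image is either a hit or a miss, and a negative image is either correctly rejected or one false alarm), it could be computed as sketched below. This assumption does not hold for detectors that report several windows per image, as can happen on CMU-MIT.

```python
def percent_correct(hits, misses, false_alarms, n_negative_images):
    """Assumed definition: one decision per image, so correct = hits + correct rejections."""
    if false_alarms > n_negative_images:
        raise ValueError("false_alarms cannot exceed the number of negative images")
    total = hits + misses + n_negative_images
    if total == 0:
        raise ValueError("no test instances")
    correct = hits + (n_negative_images - false_alarms)
    return 100.0 * correct / total
```

On a positives-only set such as FERET this reduces to hits / (hits + misses), and on a negatives-only set such as Flickr to the fraction of images without a false alarm.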

Conclusions and Ongoing Work

The approach yields significant performance improvements under the experimental conditions considered. Future work will focus on:

  • Improving the supervisor module;
  • Generalizing the results;
  • Performing several boosting iterations.


In Proceedings

  • J. Correia, P. Machado, and J. Romero, “Improving haar cascade classifiers through the synthesis of new training examples,” in Genetic and Evolutionary Computation Conference, GECCO ’12, Philadelphia, PA, USA, July 7-11, 2012, Companion Material Proceedings, 2012, pp. 1479-1480.

  • P. Machado, J. Correia, and J. Romero, “Expression-Based Evolution of Faces,” in Evolutionary and Biologically Inspired Music, Sound, Art and Design – First International Conference, EvoMUSART 2012, Málaga, Spain, April 11-13, 2012. Proceedings, 2012, pp. 187-198.

  • P. Machado, J. Correia, and J. Romero, “Improving Face Detection,” in Genetic Programming – 15th European Conference, EuroGP 2012, Málaga, Spain, April 11-13, 2012. Proceedings, 2012, pp. 73-84.


João Nuno Correia

Penousal Machado

Juan Romero