Genetic Algorithms; PRIII 2017-18

Maths projects → Project III topics

Project III 2017-2018

Genetic Algorithms

Description

There is inceasing interest in so-called "big-data" problems, which can be summarised as how to find solutions in search-spaces that are too vast to search directly. Various heuristic techniques have been developed for tackling such problems. This project begins by getting to grips with one of the simplest, known as a genetic algorithm, or GA. As an example, suppose one wanted to find the highest point of a complicated function with many maxima. Taking the approach of “walking up the slope” can leave one trapped at a local maximum that is not the highest one. On the other hand, if the function has many-variables (say for example it is 10 dimensional), it may be too costly in terms of computing time to work out the height at each point and simply scan the whole volume.

To solve the problem using a genetic algorithm, one mimics natural selection by metaphorically covering the curve with a population of “creatures” whose DNA (referred to as the genotype) is constructed from their coordinates: that is one takes the coordinates, and literally writes them out as a string of data, perhaps in binary as a series of ones and zeroes. Every creature’s genotype gives rise to physical characteristics (phenotype), which in this example is the height of the surface at that point. The creatures are then allowed to breed over several generations with a frequency related to their “goodness of fit” (in this example how high they are). Breeding is done by splicing the DNA: literally mixing the strings of data. Unfit creatures die without breeding, and random mutations are introduced in the data in order to prevent the population getting stuck at a local maximum. Gradually, over a few generations, the unfit members of the population die out and only the fittest survive, namely those that have evolved to the desired solution. An example is shown below (written in Mathematica). A population of only 20 breeding individuals can find the correct global maximum of this function (order one parameters constructed from polynomials and sines), to about 0.01 accuracy over 20 generations.

That example is simple to understand but doesn’t show the real power of genetic algorithms. So here is another one (taken from Mitchell - below). Suppose you want to search for a protein—a sequence of amino acids—that folds up to a particular three−dimensional shape so that it can be used, say, to fight a specific virus. The search space is the collection of all possible protein sequences—an infinite set of possibilities. To constrain it, let us restrict the search to all possible sequences of length 100 or less. But it is still a huge search space, since there are 20 possible amino acids at each position in the sequence. If we represent the 20 amino acids by letters of the alphabet, candidate solutions will look like: A G G M C G B L… and there are of order 20^100 possible solutions!! Given that the goodness of fit will involve some modelling of the physical properties of the protein, it is unlikely that any systematic search could find the right solution, but heuristic techniques can.

These techniques have been successfully applied to many similar problems, including financial forecasting (where they are able to “beat the market” and supposedly outperform neural networks), particle physics, and so forth. In this project, which is open ended, you will identify an interesting optimization problem, and attempt to develop the genetic algorithm code to solve it computationally. The precise optimization problem(s) you choose to explore is unspecified, and may involve any area, from for example finding the solution to a variational principle problem in mathematical physics, to predicting exchange rates, or developing a routefinder. You would also have the freedom to explore other heuristic techniques such as machine learning. The project will require significant computing ability, one of Python, Fortran, Java, C or C++ .

Prerequisites

Experience with programming in C, C++, Fortran, Python or Mathematica.

Resources and references

There are a large number of papers and introductory books;

Genetic algorithms in financial forecasting http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.9698&rep=rep1&type=pdf

Goldberg is one of the better ones apparently http://books.google.co.uk/books/about/Genetic_Algorithms.html?id=6gzS07Sv9hoC&redir_esc=y

Mitchell is a great introduction: http://library.dur.ac.uk/search~S1?/Ymitchell+melanie&searchscope=1&SORT=D/Ymitchell+melanie&searchscope=1&SORT=D&SUBKEY=mitchell+melanie/1%2C4%2C4%2CE/frameset&FF=Ymitchell+melanie&searchscope=1&SORT=D&1%2C1%2C