Maths projects → Project III topics
Project III 2017-2018
Genetic Algorithms
Description
There is inceasing interest in so-called "big-data" problems, which can be summarised as how to find solutions in
search-spaces that are too vast to search directly. Various heuristic techniques have been developed for tackling such problems.
This project begins by getting to grips with one of the simplest, known as a genetic algorithm, or GA.
As an example, suppose one wanted to find the highest point of a complicated function with many maxima.
Taking the approach of “walking up the slope” can leave one trapped at a local
maximum that is not the highest one. On the other hand, if the function has
many-variables (say for example it is 10 dimensional), it may be too costly in terms of computing time to work out
the height at each point and simply scan the whole volume.
To solve the problem using a genetic algorithm, one mimics
natural selection by metaphorically covering the curve with a population of
“creatures” whose DNA (referred to as the genotype) is constructed from
their coordinates: that is one takes the coordinates, and literally writes them
out as a string of data, perhaps in binary as a series of ones and zeroes. Every creature’s
genotype gives rise to physical characteristics (phenotype), which in this
example is the height of the surface at that point. The creatures are then
allowed to breed over several generations with a frequency related to their
“goodness of fit” (in this example how high they are). Breeding is done by
splicing the DNA: literally mixing the strings of data. Unfit creatures die
without breeding, and random mutations are introduced in the data in order to
prevent the population getting stuck at a local maximum. Gradually, over a few
generations, the unfit members of the population die out and only the fittest
survive, namely those that have evolved to the desired solution. An example is
shown below (written in Mathematica). A population of only 20 breeding
individuals can find the correct global maximum of this function (order one
parameters constructed from polynomials and sines), to about 0.01 accuracy over
20 generations.
That example
is simple to understand but doesn’t show the real power of genetic algorithms.
So here is another one (taken from Mitchell - below). Suppose you want to
search for a protein—a sequence of amino acids—that folds up to a particular
three−dimensional shape so that it can be used, say, to fight a specific virus.
The search space is the collection of all possible protein sequences—an
infinite set of possibilities. To constrain it, let us restrict the search to
all possible sequences of length 100 or less. But it is still a huge search
space, since there are 20 possible amino acids at each position in the
sequence. If we represent the 20 amino acids by letters of the alphabet,
candidate solutions will look like: A G G M C G B L… and there are of order
20^100 possible solutions!! Given that the goodness of fit will involve some
modelling of the physical properties of the protein, it is unlikely that any
systematic search could find the right solution, but heuristic techniques can.
These
techniques have been successfully applied to many similar problems, including
financial forecasting (where they are able to “beat the market” and supposedly
outperform neural networks), particle physics, and so forth. In this project,
which is open ended, you will identify an interesting optimization problem, and
attempt to develop the genetic algorithm code to solve it computationally. The
precise optimization problem(s) you choose to explore is unspecified, and may
involve any area, from for example finding the solution to a variational
principle problem in mathematical physics, to predicting exchange rates, or developing
a routefinder. You would also have the freedom to explore other heuristic techniques such
as machine learning. The project will
require significant computing ability, one of Python, Fortran, Java, C or C++ .
Prerequisites
Experience
with programming in C, C++, Fortran, Python or Mathematica.
Resources and references
There are a
large number of papers and introductory books;
Genetic
algorithms in financial forecasting http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.9698&rep=rep1&type=pdf
Goldberg is
one of the better ones apparently http://books.google.co.uk/books/about/Genetic_Algorithms.html?id=6gzS07Sv9hoC&redir_esc=y
Mitchell is a
great introduction:
http://library.dur.ac.uk/search~S1?/Ymitchell+melanie&searchscope=1&SORT=D/Ymitchell+melanie&searchscope=1&SORT=D&SUBKEY=mitchell+melanie/1%2C4%2C4%2CE/frameset&FF=Ymitchell+melanie&searchscope=1&SORT=D&1%2C1%2C