Page 1 of 4
Genetic algorithms are easy to understand and easy to implement. In this short tutorial we look at the theory and create a program to solve the "blind Knapsack" problem. And we consider a favourite xkcd cartoon and the terrible warning it contains...
More cartoon fun at xkcd a webcomic of romance,sarcasm, math, and language
Many problems in AI can be cast in the form of optimization - find the best move, the best answer, the shortest route etc. Artificial evolution, or the genetic algorithm as it is usually called, is yet another optimization method.
In the real world the genetic algorithm works on populations of animals selecting the best and then breeding even better examples. It is an optimisation process stretching over many millions of years. Each species is being optimised for the particular environmental niche that it finds itself occupying.
Given that the genetic algorithm seems to work well in such a variety of situations it is worth asking why?
It also seems to work very rapidly.
Yes I know evolution takes millions of years but when you take into account how far it has come in that time the effort seems a reasonable expense. If we could find a way of harnessing this power of optimization and development then perhaps deep research into AI isn't necessary.
We can create the start of our AI product, place it in its operating environment and then subject it to some form of accelerated evolution. If you are familiar with AI you will recognise that this idea is very similar to the concept of building a brain by simulating neural nets and waiting for it to learn. It is indeed the same idea but taken one stage further back - now I expect the neural nets to evolve rather than be constructed!
For a more basic introduction to the genetic algorithm see Genetic Algorithms. This account is more a technical overview.
As you might expect the basic raw material of the genetic algorithm is the gene. This is a sequence of symbols which determines the some aspect of the system that we are trying to optimise. A Chromosome is a collection of such symbols and given sufficient chromosomes every aspect of the system is determined.
Basically the first thing we need is an artificial gene and artificial genetics.
For our simple artificial genetics we can concentrate on the manipulation of a single string of symbols.
The genetic algorithm starts from a breeding population of systems each determined by its genetic string.
For evolution to work on this breeding population there must be some way of evaluating each system. In real life this is just the fitness of the organism for its environment. In artificial genetics you need some sort of evaluation or fitness function, f(x), that can be applied to each system which returns a result proportional to its suitability.
This is of course the function that we are going to try to optimise by running the genetic algorithm. It is also the subject of the cartoon at the start of this article and it will become clear why you have to include the indicated clause in the fitness function.
To produce new members of the population for the artificial selection to work on we need to mimic breeding.
There are three basic genetic operations:
Starting from an initial population of symbol strings new generations are produced by the operation of reproduction, crossover and mutation.
Each system gets a chance to reproduce itself with a probability proportional to its fitness.
That is at each breeding step each string x enters a number of copies of itself into the breeding pool given by random selection from the population with a probability proportional to f(x).
Strings are selected at random in this way until the breeding pool has as many strings as the original population - in this way the population maintains its overall size.
Once in the breeding pool the systems are paired off at random. Then for each pair a crossover operation is performed. A random position is chosen in the string and both strings are cut at this point. Then the heads of each string are swapped. For example, if the population consists of six letter strings from the alphabet a crossover operation on the mated pair
ABCDEF and GHIJKL
at position 4 would produce the two new strings
ABCDKL and GHIJEF
You should be able to see that the crossover operation corresponds to the exchange of genetic information.
In human reproduction for example genes occur in pairs and sexual reproduction involves taking half of the new pair from each partner.
The final genetic operation is mutation. This is simply the random changing of any symbol within the string. The purpose of mutation is to introduce a random kick to the genetic system every so often just to make sure that it isn't going up a blind alley.
The probability of mutation for an effective genetic algorithm is very small. In this sense mutation is a fine adjustment to the overall process.
That's all there is to it. Get yourself together a population of symbol strings and breed new generations. As time goes on the fitness of your population should increase to an optimum value and the individuals should get better at what ever the task is that is being measured by the fitness function.
Why does evolution work?
To answer the question of why and under what conditions the genetic algorithm works you have to consider in more detail the effects of the crossover operator.
There are bound to be patterns of symbols in the strings that effect fitness. A good coding of the behaviour of the system should result in groups of symbols effecting some aspect of the system that makes it better or worse at the task. In other words groups of symbols should function together as genes or schema in the jargon of the genetic algorithm .
What does all this mean?
Well the operation of the genetic algorithm tends to select the most fit symbol strings and these strings pass tend to pass on low order short defining length schema. As long as there are compact schema that are associated with good performance the genetic algorithm will work on the population to improve it generation by generation. Of course if this isn't the case then the genetic algorithm will be less than efficient.
This also explains the nature of the coding problem hinted at earlier. If at all possible you should use your understanding of a problem to code it so that short, low-order schema control important features and the schema should be as independent of one another as possible. This sounds like a serious difficulty with using the genetic algorithm because if you understand the problem that well then you will probably be able to use it to solve the problem in a more direct way than mucking about with genetics!
The good news is that ways of including the coding problem are being developed as part of the genetic algorithm. In other words soon we may be able to allow the coding to evolve as well as the population of strings.