This content was published by Andrew Tomazos and written by several hundred members of the former Internet Knowledge Base project.

Genetic Algorithms and Cake Baking

Welcome to the second edition of the IKB newsletter.

Last time we talked about the human brain and cascading computation. It was mainly a primer for a topic called neural networking software (and hardware), which we will get to later.

This time we want to make a small jump over to a fun topic called genetic algorithms.

++ EVOLUTION ++

Science's current best model for how and why animals have changed and adapted over the years is evolution by natural selection.

Here is a super-quick rundown.

When an animal is first conceived it consists of a single cell. Within this cell is contained a unique constant piece of data (its DNA). As the animal grows and the cells split and multiply, the underlying constant is copied to every cell in the body. Many things about the way the animal grows, looks, and behaves are influenced by this DNA recipe.

As this animal goes through its life, it either dies or it survives into maturity. When survivors breed, their two DNA recipes combine into one, and the result is put into the first cell of their offspring. Hence, on average, the good recipes stick around and the bad ones get dumped. (Well, the ones that are good at surviving in the pre-medical-science natural world anyway.)

Animal breeders have been using this trick for millennia to produce animals that look a certain way or have certain traits. The ones they like are allowed to breed, and the ones they don't like aren't bred. Hence, breeders tend to develop animals with traits they like.

Looked at in the abstract, each entity is created from a piece of data. Each entity is then put through numerous tests. The entities that get good results have their pieces of data paired up and reproduced; the ones that don't get discarded. That's the essence of the theory of evolution.

What if we used this idea on something other than animals? Like cake recipes?

++ EVOLUTION IN CAKE BAKING ++

Let's say we are given twenty different flavorings we can use to make a cake. With a recommended amount to use, our choice for each flavor is whether to use it or not. Either yes or no.

Each flavor tastes okay on its own, but when we combine them all the taste is overwhelming. Conversely, when we use none of the flavorings the cake tastes too plain. The best tasting cake will use some combination of them.

There are about one million combinations of twenty flavors. We could make all one million cakes, taste them and then pick the best one. This would give us the perfect cake, but it would take a very long time.

So instead let's imitate nature's way. We start by baking ten cakes using random combinations of the twenty flavors. We taste each of them and throw out the five worst-tasting cakes, leaving five cakes.

We then make every possible pairing of the remaining five cakes. There are ten possible pairs of five items.

For each pair of cakes we combine the recipes to make a new recipe. For the flavorings that appear in both "parent" recipes, we keep in the new "child" recipe. For flavors that appear in neither, we leave out of the child. For a flavor that appears in one parent but not the other we toss a coin to decide if we keep it or not.

We then bake our ten new cake recipes. We now have fifteen cakes.

We taste them and discard the worst ten, leaving five.

We then pair up the five and recombine the recipes to make a new batch of ten. We have fifteen. Discard the worst ten. Back down to five.

We repeat exactly the same process: Pair up the best five to spawn ten new ones, again and again, until we have spent as much time on it as we are prepared to find the best cake recipe.

Will it find the best possible recipe? Most likely not. Will it efficiently find a good one? Most likely yes. (If anyone actually tries this experiment, please let us know how it went. And please use small cakes.)

++ TOO MANY COMBINATIONS ++

If a computer had a simple function to model how much we like the taste of a cake based on its recipe, it could test each of the one million cakes individually in a matter of seconds or minutes.

But let's suppose we had 100 flavorings, rather than 20. How many combinations are there? The answer is counterintuitive because it is *not* five million combinations.

It is actually a million times a million times a million times a million times a million (that was five times).

It's difficult to visualize how many possible combinations of 100 yes-or-no choices there are. If we baked every possible cake picking from 100 flavors, they would take up the same amount of space as our sun.

Needless to say a computer isn't going to be able to check each one; in fact, it doesn't even come close. Only 100 flavors makes it completely impossible to check each one. Now we can see why the Coca-Cola recipe is such a well-kept secret.

How much space does it take to store 100 yes-or-no choices? Just over twelve bytes, which is the same space needed to store twelve letters. Very small. But to pick the right one, like guessing a random password, is virtually impossible.

That is why we need better procedures to do things, better algorithms. One way to do that is to use programs that imitate nature, like in the cake baking example above. They are called genetic algorithms.

++ GENETIC ALGORITHMS ++

For a certain class of problems, genetic algorithms like the cake baking example are really effective. Computers can obviously use populations much bigger than our ten cake example above, and evolve faster through the generations.

The Population is the number of entities in each generation. In our example the population was ten.

The Fitness is a measure of how desirable a given entity is. It is the output of the testing function. In our example it was how good each cake tasted.

The Chromosome is the data used as the seed to create each entity. In our example it was twenty bits (twenty yes-or-no choices) representing which of the twenty flavors we combined to make a given cake.

The Species tracks the ancestry of a given entity. Which generation it is from, who its parents were, who its grandparents were, etc. Sometimes it is useful to use different populations and come up with races of entities that don't interbreed, or interbreed less frequently.

There are endless variations on this theme being tried out in computer labs around the world.

The major question is, what function do you plug the recipe into to create the entity? And what function do you use to test its fitness? Also, what technique do you use to combine two recipes together? These things have to be tightly coupled in order for the process to work.

In other words, how good of a cake we make is always limited by the technique we use to make the cake with the ingredients selected, and our sense of taste. In this sense, genetic algorithms are just a tool, an idea, one piece of a subset of all puzzles.

Genetic algorithms are an example of using an idea taken from nature to build man-made machines. Most people agree we have a lot to learn from nature, even in seemingly unrelated things like the computing industry.

++ APPLICATIONS ++

Genetic algorithms have found application in a variety of areas. They've been used to evolve, rather than design computer programs in a language called Lisp. They've been used to evolve neural networks for use in pattern recognition and gaming AI. They have even been used in economics and finance to solve a class of problem called the traveling salesperson problem - where we want to find the shortest route to visit many destinations.

In summary, they are a subset of the general area called search, where we have a large number of potential solutions, but there are too many to test each one to find the best one.

++ ADDITIONAL DETAILS ++

A minor question might be, is there a place for mutation in this scheme? Evolution in the natural world has all the time in the world, and no explicit goals. Is that model too expensive for practical projects?

In addition to the recombination of existing bits of DNA, another source of change is the mutation of genes within the current DNA pool. Mutations can introduce changes in structure/behavior that could not come about by recombination. Although many mutations are deleterious--even fatal to the organism that carries them--it has been estimated that each of us carries about 300 mutations of one kind or another. When a beneficial mutation shows up, it increases the survival chances of its carrier--and thus of itself being passed down to future generations.

One interesting point about DNA is that there are genes that control the timing of the expression of other genes, so that even a single mutation in a "controller gene" can have a much larger effect on the final organism than might be expected.

Back to Index