Natural Machine Intelligence and Machine Consciousness - fantasy or near-future fact? How can we get there, and do we want to undertake the journey?

Sunday, June 04, 2006

Stochastic Diffusion Search

Searching for things is a time-consuming problem. It is why, for instance, we file things in an office environment, and keep things in a nice(?) structured order in libraries. Because we impose a structure on the data held, it is easier to find what we need, when we need it.

 

The problem is, not everything is neatly ordered, and indeed, with some things there are just so many possibilities that ordering them would be infeasible (and, potentially, impossible).  So we need to have ways of searching.

 

Essentially, all machine learning systems are types of search engine; it is just that they are generally searching for optimal solutions defined by a set of examples which do not necessarily cover the entire problem space. It is the features of the examples which normally give the system the chance to spot where solutions lie, and to do so they effectively make a series of refinements on an initial guess. By working out how far the guess is wrong, they move towards solutions with lower errors.
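To make the idea of error-driven refinement concrete, here is a minimal hill-climbing sketch in Python. This is my own toy illustration of the general principle, not any particular learning algorithm:

import random

def hill_climb(error, guess, step=0.5, iterations=1000):
    # Propose a small random change; keep it only if it lowers the error.
    for _ in range(iterations):
        candidate = guess + random.uniform(-step, step)
        if error(candidate) < error(guess):
            guess = candidate
    return guess

# Example: minimise (x - 3)^2, starting from a guess of 0.
print(hill_climb(lambda x: (x - 3) ** 2, 0.0))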

 

But it is not always this way, and it is not always the best way to approach the problem.  Some problem spaces are convoluted affairs, with multiple local minima in the error-space.  The feature space can be relatively disjoint, with solutions only appearing in certain areas.  The only thing which really shows where a solution may be is whether there are features in the problem space which can be associated with a solution.

 

For instance, if our problem space were to be a nice simple search in nice simple text, we could set up a nice simple string to examine:

 

ALLFOODSAREGOODANDTASTEYIFYOULOOKHUNGRYENOUGH

 

I never said it had to make sense.  Now, if we look for a particular word, say “LOOK”, then we can obviously search through from the beginning and find that it is there.  If we use such a naïve method, we would see an A at position 0 (computer scientists tend to number things from 0, sorry) and so we can skip it.

So let’s look at position 1 – ah ha! An L, which matches our first character. So now we can look at the second one… we are looking for an O and we have an L (position 2 is another L). Sadness for our search engine.

 

But, we can carry on looking, and we will, eventually, find LOOK right up there at position 29.
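In Python, that naive scan looks something like the following (a quick sketch using the string above):

def naive_search(text, pattern):
    # Slide the pattern along the text one position at a time.
    for start in range(len(text) - len(pattern) + 1):
        if text[start:start + len(pattern)] == pattern:
            return start  # first match found
    return -1  # no match anywhere

text = "ALLFOODSAREGOODANDTASTEYIFYOULOOKHUNGRYENOUGH"
print(naive_search(text, "LOOK"))  # prints 29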

 

Stochastic Diffusion Search (first described by Dr. Mark Bishop) takes a different slant on things. If you imagine that that string was really, really big, starting at the beginning is probably not going to be a good move. So SDS uses a population of ‘agents’ to look at local features within the search space. Because this is a small string, we will just use 3 (which is crazy talk in terms of the real technique, but it will suffice for now).

 

We will initialise the search with agents at positions 7, 23, and 36.

Each agent starts out inactive, but knows that it is looking for the feature “LOOK”.  It picks one element of its search feature at random, and looks that far away from where it is to see if there is a match.

Ours will pick 0, 2 and 1 (picked entirely arbitrarily by a wetware pseudo-random number generator called my mind).

 

So, Agent A looks at string location 7+0, which is an S, and compares it to the 0th element of the feature array, which is L. No match.

Agent B looks at string location 23+2, which is an F, and compares it to the 2nd element of the feature array, which is O.  No match.

Agent C looks at string location 36+1, which is an R, and compares it with the 1st element, which is also an O. Still no match.
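In code, the test phase the agents have just performed might look like this. It is my own sketch of the step described above, with the agent positions and the feature passed in:

import random

def test_phase(text, feature, positions):
    # Each agent picks one element of the feature at random and checks
    # whether the text matches that element at the corresponding offset.
    active = []
    for position in positions:
        offset = random.randrange(len(feature))
        active.append(text[position + offset] == feature[offset])
    return active

text = "ALLFOODSAREGOODANDTASTEYIFYOULOOKHUNGRYENOUGH"
print(test_phase(text, "LOOK", [7, 23, 36]))  # always [False, False, False] here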

 

Now, any agent which has failed to match asks other members of the population for a hint. As none of them have matched, none of them are feeling particularly helpful, and in this case each of them picks a new spot to squat. Let us assume they pick 2, 12 and 28, and that they will have offsets into the search feature of 2, 1 and 2 (OK, so that time they were a lot less than arbitrary, but we don’t want to be here all day, typing is not as fast as processing).
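The diffusion phase, where the hints get passed around, might be sketched like this (again my own rendering of the mechanism just described, reusing the import from above):

def diffusion_phase(text, feature, positions, active):
    # Active agents sit tight; inactive ones poll a comrade at random.
    # If the comrade is active, copy its position; otherwise jump somewhere new.
    limit = len(text) - len(feature) + 1
    new_positions = list(positions)
    for i, is_active in enumerate(active):
        if is_active:
            continue
        j = random.randrange(len(positions))
        if active[j]:
            new_positions[i] = positions[j]
        else:
            new_positions[i] = random.randrange(limit)
    return new_positions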

 

So Agent A now checks position 4, which is an O and matches, well, an O.

Agent B checks position 13, which is an O and matches another O.

Agent C checks position 30 and also matches an O.

 

This time, they are all ‘successful’ and all of them become active.

You may be thinking “But two of them are looking in entirely the wrong place, and the third one isn’t much better”, and you would be right. However, they are not done yet. Because they are looking through the space in a fairly random way, the system is not going to take the first partial match they acquire as gospel, and they must carry on until enough of them agree for long enough that the features they have found are the right ones.

 

So, turn 3… this time, because all of them are active, they don’t jump around like jumping beans, but sit quietly and take a look around. They choose to look at 0, 2 and 2 this time.

 

A: Looking for an L, found an L.

B: Looking for O, found a D – becomes inactive.

C: Looking for an O and found one.

 

Now this time, B knows it is in the wrong place, and asks one of its comrades at random for a hint. I’ll toss a coin… C. So B now jumps to the same position C is in, to continue its search next time. (Only we know that this is a bit of a dead end, but shhh! Don’t tell the agents!) A will look at pos 3, B at 2 and C at 0.

 

A: looking for a K, found an O – becomes inactive.

B: looking for an O, found an O.

C: looking for an L, but found a U – becomes inactive.

 

Now the system has a bit of a problem in a silly case like this one, because now all of the agents will end up at B’s location (28). They only have a one in four chance of being active, but as long as one of them is, they will stay in place. Whilst they have not actually found the right spot, they are not far off.

 

If you have many, many agents, there is obviously a better chance that one will land on the right spot, and of course, once there, it will always be active. If others find good but not perfect spots, such as the G of GOOD, they will have a 50% chance of staying active each turn. It should be fairly obvious that the agents will tend to cluster around the right areas.
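Putting the two phases together gives the whole loop. Here is a compact sketch of a toy version; the agent count, iteration count and the cluster-counting at the end are all arbitrary choices of mine:

import random

TEXT = "ALLFOODSAREGOODANDTASTEYIFYOULOOKHUNGRYENOUGH"
FEATURE = "LOOK"

def sds(text, feature, n_agents=50, iterations=100):
    limit = len(text) - len(feature) + 1
    positions = [random.randrange(limit) for _ in range(n_agents)]
    for _ in range(iterations):
        # Test phase: each agent checks one random element of the feature.
        active = []
        for p in positions:
            o = random.randrange(len(feature))
            active.append(text[p + o] == feature[o])
        # Diffusion phase: inactive agents poll a random comrade for a hint.
        new_positions = list(positions)
        for i in range(n_agents):
            if not active[i]:
                j = random.randrange(n_agents)
                new_positions[i] = positions[j] if active[j] else random.randrange(limit)
        positions = new_positions
    # The largest cluster of agents marks the best-supported position.
    return max(set(positions), key=positions.count)

print(sds(TEXT, FEATURE))  # usually prints 29, where LOOK begins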

 

And it really does work – it is a surprisingly fast search algorithm which also happens to be naturally very well suited to parallel processing.  The more processors you can chuck at this baby, the faster, and more reliably, it will find the solutions.

 

I suspect (but confess I haven’t checked yet) that there are some possible optimisations to this basic framework – these may have already been described in the literature. The most obvious three, to me, are:

  Allow the local search to work in both directions from the chosen location.  Negative offsets will look backwards through feature space, and a 0 offset will have to check for both the 0th element and the last element.

  When an agent gets its new location from an active agent, allow it to land nearby if there are already a certain number of agents in one spot. If the search space had a lot of repeated elements (say “aaaaaaaaaah”) and the required string also had a similar high frequency of a particular element (“hahahahahahahahahah”), then there is an unpleasantly high chance of getting false matches. Allowing agents to miss by a fraction would, I believe, help reduce the risk of this.

  If you have a certain number of agents in one place, remaining active for a reasonable number of turns, check if they are really on the solution – a sketch of this follows below. Of course, if an exact match doesn’t appear in the search space you don’t want to lose the close match you have found, as it may be the best you can do, but you probably also want to make sure some agents are still out there exploring (actually, I seem to remember something similar to this in the method… I really should check!)
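For the third of these, the kind of confirmation step I have in mind might look like the following. This is pure speculation on my part; the agreement threshold and the full-feature test are arbitrary choices:

def confirm_cluster(text, feature, positions, threshold=0.5):
    # If enough of the population agrees on one spot, test the whole
    # feature there rather than just single elements of it.
    best = max(set(positions), key=positions.count)
    agreement = positions.count(best) / float(len(positions))
    if agreement >= threshold:
        return best, text[best:best + len(feature)] == feature
    return best, False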

 

Anyway, that is a brief introduction, by means of an example which fails to find the solution in the number of iterations I am prepared to type out, of Stochastic Diffusion Search.  It is certainly something I am exploring further at the moment.

1 Comment:

Blogger Unknown said...

We did some work on this in ANN this year. Slowak did some improvements to SDS for his PhD. I'll dig out my notes and let you know what he did.

10:01 am

 
