Site Search:

Boyer-Moore

Back>

Another 2 inventors Boyer and Moore make improvement over brute force. The spark is: why compare the needle from left to right, how about compare from right to left? This idea leads to a N/M cost algorithm.

Let's see what will happen if we compare from right to left. We align the haystack with needle at the beginning, start from right-most character of needle, we compare from right to left , if a mismatch character is found in haystack.

3 scenarios for sliding right
3 scenarios for sliding right

There are 3 things we can do:

  1. the character is not in the needle, we can slide the needle passing the mismatched character, then start from right to left again.
  2. the character is in the needle, we then slide the needle to align with the right-most matching character in the needle, then start from right to left again.
  3. if either 1 and 2 slides the needle left, we slide needle right 1 position anyway.
In a typical scenario, most of the characters in the haystack are not in the needle, we slide the needle right-ward M step at a time, the cost is proportional to N/M!

In the worst case, for example, a long EEEDLEEEEDLEEEEDLEEEEDLEEEEDLE...EEEDLE haystack, most of times, we only slide needle one step right, the cost could be proportional to NM.

We need a size R array to store the right-most index of any character in the needle, so the extra space cost is R.

BoyerMoore.java