Another 2 inventors Boyer and Moore make improvement over brute force. The spark is: why compare the needle from left to right, how about compare from right to left? This idea leads to a N/M cost algorithm.
Let's see what will happen if we compare from right to left. We align the haystack with needle at the beginning, start from right-most character of needle, we compare from right to left , if a mismatch character is found in haystack.
3 scenarios for sliding right |
There are 3 things we can do:
- the character is not in the needle, we can slide the needle passing the mismatched character, then start from right to left again.
- the character is in the needle, we then slide the needle to align with the right-most matching character in the needle, then start from right to left again.
- if either 1 and 2 slides the needle left, we slide needle right 1 position anyway.
In a typical scenario, most of the characters in the haystack are not in the needle, we slide the needle right-ward M step at a time, the cost is proportional to N/M!
In the worst case, for example, a long EEEDLEEEEDLEEEEDLEEEEDLEEEEDLE...EEEDLE haystack, most of times, we only slide needle one step right, the cost could be proportional to NM.
We need a size R array to store the right-most index of any character in the needle, so the extra space cost is R.
BoyerMoore.java