Sub-String Search
Algorithm | time complexity (worst) | time complexity (typical) | extra space | backup in place | notes |
---|---|---|---|---|---|
brute force | MN | 1.1N | 1 | yes | simple and effective |
K-M-P | 2N | 1.1N | MR | no | other versions trade off worst time performance with space |
Boyer-Moore | MN | N/M | R | yes | can further optimize |
Rabin-Karp | 7N | 7N | 1 | no | Monte Carlo, tiny chance of incorrect |
Regular Expression | MN | ? | M | yes | M is the length of regular expression |
We have studied how to search a key in a collection of keys, for example, searching a string in a collection of strings. In order to search, we need to repeatedly compare 2 keys, for example compare 2 Strings. However, we haven't studied how to search a string in one super long string.
Since string is the carrier of almost all modern informations (DNA for example), this simple operation is so common and so important, that mankind never stops searching for better way of doing it.
If the message we need to search is explicit, the cost is typically 1.1N, a little bit more than reading through the the book of N characters. The best we can do is skippy read and find the answer in N/M. If the message we need to search is a riddle, the typical cost is uncertain, the cost is at most MN, we exhaust all possible meaning of M and check if N can match.
The primitive brute force algorithm is very cost effective in most cases, but we can push the limit with cleverly designed DFAs and NFAs.