0.2.6 Boyer-Moore String Searching
The Boyer-Moore searching algorithm, described in R. S. Boyer and
J. S. Moore's 1977 paper A Fast String Searching Algorithm is
among the best ways known for finding a substring in a search space.
Using their method it is possible to search a data space for a known
pattern without having to examine all the characters in the search
space. Boyer-Moore search algorithms are based on two search
heuristics.
The first of these rule tells us how to search for substrings without
repeats in a data space. Keep a pointer into the data space at the
current search location; initialize this pointer to the start of the
space plus n - 1 characters where n is the number of characters in
the target string.
Compare the character in the data space pointed to by this pointer
with the characters in the target string. If this character does not
occur in the target string, advance the pointer by n places.
If the character does occur in the target string, advance the pointer
by n - p places where p is the position that the character in
question first occurs in the target string.
This process repeats until either a match is found or we have shifted
over past the end of the search space.
The second search heuristic applies to searching for targets with
repeating patterns. Using only the rules set forth in the first
heuristic will work for targets with repeating patterns but the search
will not be as efficient as possible. By examining partial matches
and repeats in the target string, though, it is possible to make more
drastic pointer jumps and arrive at the match more rapidly. This type
of jump is based on a table which is computed before the search
begins.
|