0.5.10 Soundex English word-sounding Algorithm
M. K. Odell and R. C. Russell patented the Soundex phonetic comparison
system in 1918 and 1922. Soundex coding takes an English word and
produces a four-character code, one letter followed by three digits,
designed to approximate the pronunciation of the word rather than its
spelling. It is normally used for ``fuzzy''
searches where a close match may be desired. For example, to come up
with alternative possibilities for a misspelled word, some spelling
checker programs generate a Soundex code for the misspelled word and
then suggest other words with the same Soundex value. Soundex codes
are also often used to index surnames that are difficult to spell.
The creation of a Soundex code is a straightforward operation. The
first step is to remove every character that is not an English letter:
hyphens, spaces, digits, punctuation, and so on. In the case of
accented vowels, simply remove the accents and keep the base letter.
In addition, remove all H's and W's unless they are the initial letter
in the word. Next, take the first letter in the word and make it the
first letter of the Soundex code. For each remaining letter in the
word, translate it to a number with the table below and concatenate
the numbers, preserving order, onto the Soundex value.
A, E, I, O, U, Y = 0
B, F, P, V = 1
C, G, J, K, Q, S, X, Z = 2
D, T = 3
L = 4
M, N = 5
R = 6
Now, collapse each run of adjacent identical numbers into a single
instance of that number. Further, if the first number in the Soundex
value is the same as the code number that the table assigns to the
initial letter, delete the first number. Now, remove all zeros from
the Soundex string. Finally, return the first four characters of the
end product as the Soundex encoding. If there are fewer than four
characters to be returned, append enough zeros to make the length
four.
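
The steps above can be sketched in Python as follows. This is only an
illustrative sketch, not a reference implementation: the names soundex
and CODES are made up for this example, accents are stripped by Unicode
NFKD decomposition (one possible reading of "remove the accents"), and
returning an empty string for input that contains no letters is an
assumption, since the description does not cover that case.

```python
import unicodedata

# Letter-to-digit table from the text. H and W are absent on purpose:
# they are removed before translation unless they start the word.
CODES = {}
for letters, digit in [("AEIOUY", "0"), ("BFPV", "1"),
                       ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    # Decompose accented characters so the accent becomes a separate
    # combining mark, then keep only the plain letters A-Z.
    decomposed = unicodedata.normalize("NFKD", word).upper()
    letters = [c for c in decomposed if "A" <= c <= "Z"]
    if not letters:
        return ""  # assumption: no letters means no code

    # The first letter is kept as is; H and W are dropped elsewhere.
    first = letters[0]
    rest = [c for c in letters[1:] if c not in "HW"]

    # Translate the remaining letters and collapse adjacent repeats.
    collapsed = []
    for c in rest:
        d = CODES[c]
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)

    # Drop the first number if it repeats the initial letter's code.
    # An initial H or W has no code, so nothing is dropped for them.
    if collapsed and collapsed[0] == CODES.get(first):
        collapsed.pop(0)

    # Remove zeros, then pad with zeros and cut to four characters.
    body = "".join(d for d in collapsed if d != "0")
    return (first + body + "000")[:4]


print(soundex("Robert"))   # R163
print(soundex("Rupert"))   # R163
print(soundex("Pfister"))  # P236
```

Here "Robert" and "Rupert" receive the same code, which is what makes
the scheme useful for fuzzy matching. "Pfister" shows the rule for the
initial letter: F translates to 1, the same number as P, so that first
1 is deleted before the zeros are removed.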