0.2.7.2 Dealing with Collisions

Search
Next: 0.2.7.3 Choosing a Table Up: 0.2.7 Hashing Previous: 0.2.7.1 Selecting a Hash

0.2.7.2 Dealing with Collisions

No matter how good your hash function is, or how carefully you choose the hash table size, sometimes data collisions are bound to occur. Recall that a collision is when two distinct data items produce the same hash value and, thus, want to be stored in the same table location. For instance, suppose you have an item, A, already in your hash table at location 144. A's hash function value, therefore, is 144. Another item, B, with a totally different key than A, is to be added to the table. Suppose, however, that B's hash value is 144also. B cannot be stored at table location 144 because A is already there... or can it?

One way of dealing with this situation is to make each location in the hash table the head of a linked-list data structure. If you find a collision has occurred, traverse down the linked list at the hash value and add the new data item to the list's tail (or head). Of course, when you search for an item in a table using this insertion scheme you must be mindful of the fact that such an item may not be at the head of the linked list residing in the hash table at the hash location. This method is sometimes known as open hashing because multiple data items sharing the same hash value are stored ``outside'' the hash table.

The following methods all store collided data inside the hash table at different locations than their computed hash value. This practice is known as closed hashing.

The classic way to deal with collisions is to simply increment an item's hash value by one until a finding an unoccupied hash table location then store the item a location or two away from it's computed hash value. This method is known as linear probing. In querying a table employing such an insertion scheme you have to not only check at the hash location of the item for which you are looking, but, if it contains some item other than the one you seek, continue to search in adjacent hash table locations until you either find your goal item or encounter an empty table location. Linear probing, while simple to understand and implement, leads to data clumping and is not the ideal way of handling collisions.

Different spins on the concept of probing are known as quadratic probing and random probing. In quadratic probing instead of simply moving one address down in the hash table the number of spaces moved is somehow dependent on the number of times that we have moved so far. For example, consider the following hash value offset function:

o(i) = 2ⁱ - 1

The first collision adds one address to the base hash address, the second three, the third seven, and so on. When quadratic probing is employed, collisions tend not to produce the clusters of full areas in the hash table which are linear probing's drawback.

Random probing uses the address of the collision as the seed for a pseudo-random number generator and computes the next address to try with a function which takes a random element into account. Both quadratic and random probing are usually slower than linear probing but produce more uniform hash data distributions.

One final way of dealing with collisions is called rehashing or dualhashing. If the hash address produced is a collision, the address is reused as input to the hash function in order to compute a new address. This process, as you might expect, complicates lookups and deletes.

Scott Gasch
1999-07-09