Traditional Culture Encyclopedia - Traditional festivals - Understand four solutions to hash conflict

Understand four solutions to hash conflict

hashing is a solution to improve efficiency by recompressing data. However, because the hash value generated by hash function is limited, and there may be more data, there are still different data corresponding to the same index value after hash function processing. At this time, there is a hash conflict (both values need the same address index position).

filling factor (filling factor = total data/hash table length), hash function and conflict handling method

are actually the realization of hash table.

1. Open address method (rehash method)

It can be generally understood that all addresses are open to all values, rather than the closed way of chain address method, and a value is fixed at an index address position.

p1=hash(key) If there is a conflict, it is +1 or hashed on the basis of p1 address, and P2 = hash (P1) ...

(1) Linear detection

If the value of a certain data already exists, one unit will be added later on the basis of the original value until there is no hash conflict.

(2) Quadratic detection

When determining values in sequence, if the value of a certain data already exists, first add the square unit of 1 to the original value, and if it still exists, subtract the square unit of 1. Followed by 2 squared, 3 squared and so on. Until no hash conflict occurs.

compared with linear detection, the detection step is changed. Because if they are all +1 to detect, the efficiency will be very poor in the case of large data.

(3) Pseudo-random detection

When determining values in sequence, if some data already exists, a number is randomly generated by a random function, and a random number is added to the original value until a hash conflict does not occur.

2. Chain address method (HashMap's hash conflict resolution method)

For the same value, use linked list to connect. Use arrays to store each linked list.

Advantages:

(1) The zipper method is simple to deal with conflicts, and there is no accumulation phenomenon, that is, non-synonyms will never conflict, so the average search length is short;

(2) Because the node space on each linked list in the zipper method is dynamically applied, it is more suitable for the situation that the table length cannot be determined before making the table;

(3) In order to reduce conflicts, the open addressing method requires a small filling factor α, so it will waste a lot of space when the node scale is large. However, α≥1 can be selected in the zipper method, and when the node is large, the pointer field added in the zipper method can be ignored, thus saving space;

(4) In the hash table constructed by zipper method, the operation of deleting nodes is easy to realize. Simply delete the corresponding nodes on the linked list.

Disadvantages:

When the pointer occupies a large space, it will cause space waste. If the space is used to increase the size of the hash table, the efficiency of the open address method will be improved.

3. establish a public overflow area

establish a public overflow area to store all data with hash conflicts.

4. rehash method

hash the conflicting hash values again until there is no hash conflict.

it can be understood as p=hash(key), and if there is a conflict, p = hash 2 (key) ...

References:

Article 1

Video (very understandable).