What is an MD5 collision, and how are hash collisions handled?

MD5 is a cryptographic hash function that has been widely used in the field of computer security. In 2004, researchers demonstrated that MD5 is not collision resistant, so it is no longer suitable for security authentication. So what does an MD5 collision mean, and how do we deal with hash collisions?

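Simply put, an MD5 collision is a pair of different strings whose MD5 values are the same. For example, take the MD5 value of one string and then find a different string that produces the same value. Because the set of possible inputs is unbounded while the output is fixed at 128 bits, such pairs must exist, but the probability of hitting one by chance is very small.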
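To make the idea concrete, here is a minimal sketch of a collision search. It is a toy under a stated assumption: only the first 4 hexadecimal characters of the digest are compared, so that a match turns up almost immediately. Finding two inputs with the same full 128-bit MD5 value requires specialized attacks that this sketch does not attempt. The helper names md5_hex and find_truncated_collision are made up for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the 32-character hexadecimal MD5 digest of data."""
    return hashlib.md5(data).hexdigest()


def find_truncated_collision(prefix_len: int = 4):
    """Find two different inputs whose MD5 digests share the first prefix_len hex characters.

    Assumption for illustration: the digest is truncated so the search is fast.
    """
    seen = {}  # truncated digest -> first input that produced it
    counter = 0
    while True:
        message = f"msg-{counter}".encode()
        key = md5_hex(message)[:prefix_len]
        if key in seen:
            return seen[key], message  # two distinct inputs, same truncated digest
        seen[key] = message
        counter += 1


a, b = find_truncated_collision()
print(a, md5_hex(a))
print(b, md5_hex(b))
```

The two printed digests start with the same 4 characters but differ afterwards, which shows why shortening a hash value makes collisions easy and why the full-length digest matters.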

The common collision methods are brute-force ones (the exhaustive method and the dictionary method), which use computing resources to try inputs until one matches a known MD5 value.

1. Exhaustive method

The exhaustive method keeps trying permutations and combinations of characters to see which one produces the target MD5 value. Its disadvantage is that it takes too much time. For example, suppose we want to crack a 6-character password made of uppercase letters, lowercase letters and digits. There are (26+26+10)^6 = 62^6 combinations in total, which is about 56.8 billion, well over 50 billion.
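The arithmetic and the search itself can be sketched as follows. This is an illustration only: the function names and the short two-character target are assumptions chosen so the example finishes quickly, whereas a real 6-character search would need tens of billions of hash computations.

```python
import hashlib
import itertools
import string

# 26 lowercase + 26 uppercase + 10 digits = 62 characters
CHARSET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def search_space(length: int) -> int:
    """Number of candidate strings of exactly the given length."""
    return len(CHARSET) ** length


def brute_force(target_hex: str, max_length: int):
    """Try every string up to max_length and return the first whose MD5 matches."""
    for length in range(1, max_length + 1):
        for candidate in itertools.product(CHARSET, repeat=length):
            text = "".join(candidate)
            if hashlib.md5(text.encode()).hexdigest() == target_hex:
                return text
    return None  # nothing within max_length matched


print(search_space(6))  # 56800235584

target = hashlib.md5(b"a7").hexdigest()
print(brute_force(target, max_length=2))  # a7
```

Each extra character multiplies the work by 62, which is why the running time becomes impractical so quickly.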

2. Dictionary method

The dictionary method stores precomputed results in a mapping table, with each original text corresponding to one MD5 value. Looking up a known MD5 value in the table returns the original text directly. The dictionary method embodies the idea of "trading space for time" in algorithm design. Its disadvantage is that the table takes up a lot of space, and building it still requires enumerating all the inputs; the difference is that the results are saved for reuse.
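A minimal sketch of such a mapping table is shown below. The word list and the names build_lookup_table, WORDS, TABLE and lookup are hypothetical; a real table would hold a very large number of entries.

```python
import hashlib


def build_lookup_table(candidates):
    """Precompute MD5 digest -> original text for every candidate (space traded for time)."""
    return {hashlib.md5(text.encode()).hexdigest(): text for text in candidates}


# Hypothetical word list used only for illustration.
WORDS = ["123456", "password", "qwerty", "letmein", "hello"]
TABLE = build_lookup_table(WORDS)


def lookup(md5_hex: str):
    """Return the original text for a known digest, or None if it was never precomputed."""
    return TABLE.get(md5_hex.lower())


print(lookup("5d41402abc4b2a76b9719d911017c592"))  # hello
print(lookup("0" * 32))  # None, because this digest is not in the table
```

The lookup is a single dictionary access, but it only succeeds for texts that were enumerated in advance, which is the space cost described above.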

A website that reverses MD5 values by the dictionary method:/

When different keys collide in a hash table, there are usually two ways to deal with the conflict: the open addressing method and the chain address (linking) method. The former stores all nodes directly in the hash table T[0..m-1]. The latter puts all the elements that hash to the same slot into one linked list and stores the head pointer of that list in the hash table T[0..m-1].

1. Open addressing method

All the elements are stored in the hash table itself, and each entry either contains an element of the dynamic set or is empty (NIL). With this method the hash table can fill up, after which no new elements can be inserted. To insert an element, the slots of the hash table are examined one after another (probed) until an empty slot is found for the key. There are three common probing techniques for open addressing: linear probing, quadratic probing and double hashing.
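The insertion and lookup procedure described above can be sketched with linear probing, the simplest of the three techniques. This is a simplified illustration under stated assumptions: the table has a fixed size, deletion and resizing are left out, and the class name LinearProbingTable is made up for the example.

```python
class LinearProbingTable:
    """Fixed-size open-addressing hash table using linear probing (no deletion, no resizing)."""

    _EMPTY = object()  # sentinel marking a slot that holds no element

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.keys = [self._EMPTY] * capacity
        self.values = [None] * capacity

    def _probe(self, key):
        """Yield slot indices h, h+1, h+2, ... (mod capacity), visiting each slot once."""
        start = hash(key) % self.capacity
        for i in range(self.capacity):
            yield (start + i) % self.capacity

    def put(self, key, value):
        for index in self._probe(key):
            if self.keys[index] is self._EMPTY or self.keys[index] == key:
                self.keys[index] = key
                self.values[index] = value
                return
        raise OverflowError("hash table is full")

    def get(self, key, default=None):
        for index in self._probe(key):
            if self.keys[index] is self._EMPTY:
                return default  # an empty slot ends the probe sequence
            if self.keys[index] == key:
                return self.values[index]
        return default


table = LinearProbingTable(capacity=4)
table.put(1, "a")
table.put(5, "b")  # 5 % 4 == 1, so key 5 collides with key 1 and moves on to slot 2
print(table.keys.index(5))  # 2
print(table.get(5))  # b
```

Quadratic probing and double hashing differ only in how the next slot index is computed inside _probe; the rest of the structure stays the same.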

Scenarios suited to linear probing

ThreadLocalMap in Java uses open addressing with linear probing to resolve hash conflicts. Because the time complexity of open addressing degenerates to O(n) in the worst case, for example when the table is nearly full, it is suited to scenarios with a small amount of data.

2. Chain address method

The chain address method is also called the linked list method. It is relatively common and simple. When inserting an element, if the current position is already occupied, a linked list is built at that position: the new node is either appended after the existing nodes (tail insertion) or placed in front of them as the new head (head insertion). As a further optimization, the linked list can be converted into a tree structure such as a red-black tree. For example, HashMap since JDK 1.8 is optimized in this way.
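A minimal sketch of the chain address method with head insertion follows. The class names Node and ChainedTable and the helper chain_length are assumptions made for illustration, and the conversion to a red-black tree is deliberately left out.

```python
class Node:
    """One entry in a slot's singly linked list."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedTable:
    """Hash table that resolves collisions by chaining; each slot stores a list head pointer."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.heads = [None] * capacity  # T[0..m-1] holds the head pointers

    def _index(self, key) -> int:
        return hash(key) % self.capacity

    def put(self, key, value):
        index = self._index(key)
        node = self.heads[index]
        while node is not None:
            if node.key == key:
                node.value = value  # key already present: overwrite
                return
            node = node.next
        # Head insertion: the new node becomes the head of the chain.
        self.heads[index] = Node(key, value, self.heads[index])

    def get(self, key, default=None):
        node = self.heads[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return default

    def chain_length(self, index: int) -> int:
        """Count the nodes chained in one slot."""
        count, node = 0, self.heads[index]
        while node is not None:
            count, node = count + 1, node.next
        return count


table = ChainedTable(capacity=4)
for key in (1, 5, 9):  # all three keys map to slot 1
    table.put(key, f"value-{key}")
print(table.chain_length(1))  # 3
print(table.get(9))  # value-9
```

Unlike open addressing, the table never becomes "full": colliding keys simply lengthen one chain, at the cost of one extra pointer per node.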

Scenarios suited to the chain address method

Conflict handling based on linked lists is better suited to hash tables that store large objects and large amounts of data. The linked list itself needs extra space for pointers, so if the data stored in a node is smaller than the pointer, a large share of the space is wasted. Conversely, if the pointer size is negligible relative to the data size, the chain address method is the better choice: it is more flexible and supports more optimization strategies. For example, when a linked list in HashMap grows beyond 8 nodes (and the table has at least 64 slots), it is converted into a red-black tree.

This article has introduced the meaning of an MD5 collision and two methods for dealing with hash collisions: the open addressing method and the chain address method. It has also covered the scenarios suited to linear probing and to the chain address method. More about MD5 can be found in previous articles.