In 1996 it was discovered that the MD5 hash-algorithm had problems. Furthermore, an algorithm was devised to generate these collisions in 2005.
A collision is described as the case where two different inputs yield the same hash value. Or in other words, if x != y and md5(x) = md5(y) then we have a collision. Even more interesting is the fact that if you have a collision then you can easily produce a new one. So let h1 = md5(x), h2 = md5(y), h1' = md5(x+z), h2' = md5(y+z), where x != y and '+' means concatenation, then if h1 = h2 then h1' = h2'. Note that z has to be the same when used in x+z and y+z.
The above fact can be exploited in several ways. What I chose to do was to create files which have the same MD5-hash and size. By doing so the files are indistinguishable if MD5 is employed as the checking mechanism. My md5_evilize.py illustrates one such scenario:
netrom@sleipnir %> ./md5_evilize.py pack good.sh evil.sh good.pack evil.pack netrom@sleipnir %> md5 *.pack MD5 (evil.pack) = 6bfb1f7a7be478a900e3b0a20cb5790a MD5 (good.pack) = 6bfb1f7a7be478a900e3b0a20cb5790a netrom@sleipnir %> ls -l *.pack -rw-r--r-- 1 netrom staff 238 Dec 30 23:28 evil.pack -rw-r--r-- 1 netrom staff 238 Dec 30 23:28 good.pack netrom@sleipnir %> ./md5_evilize.py extract good.pack good2.sh netrom@sleipnir %> chmod +x good2.sh netrom@sleipnir %> ./good2.sh Hello, World! netrom@sleipnir %> ./md5_evilize.py extract evil.pack evil2.sh netrom@sleipnir %> chmod +x evil2.sh netrom@sleipnir %> ./evil2.sh All your base are belong to us!
In the above two files are created in the first line; good.pack and evil.pack. As expected they have the same MD5-hash and size. First, the good part is extracted and run. Then, secondly, the evil part is extracted and run. These are just simple shell-scripts but it could just as well be something more malicious (whatever you could imagine).
This whole process works by exploiting the above-stated fact about producing collisions from known ones. The program has two vectors of 128 bytes each. They are not the same and yield a MD5 collision. The good.pack is created by first writing (everything is done in binary mode) the first vector, preceded by 16 bytes for the size (in hex) of good.sh and evil.sh, and then the actual data of good.sh following that of evil.sh. evil.pack is created in a similar manner but using the second vector instead of the first one. This is done so that they can be distinguished when extracting.
I know that this only works when using this exact program to extract the files, but remember that the purpose was to create indistinguishable files not anything usable in particular.
EDIT 30/12/09: I made some smaller changes to the code so the example was updated.




![[FSF Associate Member]](/gfx/emblems/fsfmem.png)