长期待咕。

散列函数应该满足:

哈希算法应该满足 3 个基本属性:

MD5

MD5 创建一个 128-bit 消息摘要。

Description

referred to Cryptographic Hashing Functions - MD5, Farhad Ahmed Sagar、https://github.com/pod32g/MD5/blob/master/md5.c、jhsn 三年前的 implemention 等。

据资料所言,一共 5 步(前 2 步用以追加与填充消息,第 3, 4 步用了一些辅助函数)

English Ver.

  1. Padding Bits: $len(padded_message) + 64 \equiv 0\ (mod\ 512)$:

    • append bits: '1' + '0...0' until len() % 512 == 448 (512 - 64)
  2. Append Length:

    1. add a 64-bit representation of the original message to make len() % 512 == 0
    2. divide it into 512-bit blocks, then divide each of them into 16 words (means 32-bit each), we denote them as $M[0,…,N-1]$, where N % 16 == 0.
  3. Initialize MD Buffer:

    • four word buffer (32-bit each)
      word A 01 23 45 67
      word B 89 ab cd ef
      word C fe dc ba 98
      word D 76 54 32 10
  4. Process Message in 16-word Blocks:

    1. use four Auxiliary Functions:

      • inputs: three 32-bit word

      • use each of these 4 functions in 4 rounds (each round has 16 operations)

      • output: a 32-bit word

        md5_func

    2. use a table consisting 64-elements:

      • $K[0,…,63]$: hex(floor(abs(sin(i + 1)) * 2^32))

        md5_table

    3. ALGORITHM: See my another intro below (Chinese Ver.).

  5. buffer AA, BB, CC, DD contains MD5 digest of our input message.

Chinese Ver.

其实就两步:填充、计算。

  1. 填充

    initial_message 用 bits100..00 追加填充,直到 $len(message)+64\equiv 0\ (mod\ 512)$;然后追加填充 64-bit len(initial_message) % 2**64,以使 $len(message)\equiv 0\ (mod\ 512)$;最后按 512-bit 将其划分成块,每一块又分为 16 个 32-bit 小块。

  2. 计算(过程中一定保证 Zmod(2^32)

    首先介绍几个需要用到的辅助函数:

    MD_buffer

    md5_func

    md5_param1

    md5_param3

    md5_param2

    1. 对于每一个 512-bit block,进行 64 轮循环,每轮循环的结构一致,可大致理解为 a0 = am + left_rotate(an + F(a[]) + Message[g] + K[i], s[j]);而选取的参数、函数需要根据轮数的不同去选择。
    2. 在每一个 512-bit 运算后,将对应参数 A, B, C, D 累加。最后将其依次拼接起来,即为 MD5 digest
    • 以上 round 3 g = (3 * i + 5) % 16 更正。

Implementation

import struct
import hashlib
from math import sin, floor
from Crypto.Util.number import bytes_to_long
# params
a0, b0, c0, d0 = 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
K = [int(floor(2**32 * abs(sin(i + 1)))) for i in range(64)]
s = [
    7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
    5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
    4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
    6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21
]

def rev_hex(hex_str, length):
    hex_str = hex_str[2:].rjust(length, '0')
    output = '0x'
    for i in range(len(hex_str), -1, -2):
        output += hex_str[i:i + 2]
    return output

def bytes_to_int(a):
    b = 0
    for i in a:
        b = 256 * b + i
    return b

def left_rotate(x, s):
    return ((x << s) | (x >> 32 - s)) % 2**32

def md5(m):
    temp = rev_hex(hex(len(m) * 8 % 2**64), 16)[2:]
    length = b''.join([chr(int(temp[i:i + 2], 16)).encode('latin1') for i in range(0, 16, 2)])
    m = m + b'\x80'
    if len(m) % 64 != 56:
        m = m + b'\x00' * ((56 - len(m)) % 64)
    m = m + length
    
    m = [m[i:i + 64] for i in range(0, len(m), 64)]
    AA, BB, CC, DD = a0, b0, c0, d0
    for block in m:
        A, B, C, D = AA, BB, CC, DD
        M = [int(rev_hex(hex(bytes_to_int(block[i:i + 4])), 8)[2:], 16) for i in range(0, 64, 4)]

        for i in range(64):
            if i < 16:
                F = (B & C) | ((~B) & D)
                g = i
            elif i < 32:
                F = (D & B) | ((~D) & C)
                g = (5 * i + 1) % 16
            elif i < 48:
                F = B ^ C ^ D
                g = (3 * i + 5) % 16
            else:
                F = C ^ (B | (~D))
                g = (7 * i) % 16
            D, C, B, A = C, B, (B + left_rotate((A + F + K[i] + M[g]) % 2**32, s[i])) % 2**32, D
        AA = (AA + A) % 2**32
        BB = (BB + B) % 2**32
        CC = (CC + C) % 2**32
        DD = (DD + D) % 2**32
        
    digest = rev_hex(hex(AA), 8)[2:] + rev_hex(hex(BB), 8)[2:] + rev_hex(hex(CC), 8)[2:] + rev_hex(hex(DD), 8)[2:]
    return digest
def md5_test():
    message = input().encode()
    digest = md5(message)
    print(digest)
    assert hashlib.md5(message).hexdigest() == digest

md5_test()

几个问题:

Collision