web3.path

PHASE 02 Crypto · ~3 hours

Advanced Cryptography

Now the specific primitives blockchain depends on: Keccak-256, ECDSA over secp256k1, and the Merkle tree that makes "give me a proof that this tx is in the block" a 32-byte problem instead of a 1MB problem.

Goal — implement a Merkle tree from scratch, generate + verify inclusion proofs, and know why Ethereum chose Keccak over SHA-3-final.

1. SHA-256 vs Keccak-256

Bitcoin uses SHA-256. Ethereum uses Keccak-256 — the original NIST SHA-3 submission, before NIST tweaked the padding to produce SHA-3. Ethereum forked before the tweak; now everyone in EVM-land is stuck with Keccak.

Gotchahashlib.sha3_256 in Python is not Keccak-256. Use pysha3, eth_hash, or from Crypto.Hash import keccak.
from Crypto.Hash import keccak
k = keccak.new(digest_bits=256); k.update(b"hello")
print(k.hexdigest())
# 1c8aff950685c2ed4bc3174f3472287b56d9517b9c948127319a09a7a36deac8

2. RSA vs ECC (and why Ethereum picked ECC)

RSA-2048ECDSA secp256k1
Public key256 bytes33 bytes (compressed)
Signature256 bytes65 bytes
SpeedSlow keygenFast everything
Security (~bits)112128
Analogy — RSA is like parallel parking a Winnebago; ECC is parking a Vespa. Same security, 8× less space. When every byte costs gas, Ethereum picked the Vespa.

3. Ethereum addresses — it's just hashing

private_key (32B random) │ ecdsa point multiply ▼ public_key (64B: x || y) │ keccak256 ▼ hash (32B) → take last 20 bytes → 0x....... (Ethereum address)
from eth_keys import keys
pk = keys.PrivateKey(b"\x01"*32)
print(pk.public_key.to_checksum_address())
# 0x1a642f0E3c3aF545E7AcBD38b07251B3990914F1

This means an address is not random — it's a deterministic function of a private key. Lose the key → lose the account. There is no password reset; you are the math.

4. Merkle trees

A Merkle tree is a binary tree of hashes. Leaves are H(data_i); internal nodes are H(left || right); the root is a single 32-byte commitment to all the data.

Analogy — think of a Merkle root as a sha256sum of your whole folder, but with a superpower: given the root, you can prove one file is inside by showing only log₂(N) sibling hashes, not every file.
root = H(H12 || H34) / \ H12 = H(H1||H2) H34 = H(H3||H4) / \ / \ H1=H(A) H2=H(B) H3=H(C) H4=H(D) Proof that C is in the tree: [H4, H12] Verifier recomputes: H3 = H(C); H34 = H(H3||H4); root' = H(H12||H34) Accept iff root' == known root.
Show: 30-line Merkle tree + proof in Python
import hashlib
H = lambda b: hashlib.sha256(b).digest()

def build(leaves):
    layer = [H(x) for x in leaves]
    tree = [layer]
    while len(layer) > 1:
        if len(layer) % 2: layer.append(layer[-1])  # duplicate last
        layer = [H(layer[i] + layer[i+1]) for i in range(0, len(layer), 2)]
        tree.append(layer)
    return tree  # tree[0]=leaves, tree[-1]=[root]

def proof(tree, idx):
    p = []
    for layer in tree[:-1]:
        sib = idx ^ 1
        if sib >= len(layer): sib = idx  # duplicated
        p.append((layer[sib], idx & 1))  # (hash, is_right_child_of_parent)
        idx //= 2
    return p

def verify(leaf, proof_, root):
    h = H(leaf)
    for sib, side in proof_:
        h = H(sib + h) if side else H(h + sib)
    return h == root

leaves = [b"A", b"B", b"C", b"D"]
t = build(leaves)
root = t[-1][0]
assert verify(b"C", proof(t, 2), root)

5. Where Merkle trees actually live in Ethereum

SWE takeaway — a block header (~500 bytes) cryptographically commits to gigabytes of state. That's the entire basis of light clients, rollups, and bridges.

6. Commit–reveal & HMAC (sidebar)

Useful patterns you'll reach for later:

Project

Deliverable — extend your Phase 1 CLI with a merkle build <dir> and merkle prove <file> command. You should be able to prove any file's inclusion in ~32 × log₂(N) bytes.

Quiz

Q. You hash two different transactions with Keccak-256 and somehow get the same output. What just broke?
If a hash function is broken, attackers can swap tx payloads without changing the root. Every chain using that hash is done.
← Phase 1Phase 3: Distributed Systems →