Overview: Let's setup the concept of a Message Authentication Code (MAC) and what it is used for. We'll provide a common example, the checksum, and show why it doesn't do the job.

Preventing tampering

The goal of a Message Authentication Code (MAC from now on) is to prevent message tampering. At this stage we won't mix MACs and Encryption (soon).

We will generate a fixed length "tag" (a "signature" really) which our partner can validate. If they get the message and can't reproduce the "tag" then they presume the message is corrupted. Here is a diagram showing the general setup (they specify HMAC which is an algorithm we will explore soon, but you can see it as just MAC in this diagram):

Setup using HMAC as the tag

Why not just hash the message?

You've probably seen MD5 checksums which are just the MD5 hash of a file you're downloading. This serves a certain role in a trusted world, but we're dealing with adversaries. Imagine that some download site wanted to give you malware disguised as a an install package for an Adobe product. If they gave you a checksum it would seem more trustworthy. But really they could generate a checksum of whatever they want you to install and you would be able to verify it.

This is a weak version of a "man-in-the-middle" attack. You might trust Adobe not to infect you but the third party site generated a "signature" which doesn't actually transfer that trust. So we use a private shared key.

MACs are also called "cryptographic checksums", think of them as hashes with keys.

The definitions

A MAC leans on a secret key that both the sender and receiver know, it is symmetric key.

Formally, send $m, T$ the message and a tag calculated using some MAC and the key, $K$. The receiver gets $m^{*}, T^{*}$ (possibly altered by a nefarious party). If $MAC_K(m^{*}) = T^{*}$ then the receiver has verified $m^{*}$, if not you reject the message, $m^{*}$.

The goal of your adversary is to generate a different message and tag that the receiver would verify as correct.

The attributes of a MAC

A MAC produces a cryptographically secure authentication tag for a given message.
The signer and verifier use the same key (symmetric key).
MACs accept messages of arbitrary length.
Any manipulation of the message made in transit should be detected by the receiver.
The verifier is assured of the source of the message.
A tag can be repudiated in court since the key is shared. (He said, she said.)
The ideal MAC acts like a random function from all messages and keys to all possible n-bit binary strings.

(ASIDE): a secret attack

You might wonder why we don't do something simple like $H(K|m)$ for some secure Hash function $H$. We didn't explore the guts of how hash functions compress large amounts of data down to something fixed length, but they use something akin to a block cipher mode for compression, called the Merkle-Damgård construction. The heart of most hash functions is a "compressor" that takes in two fixed-length inputs and spits out one output of that length. Then it chains blocks of compression into the final hash. Knowing that we can actually forge signatures with an unknown secret key.

Hash Pump Install hashpump from https://github.com/bwall/HashPump

Test It: Use a "secret" of "andyrulz" then hash, with SHA256, the string "andyrulzbanana". Save the hexdigest.

Forge Something: Now use hashpump as follows: hashpump -s 'hexdigesthere' --data 'banana' -a 'somethinghere' -k 8 The output will be a new hexdigest and payload. Validate that sha256('andyrulz'+newpayload) is the predicted hexdigest.

Hash Extension CTF Problem:

Do it: You will need to "set a cookie". https://crypto.prof.ninja/class13/flag.php

Overview: We'll show you our first cryptographically secure MAC which is a clever application of hashing to generate a difficult to forge tag. The system is called the HMAC (Hash-based Message Authentication Code).

Hash plus key equals tricky

There are big problems with just hashing a concatenation of your key and your message to generate your tag (different problems as a prefix than as a suffix BTW). I invite you to research those issues on your own, but for now I just want to show you the answer, HMAC.

hmac diagram

If $H$ is the hash algorithm, $K$ is your secret key, and $M$ is your message then we create three more variables: $K^{+}$ your key shifted up until it has the hash's block-length (check: this chart for block-lengths), ipad is 0x36 repeated until you've got the same block-size and opad is 0x5c also repeated until you've got enough bytes.

Then the equation is: $HMAC_K(M) = H[ (K^{+} \oplus opad) || h[ (K^{+} \oplus ipad) || M ]]$.

Using a Library

Python has a baked-in hmac library. Here's how you use it.

Validate: The tag (in hex) which comes out of that snippet should be b815f8be415bc163b4e563c21a1ebd26d862f22a047c0aafd3dd0f919e653a0f

Doing it by hand

OK, now by hand we have to know that SHA256 has a 64-byte block length, so we need to build $K^{+}$, $ipad$, and $opad$.

Verify: Confirm that this by hand computation matches the library implementation.

Checking with NIST

Finally let's make sure we're up to snuff with National standards.

There is a zip of test-vectors for HMAC in this zipped file.

The SHA-256 test cases are under L=32.

According to the original standard it is appropriate to truncate your tag. In the NIST examples you'll see a TLen parameter which is the number of bytes they truncate to.

Validate by library: Take an example input key and message and generate the same tag (don't forget to truncate).

Validate by hand: Pick another example and confirm that you can make the same HMAC tag.

Overview: Another important MAC is CBC-MAC, but you have to be careful when you use it. The next two parts show you why.

The next MAC we explore is based on the CBC mode of a block cipher.

CBC-MAC diagram

Run a block cipher in CBC mode using the MAC key, $K$, on your message like normal. Use the last block of the ciphertext as the MAC tag. Often IV is set to 0.

$H_0 := IV$
$H_i := F_K(P_i \oplus H_{i-1})$
$MAC := H_k$

Confirm this message: using CBC-MAC, AES, IV=0 ( \x00*16 really), and sk = "andy love simone" I generated the pair: "andy love simoneandy love simone" and tag (hexdigest) d6fdc5d5596e6ff6c3039cfbb5d9216f Verify that I didn't make a typo. (You can do this by hand or hacking at MODE_CBC, might be educational to do both).

Overview: This is a fundamental attack which you have to think about. The basic idea is that finding two matching random values is significantly easier when you check all pairs and don't preselect one value. When it comes to hashes and MACs we end up with a new attack which beats brute-force and that we can't really avoid.

Birthday Attacks

This is a beautiful and generic attack. The context is when someone is randomly selecting values from a range (which is one way of interpreting encryption, hashing, and authenticating).

Share your birthday: put your birthday (or make up a date really) into the slack channel. (Our class is probably too small for this, 20 people is perfect.)

Did we have a collision?

If not, do you spot a loved one's birthday?

Probability: Given two people what are the odds that they have different birthdays?

Probability 2: Add a third person to your equation. Now what are the odds of no collision?

Script Probability: use Python to generalize your equation and tell me the number of people which give a probability of 50% for a collision. Now repeat for 80%.

This treats our birthdays as uniformly distributed values out of 365 days. The general intuition is that it takes $\sqrt{N}$ samples from a space of size $N$ to have 50% chance of a collision.

Here is some intuition for that. Imagine selecting $k$ values at random from $N$. Then out of the $k$ values you picked there are $k(k-1)/2$ pairs. For any given pair there is a $ 1/N$ chance of collision. So that gives $\frac{k(k-1)}{2N}$ chance of collision. So $k \approx \sqrt{N}$ will lead to around 50% chance of collision.

Playground: Write a function which collects random integers from 1 to $N$ until a collision is found. At that point it should display the number of selected numbers. Run it with $N = 10000$ several times and report the average value in the chat.

ASIDE: CBC mode and the birthday paradox

Putting this to use in CBC mode (let $c_k$ be the kth ciphertext block and $p_k$ be the kth plaintext block) we can derive that if $c_i = c_j$ then $F_k(c_{i-1} \oplus p_i) = F_k(c_{j-1} \oplus p_j)$. So $c_{i-1} \oplus p_i = c_{j-1} \oplus p_j $ by decrypting and finally

$$ p_i \oplus p_j = c_{i-1} \oplus c_{j-1} $$

That means that when you see enough encrypted blocks of length 128 (roughly $2^{64}$) then you are likely to find a collision which you can then use to find the XOR of two plaintext blocks by looking one spot earlier of your collided blocks. The XOR of plaintext leaks plenty of information.

Playing with hash collisions and the paradox

The birthday attack relies on any match coming from within a set and not a specific match to a specific value. That intuition should guide us as we approach MACs.

So this birthday attack gives us a generic approach for finding two messages which hash to the same value in far less time than brute force. The size that matters is the output size of the hash function too.

So what do we do with this information?

When words collide: Grab the game dictionary ( wget http://crypto.prof.ninja/dictionary.txt ) and take the md5 hexdigest of everyword. Find me the words that collide on the last 4 bytes (8 hex characters). This is a space of $2^{32}$ so $2^{16}$ words should be enough to find a collision.

Single Match: imagine that I asked you to find a word which matches "banana" on the last bytes. If I asked for a 4 byte match we would fail. How many bytes do you think would give us a better than 20% chance of success?

Overview: The biggest failure of a MAC scheme is an attacker that can generate a false message that you actually accept as real. (Transfer more cash into my account please.) CBC-MAC and the birthday paradox opens a door for this.

Weaknesses

An attacker often has the ability to get arbitrary messages tagged.

Collision vs controlled collision: given a particular message (and AES-backed CBC-MAC), how many other messages would you expect to try tagging before you find a collision? Using a birthday attack how many messages would you need to get tagged before you find two that match?

Imagine that you have found two messages $m_1, m_2$ such that $MAC_K(m_1) = MAC_K(m_2)$ then $MAC_K(m_1 + x) = MAC_K(m_2 + x)$.

I wish I had a handy collision to do this by hand, but I don't. As a basic user I might only attempt a problem on the scale of $\leq 2^{40}$.

So let's look at another one which we can implement.

CTF FORGERY FLAG

Challenge

DO THIS AT flag2.php

You don't know my secret key, but the following three message/tag pairs have been made with AES and CBC-MAC (0 = IV).

'the 1st message.', 9ff9bd8e3b1d0657cad7fe89b31130a8
'the 2nd message.', 8aa501578b405e709c2a46cb808f70b9
'the 1st message.the 2nd message.', c9eb264b3e5fd526efaf0530f10e9119

Forge a new message of the form 'the 2nd message.'+x which would be verified by me.