Class 7: Systems of linear equations and SAGEmath

Overview: This part introduces you to an essential skill in any mathematician's or cryptographer's tool chest. Using linear algebra (matrices) to find answers to linear systems. This talent will unlock the ability to break the LCGs and LFSRs from the last lesson.

What is a linear system?

You're probably comfortable with old-school algebra problems like: on the line $y = x + 8$, what is the value of $y$ when $x$ is $12$?

No big deal right? Now if we have two lines $y = x + 8$ and $y = 2x - 4$ we might ask, where do they intersect?

$\begin{align} y&= 2x - 4 \\ y&= x + 8 \end{align}$

You probably have several ways to solve it in mind already. Substitute: $ x+8 = 2x - 4 $ leads to $x = 12$. Or subtract one equation from the other:

$$ (y-y) = (2x - x) + (-4 -8) $$

The best solution for this course is to look at the system of linear equations as a matrix problem ( $A \vec{x} = \vec{b} $ ). It's much more awkward for the small cases but the only realistic way to code it when we're dealing with tens or hundreds of equations. Here is the basic idea, make the equations match $ A \cdot \begin{bmatrix} y \\ x \end{bmatrix} = \vec{b} $ then find $A^{-1}$ and we'll know that $ \begin{bmatrix} y \\ x \end{bmatrix} = A^{-1} \cdot \vec{b} $. Here are my steps:

$ \begin{align} y& -2x& = -4 \\ y& -x& = 8 \end{align} $

$ \begin{bmatrix} 1 & -2 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} y \\ x \end{bmatrix} = \begin{bmatrix} -4 \\ 8 \end{bmatrix} $

$$ A^{-1} = \begin{bmatrix} -1 & 2 \\ -1 & 1 \end{bmatrix} $$

$ \begin{bmatrix} y \\ x \end{bmatrix} = \begin{bmatrix} -1 & 2 \\ -1 & 1 \end{bmatrix} \cdot \begin{bmatrix} -4 \\ 8 \end{bmatrix} = \begin{bmatrix} 20 \\ 12 \end{bmatrix}$

Try it out: Solve the system $ 3y = 2x + 4 $ and $2y = 2x - 4$. Now write it down as a matrix problem.

Generalize: Now try to solve the system $ ay + bx = e$ and $cy + dx = f$ in full generality. When is there no solution?

Increase the size: Now take a stab at solving a three equation system in $x,y,z$. You'll have to research inverting a $3 \times 3$ matrix.

Overview: In this section I introduce you to a freely available and powerful tool called SAGE. It is basically Python mixed with high-powered math. I'll show you the basics of getting setup and how to use it to solve some of the number theory problems we face. After you're in I'll give you some problems to work through.

Solving Number Theory Problems Online

Here is a screen-shot by screen-shot intro to signing up for SAGE math using your UDel email. This is a fast browser-based way to get high-powered number theory working for you and it will help when you need to solve tricky problems using exact math algorithms.

Here is a video demonstrating how to solve a problem in SAGE (matrix inversion).

Now try these algebraic problems:

Now here are a set of number theory exercises that should push you a little bit, but get your mind right for using this new tool in your arsenal. These refer to the end of class 5 which has a bullet-point jump through some algebra and they help us set up an attack we will pull off later in the course.

Euler's Phi: Calculate $\Phi(10)$ using SAGE. What are the elements in $\mathbb{Z}_{11}^{*}$? How many generators of $\mathbb{Z}_{11}^{*}$ should there be? Find those elements that generate $\mathbb{Z}_{11}^{*}$ .4, the generators are 2, 6, 7, and 8

Cyclic Subgroups: Pick one of those generators which generates all ten elements and find the unique subgroup of size 5. Now find the unique subgroup of size 2.

Cosets by hand: Find all cosets of the subgroup of size 2 (multiply the elements of that subgroup by something not in that group).

Safe Primes: Generate a "safe prime" which is a prime number that happens to be 1 more than double another prime number, that is, $p = 2\cdot q + 1$. Find it by trial and error and using x.is_prime().

Generalize: Modulo your safe prime what are the orders of the elements in the multiplicative subgroup?

Generalize: How many generators are there? Find one.

Overview: In this part we use the skills from the last two parts to break the Linear Congruential Generator (a harder way) by figuring out the secret parameters and the current "state". Then we can predict correctly what the rest of the sequence will be (far from 50/50). This will be the simplest break we do but the others build on this example.

Continuing with Systems of Linear Equations

Now one of the things about math (and math education) is that solving a problem that has been setup for you is much easier than finding the setup in the wild. In order for the skills of this lesson to be of any real value you'll want to develop some instincts for finding systems of linear equations in problems you face.

In this case I'm outright telling you that we can find linear algebra inside of the PRNGs we've seen so far. But let's make that more clear now. (If you like a challenge then pause and think about where the linear systems are so far.)

Using Linear Algebra to tackle LCGs

Let's start with the Linear Congruential Generator. You know that each term is a function of the term before, $t_i = a \cdot t_{i-1} + c \pmod{m} $. If you see several terms in a row then to predict the next term you need to know $a, c$ and $m$. You often know these values and might just want to know the seed, or you know the modulus and only care to predict all future values. OK. So let's make it simple for now. Suppose $m = 101$ and you see the sequence $ \{40, 42, 49 \} $. Can you figure out $a$ and $c$?

$$ \begin{align} 42 &= a \cdot 40 &+ &c \pmod{101} \\ 49 &= a \cdot 42 &+ &c \pmod{101} \end{align} $$

Solve it: solve this simple LCG.

Target: the following four terms were generated by an LCG (modulo 2**32). Find the next term:

3904654068
897137925
4281244274
3573336907

Fast and Dirty Analysis of Systems

When you're out in the wild thinking about solving a linear system you have to know if it will work. Here is the rule:

Count the number of equations and count the number of unknowns. If there are more equations than unknowns then there is either 1 exact answer or no answer. If you've got more unknowns than equations then there are an infinite number of possible answers.

In this previous example we could generate as many equations as we want. If there is an answer then any two will do.

LCG with less knowledge:

If you're a CTF trainer you might need to crack an LCG without knowing the modulus, the multiplier, or the increment. You still can you just need more data points (because if the number of equations is more than the number of unknowns you'll be alright).

Here is a script I use all of the time to do this:

Overview: In this part we extend the ideas of the last PRNG exploit to the larger system for Linear Feedback Shift Registers. In breaking these systems we'll see that designers of LFSRs could add just a bit of extra noise to thwart us and many stream ciphers do just that.

World 1: We know the coefficients

So if someone is generating a stream of random bits with a Linear Feedback Shift Register and you know which register-bits are XORed to generate the next input bit then it's really easy to correctly predict all future bits. Let's do that now.

Recall that a general LFSR looks like:

General Linear Feedback Shift Register

Let's say that the length of the register, $L$, is $5$ and that every coefficient is $1$. That means that the next input bit is a "parity-check" of the current 5 (add 'em all up mod 2).

Anticipate: How many bits do you need to see before you can correctly predict all future bits? 5

What's the process? Wait until you know enough bits to reverse engineer the state of the register once, then you'll know all of the future states of the register.

Actually do it: Suppose that you got the opening bits 1, 0, 0, 1, 1. Can you tell me the next 10 bits that would follow? 1, 1, 0, 0, 1, 1, 1, 1, 0, 0

How good is that? When you see that output do you see an issue with the setup of that LFSR? What is the length of it's cycle? What is the maximum length of an LFSR cycle with $L=5$?

World 2: We don't know the feedback coefficients

Now what do we do? Before, the moment we knew the entire register once we could correctly know it in all future cases. But what if you don't know how the next bit is made?

I claim that it takes $2L$ bits before you can break it, but that you need linear algebra.

First an observation about XOR. It is the same as addition modulo 2. $ 1 \oplus 1 = 1 + 1 \pmod{2}$.

Confirm: Confirm that XOR on the 0 and 1 has the exact same output as addition modulo 2.

So what does that get us? Well we can now look at register bits as linear combinations of products.

If $c_1, \ldots, c_L$ are unknown zeroes or ones then $s_{t+L} = s_{t+L-1} \cdot c_1 + \cdots + s_{t} \cdot c_L \pmod{2} $.

We can also reverse engineer as many of the $s_i$ as we need by just waiting and watching, just like in World 1.

So we have $L$ unknowns and we need $L$ equations. After waiting for $L+1$ bits we have 1 equation, but after $L+2$ bits we have 2 equations. So after $L+L$ bits we should have $L$ equations with $L$ unknowns and that's enough to solve the linear algebra.

Let's just do it. I've secretly selected 5 coefficients for an LFSR. The first 10 bits of output are: 0, 1, 1, 0, 0, 1, 1, 0, 0, 1

The first 6 give us one equation:

$1=0\cdot c_5+ 0 \cdot c_4+ 1 \cdot c_3+ 1 \cdot c_2+ 0 \cdot c_1 $

Continue that one more time and you get:

$ 1= 1\cdot c_5+ 0 \cdot c_4+ 0 \cdot c_3+ 1 \cdot c_2+ 1 \cdot c_1 $

Write down the equations: Go ahead and generate 3 more equations from the sequence of bits.

Now find the coeffs Use SAGE to invert a $5 \times 5$ matrix over GF(2) to solve for the coefficients.

Predict without linear algebra: Of course a pattern emerged and you can probably solve this one without using SAGE. What's the pattern?

Prove it: At what point can you guarantee that the pattern you found repeats forever?

Do it at full size: The first 38 bits of a length 19 LFSR's output are: 00100001000001000111010001001110010100. Find the coefficients.