Every positive integer is a sum of four integer squares

From(http://www.alpertron.com.ar/4SQUARES.HTM)

Every positive integer is a sum of four integer squares

by Michael Barr

Introduction

The theorem of the title has been known for centuries, perhaps longer, but I believe that Lagrange gave the first proof. When I was a student, I saw a very different (and, in my opinion, harder) proof from the one given here. I don't know who discovered this proof, but I saw it in Alan Baker's little book on number theory.

There is yet another proof that proves and uses unique factorization in the non-commutative ring of quaternionic integers. Warning: the quaternionic integers include (1+i+j+k)/2 and in fact is the ring of all polynomials in that number.

All proofs that I have seen use the identity

(a²+b²+c²+d²) (A²+B²+C²+D²) = (aA+bB+cC+dD)² + (aB-bA+cD-dC)² + (aC-bD-cA+dB)² + (aD-dA+bC-cB)²

which implies that the product of two numbers, each of which is a sum of four squares, is also a sum of four squares. Although this identity can be verified directly it is worth pointing out that by taking u=a-bi-cj-dk and U=A+Bi+Cj+Dk, this is simply the square of the identity that says that |uU|=|u||U|. That is why I often refer to it as the quaternionic product identity. Anyway the conclusion is that it suffices to prove for primes. Since every power of 2 is obviously either a square or a sum of two squares, it suffices to prove that every odd prime is a sum of four squares.

For later use, we also mention what might be called the complex product identity:

(a²+b²) (A²+B²) = (aA+bB)² + (aB-bA)²

We give here a proof that has the added advantage not only of being constructive, but actually quite easily implemented as an algorithm. It is not deterministic in the sense that there are trials, but it is arranged so that each trial has at least a 50% chance of success. I assume that the reader is familiar with modular arithmetic, although I may add an introduction to that in the future. If a and n are numbers, we say that the remainder r when a is divided by n is the residue of a mod n (residue was the old word for what we now call remainder, but the old usage has hung on in number theory). Clearly, a = r (mod p) and r is the smallest non-negative number with that property. Closely related is the absolutely least residue defined as whichever of r and r-n is smaller in absolute value. If n happens to be even and r=n/2, then |r|=|r-n| and you can take either one, but in order for the phrase to have an unambiguous meaning, we will take r.

So we wish to show that every odd prime is a sum of four squares. Actually, we give two separate arguments, the first that every prime congruent to 1 mod 4 is a sum of two squares and then that every prime congruent to 3 mod 4 is a sum of four squares.

Primes congruent to 1 mod 4

We begin by showing that if p is a prime and p = 1 (mod 4), then there is a number a such that a² = -1 (mod p). We say that -1 is a quadratic residue mod p. One way to see this is the following. If a, b, and c are three integers and p does not divide a, then ab = ac (mod p) implies that b = c (mod p). The reason is that the conditions imply that p divides ab-ac and p does not divide a so that p divides b-c, which is the conclusion. Now for any a such that 0 < a < p the numbers a, 2a, 3a, ..., (p-1)a are all non-congruent mod p and therefore must be congruent to 1, 2, 3, ..., p-1 in some order. By taking the product of all the integers in the two sets, we see that a^p-1(p-1)! = (p-1)! (mod p). Since p divides no factor of (p-1)!, it can be canceled to conclude that a^p-1 = 1 (mod p) (Fermat's little theorem).

Next we note that for any a not divisible by p, if we let b=a^(p-1)/2, then b² = 1 (mod p). This means that p divides (b²-1) = (b-1)(b+1) which means that a^(p-1)/2 = b = ±1 (mod p).

Can a^(p-1)/2=1 for all a? No, it cannot. I will not give all the details here, but the underlying reason is that arithmetic mod a prime is really like ordinary arithmetic in nearly all respects. In particular, the usual argument that a polynomial of degree d has at most d distinct roots is valid. If f(x) is a polynomial, then f(a) = 0 (mod p) if and only if x-a divides f(x), with division taking place mod p. This is not true for non-primes, by the way. For example, x²-1 = (x-1)(x+1) has four distinct roots mod 8 because 2 can divide one of the factors and 4 can divide the other. Getting back to the main argument, the polynomial x^(p-1)/2-1 can have at most (p-1)/2 roots as can the other factor of x^p-1-1 = (x^(p-1)/2-1) (x^(p-1)/2+1). Since the polynomial does have p-1 roots (namely, 1, 2, ..., p-1), it follows that exactly (p-1)/2 of the factors must satisfy the equation x^(p-1)/2 = -1 (mod p). Up to now, this argument is valid for all odd primes. The next step is limited to primes p, for which 4 divides p-1.

If a^(p-1)/2 = -1 (mod p) and we let b=a^(p-1)/4, then b² = -1 (mod p), as desired.

So the first step of the process is to find a square root of -1 mod p. We have shown, in fact, that exactly half of the numbers 1, ..., p-1 have the desired property. I do not know a deterministic process for finding such a number, but they appear to be distributed randomly so you can just try numbers at random until you find one and each trial has a 50% chance of success. It is not hard to show that the smallest solution must be a prime so that you can just try primes 2, 3, 5, 7, 11,... until you succeed. Even if you are trying 1000 digit numbers it is unlikely that you will not find a solution before the 2000th prime, so for computational purposes, it is probably worth keeping a list of the first 2000 or so primes.

Actually, you can do better than that if a higher power of 2 divides p-1. If 2^k divides p-1 and we let m=(p-1)/2^k, then as soon as a^m 1 (mod p), it follows that one among the numbers a, a², a⁴ will be a square root of -1 mod p. The probability of a number chosen at random satisfying a^m = 1 (mod p) is only 1 in 2^k-1.

Now a²+1² is a multiple of p, say a²+1²=kp. Since a <= p-1, we see that a²+1 < p² so that k < p. What we will now describe is a procedure that continues to produce solutions of a²+b²=kp, but with smaller values of k until k=1 and we are through. Assuming we have such a pair a and b, let a' and b' be the absolutely least residues a mod k and b mod k, respectively. This means that a' and b' are integers in the range (-k/2,k/2]. It follows that a'²+b'² <= (k/2)²+(k/2)² = k²/2. Also, a'²+b'² = a²+b² = 0 (mod k) since a = a' (mod k) and b = b' (mod k).

Thus a'²+b'²=mk with m < k. Multiplying, we find that (a²+b²)(a'²+b'²) = mk²p. Now we have the identity,

(a²+b²)(a'²+b'²) = (aa'+bb')²+(ab'-ba')² (*)

Since aa'+bb' = a²+b² = 0 (mod k) and ab'-ba' = ab-ba = 0 (mod k) it follows that both terms on the right hand side of (*) are divisible by k and so

((aa'+bb')/k)² + ((ab'-ba')/k)² = mp

and m < k. It can be shown that the number of reduction steps is bounded above by log₂ p.

Primes congruent to 3 mod 4

In this case, we begin by finding numbers a and b such that a²+b²+1 = 0 (mod p). This is the same as b²=-1-a² (mod p). So we can look at -1-4, -1-9, ..., until one of them has a square root mod p. I don't know if there is a theorem on the subject, but empirically it seems that for each value of a there is about a 50% chance that -1-a² is a quadratic residue. It is not hard to prove that there is at least one solution to y²=-1-x² (mod p). To see this, first observe that the squares 0², 1², ..., ((p-1)/2)² have all distinct residues mod p since a² = b² (mod p) implies that p divides (a²-b²)=(a+b)(a-b) and hence that a = b (mod p) or a = -b (mod p), which is impossible for two distinct numbers in the range [0,(p-1)/2]. Thus those squares have (p+1)/2 distinct residues. For similar reasons, the numbers -1-0², -1-1², ..., -1-((p-1)/2)² have (p+1)/2 distinct residues. Since there are at most p, the two sets of residues overlap and there is at least one solution to x²+y²+1 = 0 (mod p). This proves that there is at least one solution, but if there were only one, finding it would be like looking for a very small needle in a very large haystack (assuming p is very large). In fact, trials convince me that -1-x² is congruent to a square about half the time and there are very many solutions. In fact, it can be shown that -2=-1-1² is a quadratic residue whenever p = 3 (mod 8), which already covers half the cases of primes that are congruent to 3 mod 4. Similar criteria exist for -5=-1-2², -10=-1-3², etc. So it looks as if you can simply try successively -2, -5, -10, ... until you find one that works. (Incidentally, -1 can never be a quadratic residue for these primes, a very easy exercise.)

For our purposes, however, it is not enough to find x, you also need y. Fortunately, there is for primes that are congruent to 3 mod 4, there is a very simple way of constructing a square root of any number that is a quadratic residue. There is, to my knowledge, no correspondingly easy way of doing this for primes congruent to 1 mod 4. Consider a number a, not divisible by p. Suppose that a = b² (mod p). Then

a^(p+1)/2 = b^p+1 = b²b^p-1 = b² = a (mod p)

from which we see that (a^(p+1)/4)² = a (mod p) and hence a^(p+1)/4 is a square root of a mod p if it has one. (Otherwise it is a square root of -a, as a matter of interest.)

One we have found x and y so that x²+y²+1 = 0 (mod p), we can replace them by absolutely least residues and suppose that |x| < (p-1)/2 and |y| < (p-1)/2 from which one sees that x²+y²+1 < p²/2. Thus we get initial values of a,b,c,d for which a²+b²+c²+d² = kp with k < p.

We will show how, if k > 1, to find a new solution with a smaller value of k. The argument is similar to, but harder than, the previous case of two squares. We consider the case of k odd and even separately, for reasons we make clear. Suppose k is odd. In that case, let a', b',c', and d' be the absolutely least residues of a, b, c, and d mod k. Then each of |a'|, |b'|, |c'| and |d'| is <= (k-1)/2 so that

a'²+b'²+c'²+d'² <= (k-1)²

while

a'²+b'²+c'²+d'² = a²+b²+c²+d² = 0 (mod k)

Hence a'²+b'²+c'²+d'² = mk with m < k. Now we have that

mk²p = (a²+b²+c²+d²) (a'²+b'²+c'²+d'²) = (aa'+bb'+cc'+dd')² + (ab'-ba'+dc'-cd')² + (ac'-ca'+bd'-db')² + (ad'-a'd+cb'-bc')² and, in a way similar to the first case, each term is divisible by k, from which we conclude that

mp = ((aa'+bb'+cc'+dd')/k)² + ((ab'-ba'+dc'-cd')/k))² + ((ac'-ca'+bd'-db')/k)² + ((ad'-a'd+cb'-bc')/k)²

which completes the construction.

When k is even, this argument fails specifically in the case that a'=b'=c'=d'=k/2 since then a'²+b'²+c'²+d'² = k². Although we could give a proof that works just in that case, it is just as easy and computationally better to give an argument that halves k even. We observe that a²+b²+c²+d² = kp is even so that none or two or all four of a, b, c and d are even. In any case we can arrange them so that a±b and c±d are even and then

((a+b)/2)² + ((a-b)/2)² + ((c+d)/2)² + ((c-d)/2)² = (k/2)p

which completes the proof.