# Four Squares Together

## Mathematica ~~61 66~~ 51

Three methods are shown. Only the first approach meets the time requirement.

### 1-FindInstance (51 char)

This returns a single solution to the equation.

```
FindInstance[a^2 + b^2 + c^2 + d^2 == #, {a, b, c, d}, Integers] &
```

**Examples and timings**

```
FindInstance[a^2 + b^2 + c^2 + d^2 == 123456789, {a, b, c, d}, Integers] // AbsoluteTiming
```

{0.003584, {{a -> 2600, b -> 378, c -> 10468, d -> 2641}}}

```
FindInstance[a^2 + b^2 + c^2 + d^2 == #, {a, b, c, d}, Integers] &[805306368]
```

{0.004437, {{a -> 16384, b -> 16384, c -> 16384, d -> 0}}}

### 2-IntegerPartitions

This works also, but is too slow to meet the speed requirement.

```
f@n_ := Sqrt@IntegerPartitions[n, {4}, Range[0, Floor@Sqrt@n]^2, 1][[1]]
```

`Range[0, Floor@Sqrt@n]^2` is the set of all squares not exceeding `n`, that is, the squares of the integers up to the square root of `n` (the largest possible square in the partition).

`{4}` requires that the integer partitions of `n` consist of 4 elements from the above-mentioned set of squares.

`1`, within the function `IntegerPartitions`, returns the first solution.

`[[1]]` removes the outer braces; the solution was returned as a set of one element.

```
f[123456]
```

{348, 44, 20, 4}
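For readers without Mathematica, here is a rough Python analogue of the same idea: walk through candidate 4-tuples of squares, largest first, and stop at the first one that sums to `n`. This is only a sketch. The name `first_four_square_partition` is made up for illustration, and nothing here reflects how `IntegerPartitions` works internally.

```python
from math import isqrt

def first_four_square_partition(n):
    """Return the first [a, b, c, d] with a >= b >= c >= d >= 0 and
    a*a + b*b + c*c + d*d == n, trying larger roots first."""
    def search(rest, cap, slots):
        if slots == 0:
            return [] if rest == 0 else None
        for root in range(min(cap, isqrt(rest)), -1, -1):
            # The remaining slots can contribute at most slots * root^2.
            if root * root * slots < rest:
                break
            tail = search(rest - root * root, root, slots - 1)
            if tail is not None:
                return [root] + tail
        return None

    return search(n, isqrt(n), 4)

print(first_four_square_partition(123456))
```

For `123456` this prints `[348, 44, 20, 4]`, the same roots as the Mathematica output above.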

### 3-PowersRepresentations

`PowersRepresentations` returns **all** of the solutions to the 4 squares problem. It can also solve for sums of other powers.

PowersRepresentations returns, in under 5 seconds, the 181 ways to express 123456789 as the sum of 4 squares:

```
n= 123456;
PowersRepresentations[n, 4, 2] //AbsoluteTiming
```

However, it is far too slow for other sums.

## FRACTRAN: ~~156~~ 98 fractions

Since this is a classic number theory problem, what better way to solve this than to use numbers!

```
37789/221 905293/11063 1961/533 2279/481 57293/16211 2279/611 53/559 1961/403 53/299 13/53 1/13 6557/262727 6059/284321 67/4307 67/4661 6059/3599 59/83 1/59 14279/871933 131/9701 102037079/8633 14017/673819 7729/10057 128886839/8989 13493/757301 7729/11303 89/131 1/89 31133/2603 542249/19043 2483/22879 561731/20413 2483/23701 581213/20687 2483/24523 587707/21509 2483/24797 137/191 1/137 6215941/579 6730777/965 7232447/1351 7947497/2123 193/227 31373/193 23533/37327 5401639/458 229/233 21449/229 55973/24823 55973/25787 6705901/52961 7145447/55973 251/269 24119/251 72217/27913 283/73903 281/283 293/281 293/28997 293/271 9320827/58307 9831643/75301 293/313 28213/293 103459/32651 347/104807 347/88631 337/347 349/337 349/33919 349/317 12566447/68753 13307053/107143 349/367 33197/349 135199/38419 389/137497 389/119113 389/100729 383/389 397/383 397/39911 397/373 1203/140141 2005/142523 2807/123467 4411/104411 802/94883 397/401 193/397 1227/47477 2045/47959 2863/50851 4499/53743 241/409 1/241 1/239
```

Takes in input of the form 2^{n} × 193 and outputs 3^{a} × 5^{b} × 7^{c} × 11^{d}. Might run in 3 minutes if you have a *really* good interpreter. Maybe.

...okay, not really. This seemed to be such a fun problem to do in FRACTRAN that I had to try it. Obviously, this isn't a proper solution to the question, as it doesn't meet the time requirements (it brute forces) and it's barely even golfed, but I thought I'd post this here because it's not every day that a Codegolf question can be done in FRACTRAN ;)
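To make the input and output encoding concrete, here is a minimal FRACTRAN interpreter sketch in Python. The names `run_fractran` and `exponents` are illustrative only, and the step limit is an arbitrary safety net. It is shown on a one-fraction adder rather than on the 98 fractions above, which would take far too long to run this way.

```python
from fractions import Fraction

def run_fractran(program, n, max_steps=1_000_000):
    """Repeatedly multiply n by the first fraction that keeps it an integer."""
    fracs = [Fraction(f) for f in program]
    for _ in range(max_steps):
        for f in fracs:
            if n % f.denominator == 0:  # n * f is an integer
                n = n // f.denominator * f.numerator
                break
        else:
            return n  # no fraction applies, so the program halts
    raise RuntimeError("step limit reached")

def exponents(n, primes=(3, 5, 7, 11)):
    """Read off the exponents of the given primes in n (n must be positive)."""
    result = []
    for p in primes:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        result.append(e)
    return tuple(result)

# The one-fraction program 3/2 adds the exponent of 2 to the exponent of 3.
print(run_fractran(["3/2"], 2**3 * 3**2) == 3**5)
print(exponents(3**2 * 5 * 11**3))
```

With the program above, the input would be `2**n * 193`, and `exponents` applied to the final value would give `(a, b, c, d)`.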

### Hint

The code is equivalent to the following pseudo-Python:

```
a, b, c, d = 0, 0, 0, 0

def square(n):
    # Returns n**2
    return n ** 2

def compare(a, b):
    # Returns (0, 0) if a==b, (1, 0) if a<b, (0, 1) if a>b
    return (int(a < b), int(a > b))

def foursquare(a, b, c, d):
    # Returns square(a) + square(b) + square(c) + square(d)
    return square(a) + square(b) + square(c) + square(d)

# n is the input value (the exponent of 2 in the FRACTRAN input)
while compare(foursquare(a, b, c, d), n) != (0, 0):
    d += 1
    if compare(c, d) == (1, 0):
        c += 1
        d = 0
    if compare(b, c) == (1, 0):
        b += 1
        c = 0
        d = 0
    if compare(a, b) == (1, 0):
        a += 1
        b = 0
        c = 0
        d = 0
```

## Perl - ~~116 bytes~~ 87 bytes (see update below)

```
#!perl -p
$.<<=1,$_>>=2until$_&3;
{$n=$_;@a=map{$n-=$a*($a-=$_%($b=1|($a=0|sqrt$n)>>1));$_/=$b;$a*$.}($j++)x4;$n&&redo}
$_="@a"
```

*Counting the shebang as one byte, newlines added for horizontal sanity.*

Something of a combination code-golf fastest-code submission.

The average (worst?) case complexity seems to be ~~*O(log n)*~~ *O(n^{0.07})*. Nothing I've found runs slower than 0.001s, and I've checked the entire range from *900000000 - 999999999*. If you find anything that takes significantly longer than that, ~0.1s or more, please let me know.

**Sample Usage**

```
$ echo 123456789 | timeit perl four-squares.pl
11110 157 6 2
Elapsed Time: 0:00:00.000
$ echo 1879048192 | timeit perl four-squares.pl
32768 16384 16384 16384
Elapsed Time: 0:00:00.000
$ echo 999950883 | timeit perl four-squares.pl
31621 251 15 4
Elapsed Time: 0:00:00.000
```

The final two of these seem to be worst case scenarios for other submissions. In both instances, the solution shown is quite literally the very first thing checked. For `123456789`, it's the second.

If you want to test a range of values, you can use the following script:

```
use Time::HiRes qw(time);

$t0 = time();

# enter a range, or comma separated list here
for (1..1000000) {
  $t1 = time();
  $initial = $_;
  $j = 0; $i = 1;
  $i<<=1,$_>>=2until$_&3;
  {$n=$_;@a=map{$n-=$a*($a-=$_%($b=1|($a=0|sqrt$n)>>1));$_/=$b;$a*$i}($j++)x4;$n&&redo}
  printf("%d: @a, %f\n", $initial, time()-$t1)
}
printf('total time: %f', time()-$t0);
```

Best when piped to a file. The range `1..1000000` takes about 14s on my computer (71000 values per second), and the range `999000000..1000000000` takes about 20s (50000 values per second), consistent with *O(log n)* average complexity.

### Update

**Edit**: It turns out that this algorithm is very similar to one that has been used by mental calculators for at least a century.

Since originally posting, I have checked every value on the range from *1..1000000000*. The 'worst case' behavior was exhibited by the value *699731569*, which tested a grand total of *190* combinations before arriving at a solution. If you consider *190* to be a small constant - and I certainly do - the worst case behavior on the required range can be considered *O(1)*. That is, as fast as looking up the solution from a giant table, and on average, quite possibly faster.

Another thing though. After *190* iterations, anything larger than *144400* hasn't even made it beyond the first pass. The logic for the breadth-first traversal is worthless - it's not even used. The above code can be shortened quite a bit:

```
#!perl -p
$.*=2,$_/=4until$_&3;
@a=map{$=-=$%*($%=$=**.5-$_);$%*$.}$j++,(0)x3while$=&&=$_;
$_="@a"
```

Which only performs the first pass of the search. We do need to confirm that there aren't any values below *144400* that needed the second pass, though:

```
for (1..144400) {
  $initial = $_;

  # reset defaults
  $.=1;$j=undef;$==60;

  $.*=2,$_/=4until$_&3;
  @a=map{$=-=$%*($%=$=**.5-$_);$%*$.}$j++,(0)x3while$=&&=$_;

  # make sure the answer is correct
  $t=0; $t+=$_*$_ for @a;
  $t == $initial or die("answer for $initial invalid: @a");
}
```
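For readers who don't read golfed Perl, the search in the shortened program can be sketched in Python roughly as follows. This is my reading of the code, not the author's own formulation, and the name `four_squares_greedy` is made up: strip factors of 4, then take the integer square root minus an offset as the first term, take plain integer square roots for the other three, and increase the offset until nothing is left over. Termination relies on the range check reported above.

```python
from math import isqrt

def four_squares_greedy(n):
    # Each factor of 4 removed from n doubles every term of the answer.
    scale = 1
    while n and n % 4 == 0:
        n //= 4
        scale *= 2

    offset = 0
    while True:
        rest = n
        terms = []
        # First term backs off by `offset`; the other three are greedy.
        for back in (offset, 0, 0, 0):
            root = isqrt(rest) - back
            rest -= root * root
            terms.append(root * scale)
        if rest == 0:
            return terms
        offset += 1

for value in (123456789, 1879048192, 999950883):
    print(value, four_squares_greedy(value))
```

The three values are the ones from the sample usage, and the sketch reproduces the outputs shown there.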

In short, for the range *1..1000000000*, a near-constant time solution exists, and you're looking at it.

### Updated Update

@Dennis and I have made several improvements to this algorithm. You can follow the progress in the comments below, and subsequent discussion, if that interests you. The average number of iterations for the required range has dropped from just over *4* down to *1.229*, and the time needed to test all values for *1..1000000000* has been improved from 18m 54s, down to 2m 41s. The worst case previously required *190* iterations; the worst case now, *854382778*, needs only *21*.

The final Python code is the following:

```
from math import sqrt

# the following two tables can, and should be pre-computed

qqr_144 = set([  0,   1,   2,   4,   5,   8,   9,  10,  13,
                16,  17,  18,  20,  25,  26,  29,  32,  34,
                36,  37,  40,  41,  45,  49,  50,  52,  53,
                56,  58,  61,  64,  65,  68,  72,  73,  74,
                77,  80,  81,  82,  85,  88,  89,  90,  97,
                98, 100, 101, 104, 106, 109, 112, 113, 116,
               117, 121, 122, 125, 128, 130, 133, 136, 137])

# 10kb, should fit entirely in L1 cache
Db = []
for r in range(72):
    S = bytearray(144)
    for n in range(144):
        c = r
        while True:
            v = n - c * c
            if v%144 in qqr_144: break
            if r - c >= 12: c = r; break
            c -= 1
        S[n] = r - c
    Db.append(S)

qr_720 = set([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121,
              144, 145, 160, 169, 180, 196, 225, 241, 244, 256, 265, 289,
              304, 324, 340, 361, 369, 385, 400, 409, 436, 441, 481, 484,
              496, 505, 529, 544, 576, 580, 585, 601, 625, 640, 649, 676])

# 253kb, just barely fits in L2 of most modern processors
Dc = []
for r in range(360):
    S = bytearray(720)
    for n in range(720):
        c = r
        while True:
            v = n - c * c
            if v%720 in qr_720: break
            if r - c >= 48: c = r; break
            c -= 1
        S[n] = r - c
    Dc.append(S)

def four_squares(n):
    k = 1
    while not n&3:
        n >>= 2; k <<= 1
    odd = n&1
    n <<= odd

    a = int(sqrt(n))
    n -= a * a
    while True:
        b = int(sqrt(n))
        b -= Db[b%72][n%144]
        v = n - b * b
        c = int(sqrt(v))
        c -= Dc[c%360][v%720]
        if c >= 0:
            v -= c * c
            d = int(sqrt(v))

            if v == d * d: break

        n += (a<<1) - 1
        a -= 1

    if odd:
        if (a^b)&1:
            if (a^c)&1:
                b, c, d = d, b, c
            else:
                b, c = c, b

        a, b, c, d = (a+b)>>1, (a-b)>>1, (c+d)>>1, (c-d)>>1

    a *= k; b *= k; c *= k; d *= k

    return a, b, c, d
```

This uses two pre-computed correction tables, one 10kb in size, the other 253kb. The code above includes the generator functions for these tables, although these should probably be computed at compile time.

A version with more modestly sized correction tables can be found here: http://codepad.org/1ebJC2OV This version requires an average of *1.620* iterations per term, with a worst case of *38*, and the entire range runs in about 3m 21s. A little time is made up by using bitwise `and` for *b* correction, rather than modulo.

### Improvements

**Even values are more likely to produce a solution than odd values.**

The mental calculation article linked to previously notes that if, after removing all factors of four, the value to be decomposed is even, this value can be divided by two, and the solution reconstructed: if *n/2 = a² + b² + c² + d²*, then *n = (a+b)² + (a−b)² + (c+d)² + (c−d)²*.

Although this might make sense for mental calculation (smaller values tend to be easier to compute), it doesn't make much sense algorithmically. If you take *256* random *4*-tuples, and examine the sum of the squares modulo *8*, you will find that the values *1*, *3*, *5*, and *7* are each reached on average *32* times. However, the values *2* and *6* are each reached *48* times. Multiplying odd values by *2* will find a solution, on average, in *33%* fewer iterations. The reconstruction is the following: if *2n = a² + b² + c² + d²*, then *n = ((a+b)/2)² + ((a−b)/2)² + ((c+d)/2)² + ((c−d)/2)²*.

Care needs to be taken that *a* and *b* have the same parity, as well as *c* and *d*, but if a solution was found at all, a proper ordering is guaranteed to exist.
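Both claims can be checked with a short sketch. It assumes that enumerating all 256 4-tuples of residues modulo 4 is a fair stand-in for the random sampling described, since a square modulo 8 depends only on its root modulo 4. The reordering and halving logic in `halve_solution` (a hypothetical name) is copied from the final lines of `four_squares`.

```python
from collections import Counter
from itertools import product

# Tally a^2 + b^2 + c^2 + d^2 (mod 8) over all 4-tuples of residues mod 4.
tally = Counter(sum(x * x for x in t) % 8 for t in product(range(4), repeat=4))
print(sorted(tally.items()))

def halve_solution(a, b, c, d):
    """Given a^2 + b^2 + c^2 + d^2 == 2n, return four values whose squares sum to n."""
    # Reorder so that a, b share a parity and c, d share a parity.
    if (a ^ b) & 1:
        if (a ^ c) & 1:
            b, c, d = d, b, c
        else:
            b, c = c, b
    return (a + b) // 2, (a - b) // 2, (c + d) // 2, (c - d) // 2

# 14 = 3^2 + 2^2 + 1^2 + 0^2, so this should give a decomposition of 7.
print(halve_solution(3, 2, 1, 0))
```

The tally gives 32 for each odd residue and 48 for each of 2 and 6, and the last line prints `(2, 1, 1, 1)`.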

**Impossible paths don't need to be checked.**

After selecting the second value, *b*, it may already be impossible for a solution to exist, given the possible quadratic residues for a given modulus. Instead of checking anyway, or moving on to the next iteration, the value of *b* can be 'corrected' by decrementing it by the smallest amount that could possibly lead to a solution. The two correction tables store these values, one for *b*, and the other for *c*. Using a higher modulus (more accurately, using a modulus with relatively fewer quadratic residues) will result in a better improvement. The value *a* doesn't need any correction; by modifying *n* to be even, all values of *a* are valid.
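As a small illustration of the correction idea, the sketch below rebuilds the two residue sets and applies the *b* correction directly instead of through a lookup table. It assumes that `qqr_144` holds the residues modulo 144 that are sums of two squares and that `qr_720` holds the squares modulo 720; the table sizes (63 and 48 entries) are consistent with that reading. The names `SUM2_144`, `SQ_720` and `correct_b` are hypothetical.

```python
# Residues mod 144 expressible as a sum of two squares,
# and residues mod 720 that are perfect squares.
SUM2_144 = {(x * x + y * y) % 144 for x in range(144) for y in range(144)}
SQ_720 = {(x * x) % 720 for x in range(720)}

def correct_b(n, b, limit=12):
    """Lower b until n - b*b could still be a sum of two squares mod 144."""
    for step in range(limit + 1):
        if (n - (b - step) ** 2) % 144 in SUM2_144:
            return b - step
    return b  # nothing admissible within `limit` steps: leave b unchanged

print(len(SUM2_144), "of 144;", len(SQ_720), "of 720")
print(correct_b(12, 3))
```

For `n = 12` and `b = 3` the remainder 3 cannot be a sum of two squares, so *b* drops to 2, leaving 8 = 2² + 2². Only 48 of 720 residues are squares, which is why the larger modulus prunes more candidates for *c*.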