Longest Prime Sums

Score 100 8605

I used an algorithm that starts with one solution and repeatedly tries to split a prime \$p\$ in the solution into two other primes \$q_ 1\$ and \$q_ 2\$ that satisfy \$\frac1{p-1} = \frac1{q_1-1}+\frac1{q_2-1}\$.

It is known (and can be quickly checked) that the positive integer solutions to \$\frac1n = \frac1x + \frac1y\$ are in one-to-one correspondence with factorizations \$n^2 = f_ 1 f_ 2\$, the correspondence being given by \$x = n + f_ 1\$, \$y = n + f_ 2\$. We can search through the factorizations of \$(p-1)^2\$ to see if any of them yielded a solution where both new denominators \$x,y\$ were one less then primes; if so, then \$p\$ can be replaced by \$q_1=x+1\$, \$q_2=y+1\$. (If there were multiple such factorizations, I used the one where the minimum of \$q_1\$ and \$q_2\$ was smallest.)

I started with this seed solution of length 44:

seed = {3, 7, 11, 23, 31, 43, 47, 67, 71, 79, 103, 131, 139, 191, 211, 239, 331, 419, 443, 463, 547, 571, 599, 647, 691, 859, 911, 967, 1103, 1327, 1483, 1871, 2003, 2311, 2347, 2731, 3191, 3307, 3911, 4003, 4931, 6007, 6091, 8779}

This seed was found using an Egyptian fraction solver that a former research student of mine, Yue Shi, coded in ChezScheme. (The primes \$p\$ involved all have the property that \$p-1\$ is the product of distinct primes less than 30, which increased the likelihood of Shi's program finding a solution.)

The following Mathematica code continually updates a current solution by looking at its primes one by one, trying to split them in the manner described above. (The number 1000 in the third line is an arbitrary stopping point; in principle one could let the algorithm run forever.)

solution = seed;
j = 1; (* j is the index of the element of the solution that we'll try to split *)

While[j <= 1000 && j <= Length[solution],
  currentP = solution[[j]];
  allDivisors = Divisors[(currentP - 1)^2];
  allFactorizations = {#, (currentP - 1)^2/#} & /@
    Take[allDivisors, Floor[Length[allDivisors]/2]];
  allSplits = currentP + allFactorizations;
  goodFactorizations = Select[allSplits, 
    And @@ PrimeQ[#] && Intersection[#, solution] == {} &];
  If[goodFactorizations == {},
    j++,
    solution = Union[Complement[solution, {currentP}], First@goodFactorizations]
  ]
]

The code above yields a solution of length 4126, whose largest element is about \$8.7\times10^{20}\$; by the end, it was factoring integers \$(p-1)^2\$ of size about \$8.8\times10^{21}\$.

In practice, I ran the code several times, using the previous output as the next seed in each case and increasing the cutoff for j each time; this allowed for the recovery of some small prime splits that had become non-redundant thanks to previous splitting, which somewhat mitigated the size of the integers the algorithm factored.

The final solution, which took about an hour to obtain, is too long to fit in this answer but has been posted online. It has length 8605 and largest element about \$4.62\times10^{19}\$.

Various runs of this code consistently found that the length of the solution was about 3–4 times as long as the set of primes that had been examined for splitting. In other words, the solution was growing much faster than the code scanned through the initial elements. It seems likely that this behavior would continue for a long time, yielding some gargantuan solutions.


Score 263 385 425 426 with only primes < 1.000.000 (was: non-competitive, now it is; score can be increased by running the program longer)

I followed the same path as Wheat Wizard: iteratively search for primes in the solution that can be replaced with a longer list of primes with the same result. I wrote that Python program that does exactly this. It starts with solution S = {2} and than iterates of all elements of that solution and tries to find a decomposition of that prime for which 1/(p-1) = sum(1/(q-1) for all q in the decomposition.

After I realized that S should be a set (and not a list), I altered the program to take this into account. I also added a ton of performance optimizations. The solution of 263 came up within 200 seconds or so (running under pypy3), but if you let it running, it steadily keeps coming with additional (longer) solutions.

Current best solution (425 elements, with all primes < 1M, calculated in ~ 15 min.):

S = [3, 5, 13, 19, 29, 37, 103, 151, 241, 281, 409, 541, 577, 593, 661, 701, 751, 1297, 1327, 2017, 2161, 2251, 2293, 2341, 2393, 2521, 2593, 2689, 2731, 3061, 3079, 3329, 3361, 3457, 6301, 6553, 7057, 7177, 7481, 7561, 8737, 9001, 9241, 9341, 10501, 11617, 12097, 12547, 14281, 14449, 14561, 15121, 17761, 17851, 18217, 18481, 20593, 21313, 22441, 23189, 23761, 24571, 26041, 26881, 28351, 28513, 29641, 30241, 36529, 37441, 46993, 49921, 51169, 57331, 58109, 58313, 58369, 58831, 59659, 60737, 60757, 61001, 61381, 61441, 61561, 61609, 63067, 63601, 64513, 64901, 65053, 65089, 65701, 65881, 66301, 66931, 67049, 69389, 69941, 70181, 72161, 72481, 72577, 72661, 73061, 73699, 74521, 77521, 78241, 79693, 81181, 86951, 88741, 90631, 98011, 100297, 102181, 107641, 108991, 109201, 109537, 114913, 117841, 118429, 121993, 122761, 123001, 124561, 127601, 128629, 130073, 130969, 131561, 133387, 133813, 138181, 138403, 139501, 146077, 149521, 159457, 160081, 162289, 163543, 166601, 174241, 175891, 176401, 177913, 180181, 182711, 189421, 199921, 201781, 206641, 218527, 223441, 227089, 229739, 234961, 238081, 238141, 238897, 239851, 246241, 250057, 261577, 266401, 267961, 280321, 280837, 280897, 281233, 283501, 283861, 284161, 287233, 288049, 291721, 297601, 299053, 302221, 306853, 309629, 313153, 316681, 322057, 325921, 332489, 342211, 342241, 349981, 352273, 354961, 355321, 360977, 365473, 379177, 390097, 390961, 394717, 395627, 401057, 404251, 404489, 412127, 412651, 416881, 417649, 418027, 424117, 427681, 428221, 428401, 429409, 430921, 434521, 435481, 441937, 443873, 444641, 451441, 453601, 454609, 455149, 459649, 466201, 468001, 473617, 474241, 480737, 481693, 483883, 496471, 498301, 498961, 499141, 499591, 499969, 501601, 501841, 502633, 513067, 514513, 517609, 523261, 524521, 525313, 529381, 538721, 540541, 545161, 550117, 552553, 560561, 562633, 563501, 563851, 568177, 570781, 575723, 587497, 590669, 591193, 599281, 601801, 601903, 604001, 605551, 607993, 609589, 611389, 617401, 621007, 627301, 628561, 628993, 629281, 635449, 637201, 639211, 642529, 645751, 651361, 651857, 653761, 654853, 655453, 657091, 662941, 664633, 667801, 669121, 669901, 670177, 673201, 673921, 675109, 688561, 689921, 691363, 692641, 694033, 695641, 697681, 698293, 700591, 703081, 703561, 705169, 705181, 707071, 709921, 713627, 732829, 735373, 737413, 739861, 742369, 745543, 750121, 750721, 754771, 756961, 757063, 758753, 759001, 760321, 761671, 762721, 766361, 773501, 774181, 776557, 779101, 782461, 784081, 784981, 786241, 788317, 794641, 795601, 797273, 800089, 801469, 808081, 808177, 810151, 813121, 815671, 819017, 823481, 823621, 825553, 831811, 833281, 833449, 836161, 839161, 840911, 846217, 859657, 859861, 860609, 863017, 865801, 869251, 870241, 875521, 876929, 878011, 880993, 884269, 891893, 895681, 898921, 899263, 902401, 904861, 905761, 907369, 908129, 914861, 917281, 917317, 921601, 922321, 923833, 926377, 939061, 941641, 942401, 943009, 943273, 944161, 944821, 944833, 949621, 949961, 950041, 950401, 953437, 953443, 954001, 957349, 957529, 960121, 960961, 963901, 964783, 967261, 967627, 967751, 968137, 971281, 973561, 973591, 984127, 984341, 984913, 986437, 991381, 992941, 994561, 995347, 996001]

Proof that is satisfies the challenge:

Some of the decompositions used:

1/(  2-1) = sum(1/(p-1)) for p in: {(5, 7, 13, 3)}
1/(  3-1) = sum(1/(p-1)) for p in: {(7, 13, 5)}
1/(  5-1) = sum(1/(p-1)) for p in: {(13, 7)}
1/(  7-1) = sum(1/(p-1)) for p in: {(13, 19, 37)}
1/( 13-1) = sum(1/(p-1)) for p in: {(19, 37)}
1/( 19-1) = sum(1/(p-1)) for p in: {(29, 71, 181)}
1/( 29-1) = sum(1/(p-1)) for p in: {(37, 127)}
1/( 37-1) = sum(1/(p-1)) for p in: {(43, 281, 2521)}
1/( 43-1) = sum(1/(p-1)) for p in: {(53, 223, 13469)}
...
1/(8779-1) = sum(1/(p-1)) for p in: {(8969, 739861, 941641)}
1/(8807-1) = sum(1/(p-1)) for p in: {(9001, 773501, 865801)}
1/(8821-1) = sum(1/(p-1)) for p in: {(8941, 657091)}
1/(8941-1) = sum(1/(p-1)) for p in: {(9041, 808177)}
1/(8969-1) = sum(1/(p-1)) for p in: {(9463, 227089, 705169)}
1/(9001-1) = sum(1/(p-1)) for p in: {(9109, 759001)}
1/(9041-1) = sum(1/(p-1)) for p in: {(9241, 417649)}
1/(9109-1) = sum(1/(p-1)) for p in: {(10891, 55661)}
1/(9181-1) = sum(1/(p-1)) for p in: {(9343, 529381)}
1/(9241-1) = sum(1/(p-1)) for p in: {(9341, 863017)}
1/(9341-1) = sum(1/(p-1)) for p in: {(15121, 25219, 784561)}
1/(9343-1) = sum(1/(p-1)) for p in: {(9689, 261577)}
1/(9463-1) = sum(1/(p-1)) for p in: {(10957, 69389)}
1/(9689-1) = sum(1/(p-1)) for p in: {(12457, 43597)}
1/(10333-1) = sum(1/(p-1)) for p in: {(10501, 645751)}
...
1/(131561-1) = sum(1/(p-1)) for p in: {(237361, 295153)}
1/(166601-1) = sum(1/(p-1)) for p in: {(243433, 527851)}
1/(266401-1) = sum(1/(p-1)) for p in: {(386401, 857809)}
1/(355321-1) = sum(1/(p-1)) for p in: {(639577, 799471)}

Python3 code:

import itertools
import sympy
import cProfile
import functools
import bisect
import operator


def sundaram3(max_n):
    # Returns a list of all primes under max_n
    numbers = list(range(3, max_n + 1, 2))
    half = (max_n) // 2
    initial = 4
    for step in range(3, max_n + 1, 2):
        for i in range(initial, half, step):
            numbers[i - 1] = 0
        initial += 2 * (step + 1)
        if initial > half:
            return [2] + list([_f for _f in numbers if _f])


# Precalculate all primes up to a million to speed things up
PRIMES_TO_1M = list(sundaram3(1000000))


def nextprime(number):
    # partly precalculated fast version for calculating the
    # first (e.g. smallest) prime that is largest than numer
    global PRIMES_TO_1M

    if number <= PRIMES_TO_1M[-2]:
        return PRIMES_TO_1M[bisect.bisect(PRIMES_TO_1M, number)]
    return sympy.nextprime(number)


def isprime(number):
    # partly precalculated fast version to determine of number is prime
    global PRIMES_TO_1M

    if number < 1000000:
        return number in PRIMES_TO_1M
    return sympy.isprime(number)


def upper_limit(prime, length=2):
    # Returns the largest prime q in the decomposition of prime with the given
    # length such that 1 / (prime - 1) = sum( 1 / (q - 1)) for q in 
    # set V, with V has the given length.
    # ASSUMPTION: all q are unique; this assumption is not validated,
    # but for this codegolf, the solution must be a set, so this is safe.
    if length == 1:
        return prime
    nextp = nextprime(prime)
    largestprime = (prime * nextp - 2 * prime + 1 ) // (nextp - prime)
    if not isprime(largestprime):
        largestprime = nextprime(largestprime)
    return upper_limit(largestprime, length - 1)


def find_decomposition(prime, length=2):
    # Returns a list of primes V = {q1, q2, q3, ...} for which holds that:
    # 1 / (prime - 1) = sum(1 / (q - 1)) for q in V.
    # Returns None if this decomposition is not found.
    # Note that there may be more than one V of a given prime, but this
    # function returns the first found V (implementation note: the sortest one)
    print(f"Searching decomposition for prime {prime} with length {length} in range ({prime}, {upper_limit(prime, length)}]")
    prime_range = PRIMES_TO_1M[bisect.bisect(PRIMES_TO_1M, prime) + 1:
                               bisect.bisect(PRIMES_TO_1M, upper_limit(prime, length) + 1)]
    # we only search for combinations of length -1; the last factor is calculated
    for combi in itertools.combinations(prime_range, length - 1):
        # we find the common factor of prime and all primes in combi
        # and use that to calculate the remaining prime. This is faster
        # than trying all prime combinations.
        factoritems = [-prime + 1] + [c - 1 for c in combi]
        factor = -functools.reduce(operator.mul, factoritems)
        remainder = - factor / sum(factor // p for p in factoritems) + 1
        if remainder == int(remainder) and isprime(remainder) and remainder not in combi:
            combi = combi + (int(remainder),)
            print(f"Found decomposition: {combi}")
            return combi
    return None


def find_solutions():
    # Finds incrementally long solutions for the set of primes S, for which:
    # sum(1/(p-1)) == 1 for p in S.
    # We do this by incrementally searching for primes in S that can be
    # replaced by longer subsets that have the same value of sum(1/p-1).
    # These replacements are stored in a dictionary "decompositions".
    # Starting with the base solution S = [2] and all decompositions, 
    # you can construct S.
    decompositions = {}  # prime: ([decomposition], max tried length)
    S = [2]
    old_solution_len = 0

    # Keep looping until there are no decompositions that make S longer
    while len(S) > old_solution_len:
        # Loop over all primes in S to search for decompositions
        for p in sorted(set(S)):
            # If prime p is not in the decompositions dict, add it
            if p not in decompositions:
                decompositions[p] = (find_decomposition(p, 2), 2)
            # If prime p is in the decompositions dict, but without solution,
            # try to find a solution 1 number longer than the previous try
            elif not decompositions[p][0]:
                length = decompositions[p][1] + 1
                decompositions[p] = (find_decomposition(p, length), length)
            # If prime p is in decompositions and it has a combi
            # and the combi is not already in S, replace p with the combi                
            elif all(p not in S for p in decompositions[p][0]):
                old_solution_len = len(S)
                print(f"Removing occurence of {p} with {decompositions[p][0]}")
                S.remove(p)
                S.extend(decompositions[p][0])
                S = sorted(S)
                break  # break out of the for loop
        print(f"Found S with length {len(S)}: S = {S}")
        print(f"Decompositions: ")
        for prime in sorted(decompositions.keys()):
            print(f"    1/({prime:3}-1) = sum(1/(p-1)) for p in: \u007b{decompositions[prime][0]}\u007d")


def main():
    cProfile.run("find_solutions()")


if __name__ == "__main__":
    main()

Score 32 34 36

{5, 7, 11, 13, 17, 23, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 101, 113, 131, 137, 151, 211, 229, 241, 281, 313, 379, 401, 433, 457, 491, 521, 571, 601, 25117, 293362609}

This is an improvement of Arnauld's answer. I just noticed that

\$ \dfrac{1}{19-1}=\dfrac{1}{73-1}+\dfrac{1}{61-1}+\dfrac{1}{41-1} \$

But 41 and 61 were already used in Arnauld's answer so I had to then figure out that

\$ \dfrac{1}{61-1} = \dfrac{1}{151-1} + \dfrac{1}{101-1} \$ and \$ \dfrac{1}{41-1} = \dfrac{1}{281-1}+\dfrac{1}{211-1}+\dfrac{1}{151-1}+\dfrac{1}{101-1} \$

But now I am using 151 and 101 twice. So I spent some time and discovered that

\$ \dfrac{1}{151-1} = \dfrac{1}{401-1} + \dfrac{1}{241-1} \$ and \$ \dfrac{1}{101-1} = \dfrac{1}{601-1} + \dfrac{1}{571-1} + \dfrac{1}{457-1} + \dfrac{1}{229-1} \$

So now I can just replace the 19 with 71, 151, 101, 151, 211, 229, 241, 281, 401, 457, 471, 601 and the sequence will maintain it's properties.

I also discovered that I can replace 19 with the sequence 71, 151, 101, 151, 211, 241, 241, 281, 401, 433, 541, 601, but that has 241 twice.

After that improvement I also noticed that 79 could be replaced with 521, 313, 131, to increase the size by 2 more. And 73 can be replaced with 113, 379, 433 for another 2.