Forking Factorials

Mathematica

A parallel-capable function:

 f[n_, g_] := g[Product[N@i, {i, 1, n, 2}] Product[N@i, {i, 2, n, 2}]]

where g is Identity or Parallelize, depending on whether the computation should run sequentially or in parallel.

For the timing test, we'll modify the function slightly so that it returns the elapsed wall-clock time instead.

f[n_, g_] := First@AbsoluteTiming[g[Product[N@i,{i,1,n,2}] Product[N@i,{i,2,n,2}]]]

We then test both modes from 10^5 up to 9*10^5 in steps of 10^5 (with only two parallel kernels available here):

ListLinePlot[{Table[f[i, Identity],    {i, 100000, 900000, 100000}], 
              Table[f[i, Parallelize], {i, 100000, 900000, 100000}]}]   

Result: a ListLinePlot comparing the sequential (Identity) and parallel (Parallelize) timings.


Haskell: 209 200 198 177 chars

176 167 source + 33 10 compiler flag

This solution is pretty silly. It applies product in parallel to a value of type [[Integer]], where the inner lists are at most two items long. Once the outer list is down to at most 2 lists, we flatten it and take the product directly. And yes, the type checker needs something annotated with Integer, otherwise it won't compile.

import Control.Parallel.Strategies
s(x:y:z)=[[x,y::Integer]]++s z;s x=[x]
p=product
f n=p$concat$(until((<3).length)$s.parMap rseq p)$s[1..n]
main=interact$show.f.read

(Feel free to read the middle part of f between concat and s as "until I heart the length")
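For readability, here is roughly what the golfed code does, as an ungolfed sketch; the names pairs and parFactorial (and the explicit type signatures) are mine, not part of the 177-character submission:

import Control.Parallel.Strategies (parMap, rseq)

-- Split a list into chunks of at most two elements each.
pairs :: [Integer] -> [[Integer]]
pairs (x:y:rest) = [x, y] : pairs rest
pairs rest       = [rest]

-- Take the product of each chunk in parallel and re-pair the partial
-- products, until at most two chunks remain; then finish sequentially.
parFactorial :: Integer -> Integer
parFactorial n = product . concat . go . pairs $ [1..n]
  where
    go xss | length xss < 3 = xss
           | otherwise      = go (pairs (parMap rseq product xss))

main :: IO ()
main = interact (show . parFactorial . read)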

Things seemed like they were going to be pretty good, since parMap from Control.Parallel.Strategies makes it easy to farm the work out to multiple threads. However, it looked like GHC 7 requires a whopping 33 characters of command-line options and environment variables to actually get the threaded runtime to use multiple cores (which I included in the total). Unless I was missing something, which was definitely the case. (Update: the threaded GHC runtime appears to use N-1 threads, where N is the number of cores, so there is no need to fiddle with runtime options.)

To compile:

ghc -threaded prog.hs
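
If you do want to pick the number of cores explicitly (and get the spark statistics quoted below), something along these lines should work; this assumes only GHC's standard -rtsopts, +RTS -N and -s flags, nothing specific to this submission:

ghc -threaded -rtsopts prog.hs
echo 50000 | ./prog +RTS -N2 -s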

Runtime, however, was pretty good considering the ridiculous number of parallel evaluations being sparked and that I didn't compile with -O2. For 50000! on a dual-core MacBook, I get:

SPARKS: 50020 (29020 converted, 1925 pruned)

INIT  time    0.00s  (  0.00s elapsed)
MUT   time    0.20s  (  0.19s elapsed)
GC    time    0.12s  (  0.07s elapsed)
EXIT  time    0.00s  (  0.00s elapsed)
Total time    0.31s  (  0.27s elapsed)

Total times for a few different values; the first column is the golfed parallel version, the second the naive sequential version:

          Parallel   Sequential
 10000!      0.03s        0.04s
 50000!      0.27s        0.78s
100000!      0.74s        3.08s
500000!      7.04s       86.51s

For reference, the naive sequential version is this (which was compiled with -O2):

factorial :: Integer -> Integer
factorial n = product [1..n]
main = interact $ show.factorial.read

Java, 523 519 434 430 429 chars

import java.math.*;public class G extends Thread{BigInteger o,i,r=BigInteger.ONE,h;G g;G(BigInteger O,int
I,int n){o=O;i=new BigInteger(""+I);if(n>1)g=new G(O.subtract(r),I,n-1);h=n==I?i:r;start();}public void
run(){while(o.signum()>0){r=r.multiply(o);o=o.subtract(i);}try{g.join();r=r.multiply(g.r);}catch(Exception
e){}if(h==i)System.out.println(r);}public static void main(String[] args){new G(new BigInteger(args[0]),4,4);}}

The two 4s in the final line are the number of threads to use.
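
To try it out, something like this should work, assuming the golfed class above is saved as G.java:

javac G.java
java G 50000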

Testing 50000! with the following framework (an ungolfed version of the original submission, with a handful fewer bad practices, though it still has plenty) gives these times on my 4-core Linux machine:

7685ms
2338ms
1361ms
1093ms
7724ms

Note that I repeated the one-thread test at the end for fairness, in case the JIT had warmed up.

import java.math.*;

public class ForkingFactorials extends Thread { // Bad practice!
    private BigInteger off, inc;
    private volatile BigInteger res;

    private ForkingFactorials(int off, int inc) {
        this.off = new BigInteger(Integer.toString(off));
        this.inc = new BigInteger(Integer.toString(inc));
    }

    // Multiply together every value congruent to off (mod inc), counting down to 1.
    public void run() {
        BigInteger p = new BigInteger("1");
        while (off.signum() > 0) {
            p = p.multiply(off);
            off = off.subtract(inc);
        }
        res = p;
    }

    public static void main(String[] args) throws Exception {
        int n = Integer.parseInt(args[0]);
        System.out.println(f(n, 1));
        System.out.println(f(n, 2));
        System.out.println(f(n, 3));
        System.out.println(f(n, 4));
        System.out.println(f(n, 1));
    }

    // Starts numThreads threads, one per residue class mod numThreads, and combines their partial products.
    private static BigInteger f(int n, int numThreads) throws Exception {
        long now = System.currentTimeMillis();
        ForkingFactorials[] th = new ForkingFactorials[numThreads];
        for (int i = 0; i < n && i < numThreads; i++) {
            th[i] = new ForkingFactorials(n-i, numThreads);
            th[i].start();
        }
        BigInteger f = new BigInteger("1");
        for (int i = 0; i < n && i < numThreads; i++) {
            th[i].join();
            f = f.multiply(th[i].res);
        }
        long t = System.currentTimeMillis() - now;
        System.err.println("Took " + t + "ms");
        return f;
    }
}

Java with bigints is not the right language for golfing (look at what I have to do just to construct the wretched things, because the constructor which takes a long is private), but hey.

It should be clear from the ungolfed code how the work is broken up: each thread multiplies one equivalence class modulo the number of threads. For example, with n = 10 and four threads, the threads compute 10·6·2, 9·5·1, 8·4 and 7·3 respectively. The key point is that each thread does roughly the same amount of work.

Tags:

Code Golf