Why is there such a large performance difference between these two scripts that do the same thing?

The construct [...] is an array composer. It eagerly iterates the iterable inside it and stores each value into the array; only after that does the .grep chain start iterating. That means far more memory allocation, and it is less cache-friendly. Parentheses, by contrast, do nothing beyond grouping; they add no semantics of their own. Thus:

[1..1_000_000]
    .grep( * !%% 2 )
    .grep( -> $x { $x == $x.flip } )
    .grep( -> $y { $y.base(2) == $y.base(2).flip } )
    .sum.say

Will allocate and populate a million-element array, then iterate it, while:

(1..1_000_000)
    .grep( * !%% 2 )
    .grep( -> $x { $x == $x.flip } )
    .grep( -> $y { $y.base(2) == $y.base(2).flip } )
    .sum.say

Runs rather faster, because it need not do that.
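A quick way to see the difference is to look at what each construct actually produces. This is a small illustrative sketch (the bounds are arbitrary), not a benchmark:

```raku
# [...] composes an Array: every element of the range is produced and stored up front.
my $array = [1..10];
say $array.WHAT;     # (Array)
say $array.elems;    # 10

# (...) only groups: what you get back is the Range object itself, with nothing stored.
my $range = (1..10);
say $range.WHAT;     # (Range)

# Because nothing is stored, .grep can pull values one at a time,
# which even works for a range with no end.
say (1..Inf).grep(* %% 7).head(3);    # (7 14 21)
```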

Further, the ... operator is currently far slower than the .. operator. It isn't doomed to stay that way; it has simply received much less optimization attention so far. Since .grep has also been fairly well optimized, it turns out to be quicker, for now anyway, to generate the full range and filter its elements with .grep.
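To compare the two operators on your own machine, a rough timing harness like the one below can help. It is only a sketch: a single run includes warm-up time, results will vary between Rakudo versions, and the variable names are arbitrary.

```raku
# Both expressions yield the odd numbers below 1_000_000.
my $t0 = now;
my $sequence-sum = (1, 3 ... 999_999).sum;            # sequence operator
my $t1 = now;
my $range-sum = (1..1_000_000).grep(* !%% 2).sum;     # range filtered by .grep
my $t2 = now;

say "...        : {$t1 - $t0} s";
say ".. + .grep : {$t2 - $t1} s";
say $sequence-sum == $range-sum;    # True
```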

Finally, using == to compare the (string) results of .base and .flip is inefficient, since it parses them back into numbers, when we could use eq and compare the strings directly:

(1 .. 1_000_000)
    .grep(* !%% 2)
    .grep( -> $x { $x eq $x.flip } )
    .grep( -> $y { $y.base(2) eq $y.base(2).flip } )
    .sum.say
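The cost of the numeric comparison can be isolated in a similar rough way. In this sketch the binary strings are built once up front (200_000 is an arbitrary size), so the only difference between the two timed runs is eq versus ==:

```raku
# Build the binary strings once, so only the comparison differs between the two runs.
my @bits = (1..200_000).map(*.base(2));

my $t0 = now;
my $by-eq  = @bits.grep({ $_ eq .flip }).elems;   # compares the strings directly
my $t1 = now;
my $by-num = @bits.grep({ $_ == .flip }).elems;   # parses both strings as numbers first
my $t2 = now;

say "eq: {$t1 - $t0} s, ==: {$t2 - $t1} s";
say $by-eq == $by-num;    # True: same palindromes either way
```

Since .base(2) never produces a leading zero, both comparisons accept exactly the same strings; only the cost differs.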

If you want something that is faster, you can write your own sequence generator.

# parenthesized so the closing } at the end of a line doesn't end the statement before .grep
(gather {
  loop (my int $i = 1; $i < 1_000_000; $i += 2) {
    take $i
  }
})
.grep( -> $x { $x eq $x.flip } )
.grep( -> $y { $y.base(2) eq $y.base(2).flip } )
.sum.say

Which takes about 4 seconds.
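This works because gather/take is lazy: each value is only computed when the consumer pulls it. A small sketch with an unbounded loop (the names $odds and @first are arbitrary) shows that only the requested values are ever produced:

```raku
# The loop has no end condition, yet only as many values are generated
# as the consumer asks for.
my $odds = gather {
    my int $i = 1;
    loop {
        take $i;
        $i += 2;
    }
};

my @first = $odds.head(5);
say @first;    # [1 3 5 7 9]
```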

Or to go even faster, you can create the Iterator object yourself.

class Odd does Iterator {
    # start one step below 1: pull-one adds 2 before returning, so the first value is 1
    has int $!count = -1;

    method pull-one () {
        if ($!count += 2) < 1_000_000 {
            $!count
        } else {
            IterationEnd
        }
    }
}

Seq.new(Odd.new)
.grep( -> $x { $x == $x.flip } )
.grep( -> $y { $y.base(2) == $y.base(2).flip } )
.sum.say

Which only takes about 2 seconds.
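The same Iterator pattern can be parameterized. The class below is a hypothetical generalization (OddsBelow and its limit attribute are not from the original answer) that returns the current value before advancing it, so the first value handed out is exactly the attribute's starting value, 1:

```raku
class OddsBelow does Iterator {
    has int $.limit = 1_000_000;   # exclusive upper bound
    has int $!next  = 1;           # next odd value to hand out

    method pull-one () {
        return IterationEnd if $!next >= $!limit;
        my int $value = $!next;
        $!next += 2;
        $value
    }
}

my @small = Seq.new(OddsBelow.new(limit => 10));
say @small;    # [1 3 5 7 9]
```

It plugs into the same pipeline as before, e.g. Seq.new(OddsBelow.new).grep(...).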

Of course, if you want to go as fast as possible, get rid of the sequence iteration entirely, use native ints, and cache the base-10 string (my $s = ~$x):

my int $acc = 0;
loop ( my int $x = 1; $x < 1_000_000; $x += 2) {
  next unless (my $s = ~$x) eq $s.flip;
  next unless $x.base(2) eq $x.base(2).flip;
  $acc += $x
}
say $acc;

Which gets it down to about 0.45 seconds.

(Caching the .base(2) didn't seem to do anything.)

This is probably close to the minimum without resorting to using nqp ops directly.

I tried writing a native int bit flipper, but it made things slower: 0.5 seconds.
(I did not come up with this algorithm; I only adapted it to Raku. I also added the final +> (32 - 1 - $in.msb) shift to fit this problem.)

I would guess that spesh (MoarVM's specializer) is leaving in operations that don't need to be there, or maybe it isn't JITting very well.

It might be more performant for values larger than 1_000_000, since .base(2).flip is O(log n) whereas this is O(1).
(Its masks are 32 bits wide, though, so it only handles values below 2**32.)

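# Reverse all 32 bits of $in by swapping ever larger groups (bits, pairs, nibbles,
# bytes, then 16-bit halves), then shift right so only the $in.msb + 1 significant
# bits remain. Only valid for 0 < $in < 2**32.
# Each (my int $ = 0x...) stores a mask in an anonymous native int.
sub flip-bits ( int $in --> int ) {
  my int $n =
       ((($in +& (my int $ = 0xaaaaaaaa)) +> 1) +| (($in +& (my int $ = 0x55555555)) +< 1));
  $n = ((($n  +& (my int $ = 0xcccccccc)) +> 2) +| (($n  +& (my int $ = 0x33333333)) +< 2));
  $n = ((($n  +& (my int $ = 0xf0f0f0f0)) +> 4) +| (($n  +& (my int $ = 0x0f0f0f0f)) +< 4));
  $n = ((($n  +& (my int $ = 0xff00ff00)) +> 8) +| (($n  +& (my int $ = 0x00ff00ff)) +< 8));
  ((($n +> 16) +| ($n+< 16)) +> (32 - 1 - $in.msb)) +& (my int $ = 0xffffffff);
}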

…

  # next unless $x.base(2) eq $x.base(2).flip;
  next unless $x == flip-bits($x);
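A bit flipper like this is easy to get subtly wrong, so it helps to cross-check it against a slow but obvious reference. The sketch below uses a hypothetical bit-at-a-time reversal (reverse-bits-loop is not the author's algorithm; it is O(number of bits)) and checks it against the string test; to check flip-bits instead, call it in place of reverse-bits-loop in the loop.

```raku
# Reverse the significant bits of a positive $n one at a time.
sub reverse-bits-loop ( int $n --> int ) {
    my int $in  = $n;
    my int $out = 0;
    while $in {
        $out = ($out +< 1) +| ($in +& 1);   # append the lowest remaining bit of $in
        $in  = $in +> 1;
    }
    $out
}

# Compare against the string-based test for every odd number below 10_000.
for 1, 3 ... 9_999 -> $x {
    my Bool $by-bits   = $x == reverse-bits-loop($x);
    my Bool $by-string = $x.base(2) eq $x.base(2).flip;
    die "mismatch at $x" unless $by-bits === $by-string;
}
say 'ok';
```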

You can even try to use multiple threads.

Note that this workload is far too small for that to be effective: the overhead of using threads swamps any benefit.

my atomicint $total = 0;

# checks the odd numbers in [$s, $e); $s must be odd
sub process ( int $s, int $e ) {
  # these are so the block lambda works properly
  # (works around what I think is a bug)
  my int $ = $s;
  my int $ = $e;

  start {
    my int $acc = 0;
    loop ( my int $x = $s; $x < $e; $x += 2) {
      # $str rather than $s, so the parameter $s isn't shadowed
      next unless (my $str = ~$x) eq $str.flip;
      next unless $x.base(2) eq $x.base(2).flip;
      $acc += $x;
    }
    $total ⚛+= $acc;
  }
}


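my int $cores = (Kernel.cpu-cores * 2.2).Int;

# chunk size, rounded up so that $cores chunks cover everything,
# and made even so that every chunk starts on an odd number
my int $per = 1_000_000 div $cores;
++$per if $per * $cores < 1_000_000;
++$per if $per % 2;

my @promises;

my int $start = 1;
for ^$cores {
  my int $end = $start + $per;   # exclusive upper bound
  $end = 1_000_000 if $end > 1_000_000;

  push @promises, process $start, $end;

#say $start, "\t", $end;

  $start = $end;
}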

await @promises;
say $total;

Which runs in about 0.63 seconds.
(I tweaked the 2.2 factor to find a near-minimum time on my computer.)
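If you do want parallelism, it may be simpler to let Rakudo batch the work with .race instead of managing promises by hand. This is a sketch, not a measured improvement: batch => 10_000 and degree => 4 are guesses to tune, and as noted above this workload may be too small for threads to pay off. .race does not preserve order, which does not matter for a sum.

```raku
# Rakudo splits the range into batches and runs the .grep stages on worker threads.
my @hits = (1..1_000_000)
    .race(batch => 10_000, degree => 4)
    .grep(* !%% 2)
    .grep( -> $x { $x eq $x.flip } )
    .grep( -> $y { $y.base(2) eq $y.base(2).flip } );
say @hits.sum;
```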

Tags:

Raku
