Speeding up the built-in Rudin-Shapiro and Thue-Morse sequence functions

I can't take much credit for this answer--I hadn't even got version 10.2 installed until J. M. commented to me that these functions could be written efficiently in terms of the Hamming weight function. But, it is understandable that he doesn't want to write an answer using a smartphone.

The definition of the built-in ThueMorse is:

ThueMorse[n_Integer] := Mod[DigitCount[n, 2, 1], 2]

And, sure enough, the performance of DigitCount used in this way is exactly what Mr. Wizard complained about previously. Let's redefine ThueMorse to use the hammingWeight LibraryLink function given in the linked answer:

hammingWeightC = LibraryFunctionLoad[
  "hammingWeight.dll",
  "hammingWeight_T_I", {{Integer, 1, "Constant"}}, {Integer, 0, Automatic}
  ];
hammingWeight[num_Integer] := hammingWeightC@IntegerDigits[num, 2^62];

thueMorse[n_Integer] := Mod[hammingWeight[n], 2]
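The idea behind this redefinition is simply that ThueMorse[n] is the parity of the Hamming weight (popcount) of n. A minimal sketch of that identity in Python (for illustration only; the names are mine, not from the answer):

```python
def hamming_weight(n: int) -> int:
    """Popcount: the number of 1 bits in the binary expansion of n."""
    return bin(n).count("1")

def thue_morse(n: int) -> int:
    """Thue-Morse sequence: the parity of the popcount of n."""
    return hamming_weight(n) % 2

# The sequence begins 0, 1, 1, 0, 1, 0, 0, 1, ...
print([thue_morse(n) for n in range(8)])  # -> [0, 1, 1, 0, 1, 0, 0, 1]
```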

The performance is considerably improved, at least for large numbers:

test = 10^(10^5);
ThueMorse[test] // RepeatedTiming (* -> 0.00184271 seconds *)
thueMorse[test] // RepeatedTiming (* -> 0.0000230804 seconds *)

That's 80 times faster.

The definition of the built-in RudinShapiro is:

RudinShapiro[n_Integer] := (-1)^StringCount[IntegerString[n, 2], "11", Overlaps -> True]

It is a little strange, in my opinion, to implement the function quite so literally. It can also be written in terms of ThueMorse as:

rudinShapiro[n_Integer] := 1 - 2 ThueMorse[BitAnd[n, Quotient[n, 2]]]

Here the ThueMorse is still the built-in version. Even so, performance improves considerably just from this rewriting (for which, again, I take no credit):
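The identity being used is that the number of overlapping "11" pairs in the binary expansion of n equals the popcount of BitAnd[n, Quotient[n, 2]], so $(-1)^{\text{count}}$ reduces to 1 - 2 ThueMorse of that value. A quick Python sketch checking this (illustrative names, not from the answer):

```python
def rudin_shapiro_string(n: int) -> int:
    """Literal definition: (-1)^(count of overlapping '11' pairs in binary)."""
    bits = bin(n)[2:]
    count = sum(1 for i in range(len(bits) - 1) if bits[i : i + 2] == "11")
    return (-1) ** count

def rudin_shapiro_bitwise(n: int) -> int:
    """Equivalent form: each 1 bit of n & (n >> 1) marks one '11' pair,
    so only the parity of its popcount matters."""
    return 1 - 2 * (bin(n & (n >> 1)).count("1") % 2)

# The two definitions agree on an initial range.
assert all(rudin_shapiro_string(n) == rudin_shapiro_bitwise(n) for n in range(1000))
```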

test = 10^(10^5);
RudinShapiro[test] // RepeatedTiming (* -> 0.0131471 seconds *)
rudinShapiro[test] // RepeatedTiming (* -> 0.00195247 seconds *)

So, it's almost 7 times faster just by avoiding string functions. What if we also use the improved thueMorse?

rudinShapiro2[n_Integer] := 1 - 2 thueMorse[BitAnd[n, Quotient[n, 2]]]

test = 10^(10^5);
rudinShapiro2[test] // RepeatedTiming (* -> 0.0000490027 seconds *)

This revision is 270 times faster than the built-in version. If its timing only depended on thueMorse, it would be about $7 \times 80 = 560$ times faster. The fact that it comes within a factor of 2 of this limit suggests that BitAnd and Quotient are quite efficient.

But, Quotient still isn't quite as efficient as a bit shift (again, not my idea):

rudinShapiro3[n_Integer] := 1 - 2 thueMorse[BitAnd[n, BitShiftRight[n]]];

test = 10^(10^5);
rudinShapiro3[test] // RepeatedTiming (* -> 0.0000314508 seconds *)

Now it's 420 times faster than the built-in.

Here I will just take the opportunity to remind anyone from WRI who may read this that all of the code in my answers is offered under academic-style permissive licences (your choice of three). So, it should be possible to speed up these functions in the product with minimal effort and without the need to write any new code.


With modest preprocessing we get a speedup factor of 9 or so for large inputs, just by chunking into 12-bit pieces and using a compiled lookup function.

m = 12;
Timing[tmLookup = 
   Table[Mod[Total[IntegerDigits[j, 2]], 2], {j, 0, 2^m - 1}];]

(* Out[49]= {0.00157, Null} *)

Some of the option settings are probably overkill.

tmLookupCSmall = With[{tmtable = tmLookup}, Compile[
    {{bigits, _Integer, 1}},
    Mod[Total[Map[tmtable[[# + 1]] &, bigits]], 2], 
    RuntimeAttributes -> Listable, CompilationTarget -> "C", 
    RuntimeOptions -> "Speed"]];
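The structure of the approach is easy to see outside of Compile: precompute the popcount parity of every m-bit value once, then split the input into m-bit chunks and combine the table entries. A rough Python sketch of the same scheme (my names, assuming the same chunk size of 12 bits):

```python
M = 12
MASK = (1 << M) - 1

# Precompute the popcount parity of every M-bit chunk (2^M entries).
tm_lookup = [bin(j).count("1") % 2 for j in range(1 << M)]

def thue_morse_chunked(n: int) -> int:
    """Parity of popcount(n), computed M bits at a time via the table.
    Parities of the chunks combine by XOR."""
    parity = 0
    while n:
        parity ^= tm_lookup[n & MASK]
        n >>= M
    return parity
```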

Here is the large test from a prior response.

In[45]:= test = 10^(10^5);

In[46]:= Timing[Do[ThueMorse[test], {1000}];]

(* Out[46]= {0.912775, Null} *)

In[51]:= Timing[Do[tmLookupCSmall[IntegerDigits[test, 2^m]], {1000}];]

(* Out[51]= {0.11619, Null} *)

In[52]:= tm1 = ThueMorse[test]; tm2 = tmLookupCSmall[IntegerDigits[test, 2^m]]; tm1 === tm2

(* Out[52]= True *)

There are two bottlenecks. One is that we cannot take the chunk size much higher, since we have to store 2^m values for chunks of m bits. The other is that the compiled C code is probably well below optimal in run-time efficiency. I have not tried using larger chunk sizes or reimplementing this with bitwise operations, so possibly that could give a further boost.


Another approach is to define the sequence recursively:

t[0] = 0;
t[n_] := t[n] = If[EvenQ[n], t[n/2], 1 - t[(n-1)/2]];

To check:

t[#] & /@ Range[100] == ThueMorse[Range[100]]
True

For speed:

ThueMorse[Range[20000]]; // AbsoluteTiming
{0.45703, Null}

t[#] & /@ Range[20000]; // AbsoluteTiming
{0.095282, Null}

Attempting to fine-tune this yields only small improvements:

tb[0] = 0;
tb[n_] := tb[n] = Module[{tN = tb[BitShiftRight[n, 1]]}, If[EvenQ[n], tN, 1 - tN]];
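The memoized recurrence here is t(2n) = t(n) and t(2n+1) = 1 - t(n). A Python sketch of the same idea, with functools.lru_cache standing in for Mathematica's memoized definitions (illustrative only, not the answer's code):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def t(n: int) -> int:
    """Thue-Morse via the recurrence t(2n) = t(n), t(2n+1) = 1 - t(n)."""
    if n == 0:
        return 0
    half = t(n >> 1)  # drop the low bit, as BitShiftRight does
    return half if n % 2 == 0 else 1 - half

# Agrees with the popcount-parity definition on an initial range.
assert all(t(n) == bin(n).count("1") % 2 for n in range(200))
```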

Sorry for being so late to the ThueMorse party!