Partitioning a list when the cumulative sum exceeds 1

dat = {0.71, 0.685, 0.16, 0.82, 0.73, 0.44, 0.89, 0.02, 0.47, 0.65};

Module[{t = 0},
  Split[dat, (t += #) <= 1 || (t = 0) &]
]
{{0.71, 0.685}, {0.16, 0.82, 0.73}, {0.44, 0.89}, {0.02, 0.47, 0.65}}

Credit to Simon Woods for getting me to think about using Or in applications like this.
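To see why the Or trick works, it may help to spell the test out. Split calls the test on each adjacent pair; the compact version adds the left element to the running total and returns True while the total is at most 1. Once the total exceeds 1, Or falls through to (t = 0), which resets the total and returns 0 rather than True, so Split starts a new run. Below is a more verbose sketch of the same logic; the list `small` and the variable `res` are illustrative names, not part of the original code.

```mathematica
small = {0.6, 0.3, 0.5, 0.2};   (* illustrative data, not from the question *)

res = Module[{t = 0},
  Split[small,
    Function[{a, b},
      t += a;              (* add the left element of the pair to the running total *)
      If[t <= 1,
        True,              (* total still within 1: b stays in the current run *)
        t = 0; False       (* total exceeded 1: reset, b starts a new run *)
      ]]]
]
(* {{0.6, 0.3, 0.5}, {0.2}} *)
```

Note that the last element of the list is never added to the total, because Split only tests the n - 1 adjacent pairs.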


I decided to attempt a higher-performing solution at the cost of elegance and clarity.

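f2[dat_List] := Module[{bin, lns},
   (* bin marks the last element of each group with 1; lns holds the group lengths *)
   bin = 1 - Unitize @ FoldList[If[# <= 1`, #, 0`] & @ +## &, dat]; 
   lns = SparseArray[bin]["AdjacencyLists"] ~Prepend~ 0 // Differences;
   Internal`PartitionRagged[dat,
      If[# > 0, Append[lns, #], lns] &[Length @ dat - Tr @ lns]
   ]
]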

And a second try at performance using Szabolcs's inversion:

f3[dat_List] := Module[{bin},
    bin = 1 - Unitize @ FoldList[If[# <= 1`, #, 0`] & @ +## &, dat];
    bin = Reverse @ Accumulate @ Reverse @ bin;
    dat[[#]] & /@ GatherBy[Range @ Length @ dat, bin[[#]] &]
]

Using SplitBy seems natural here but it tested slower than GatherBy.
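To make the inversion step concrete, here are the intermediate values for the sample data, worked through by hand. The names `run`, `labels`, and `groups` are introduced only for this illustration.

```mathematica
dat = {0.71, 0.685, 0.16, 0.82, 0.73, 0.44, 0.89, 0.02, 0.47, 0.65};

(* running total that drops to 0 as soon as it exceeds 1 *)
run = FoldList[If[# <= 1`, #, 0`] & @ +## &, dat];

(* 1 marks the last element of each group *)
bin = 1 - Unitize[run]
(* {0, 1, 0, 0, 1, 0, 1, 0, 0, 1} *)

(* the inversion: a reversed running count turns end markers into group labels *)
labels = Reverse @ Accumulate @ Reverse @ bin
(* {4, 4, 3, 3, 3, 2, 2, 1, 1, 1} *)

(* gather the positions by label, then extract the elements *)
groups = dat[[#]] & /@ GatherBy[Range @ Length @ dat, labels[[#]] &]
(* {{0.71, 0.685}, {0.16, 0.82, 0.73}, {0.44, 0.89}, {0.02, 0.47, 0.65}} *)
```

Since every element of a group shares one label, GatherBy can do the grouping without any stateful test function.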

Modified October 2018 to use Carl Woll's GatherByList:

GatherByList[list_, representatives_] := Module[{func},
    (* make Map[func, ...] return the precomputed labels, so GatherBy uses them directly *)
    func /: Map[func, _] := representatives;
    GatherBy[list, func]
]

f4[dat_List] := Module[{bin},
    bin = 1 - Unitize @ FoldList[If[# <= 1`, #, 0`] & @ +## &, dat];
    bin = Reverse @ Accumulate @ Reverse @ bin;
    GatherByList[dat, bin]
]

The other functions to compare:

f1[dat_List] := Module[{t = 0}, Split[dat, (t += #) <= 1 || (t = 0) &]]

fqwerty[dat_List] := Module[{f},
    f[x_, y_] := Module[{new}, If[Total[new = Append[x, y]] >= 1, Sow[new]; {}, new]];
    Reap[Fold[f, {}, dat]][[2, 1]]
]

fAlgohi[dat_List] :=
 Module[{i = 0, r},
  Split[dat, (If[r, , i = 0]; i += #; r = i <= 1) &]
 ]

And a single-point benchmark using "a long list of say 1 million Uniform(0,1) random numbers":

test = RandomReal[1, 1*^6];

fqwerty[test] // Length // RepeatedTiming
fAlgohi[test] // Length // RepeatedTiming
f1[test]      // Length // RepeatedTiming
f2[test]      // Length // RepeatedTiming
f3[test]      // Length // RepeatedTiming
f4[test]      // Length // RepeatedTiming
main1[test]   // Length // RepeatedTiming    (* from LLlAMnYP's answer *)
{6.54, 368130}

{1.59, 368131}

{1.29, 368131}

{0.474, 368131}

{0.8499, 368131}

{0.4921, 368131}

{0.2622, 368131}

Note that qwerty's solution returns one fewer sublist, because it drops the final trailing elements if their sum does not reach one. I do not know which behavior is desired.
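A small example makes the difference visible. The three-element list `small` is made up for illustration, and the `Module[{f}, ...]` wrapper around qwerty's code is an assumption about how the local helper was scoped.

```mathematica
f1[dat_List] := Module[{t = 0}, Split[dat, (t += #) <= 1 || (t = 0) &]]

fqwerty[dat_List] := Module[{f},
  f[x_, y_] := Module[{new}, If[Total[new = Append[x, y]] >= 1, Sow[new]; {}, new]];
  Reap[Fold[f, {}, dat]][[2, 1]]
]

small = {0.6, 0.7, 0.2};   (* the trailing 0.2 never reaches a total of 1 *)

f1[small]        (* {{0.6, 0.7}, {0.2}} : trailing remainder kept *)
fqwerty[small]   (* {{0.6, 0.7}}        : trailing remainder dropped *)
```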

Here's my take on making a function that is as fast as possible.

main = Module[{idxs = sub[Accumulate@#]}, 
    Internal`PartitionRagged[#, idxs]] &;
(* sub takes the accumulated list and returns the sub-list lengths *)
sub = Compile[{{list, _Real, 1}},
   Block[{i, l = Length[list], ref = 1., bag = Internal`Bag[{0}]},
    For[i = 1, i <= l, i++,
     If[list[[i]] >= ref || i == l, Internal`StuffBag[bag, i]; 
      ref = list[[i]] + 1.;]
     ];
    Differences[Internal`BagPart[bag, All]]
    ]
   ];

It's maybe 5% faster than Mr. Wizard's f2 function, but the real bottleneck is PartitionRagged, which takes about 80-85% of the time. I suppose there's not much to gain from compiling; what's needed is a fast ragged partition routine. Part is compilable, but Compile does not like to return ragged arrays.

This got me thinking about the proper treatment of ragged arrays. While I didn't come up with a proper solution, I did manage to construct a compilable function that creates a rectangular array containing the desired output, padded on the right with zeros.

main1 = Function[{list}, 
   Block[{sum = Accumulate[list]}, sub2[sub1[sum], list]]];
(* sub1 returns one row {start, end, length} per sub-list *)
sub1 = Compile[{{list, _Real, 1}},
   Block[{i, l = Length[list], ref = 1., bag = Internal`Bag[{0}], 
     idxs},
    For[i = 1, i <= l, i++,
     If[list[[i]] >= ref || i == l, Internal`StuffBag[bag, i]; 
      ref = list[[i]] + 1.;]
     ];
    idxs = Internal`BagPart[bag, All];
    {Most[idxs] + 1, Rest[idxs], Differences[idxs]} // Transpose
    ]
   ];
(* sub2 copies each sub-list into a row of a zero-padded rectangular array *)
sub2 = Compile[{{idxs, _Integer, 2}, {list, _Real, 1}},
   Block[{result = 
      ConstantArray[0., {Length[idxs], Max[idxs[[All, 3]]]}], i},
    For[i = 1, i <= Length[idxs], i++,
     result[[i, ;; idxs[[i, 3]]]] = 
      list[[idxs[[i, 1]] ;; idxs[[i, 2]]]]];
    result
    ]
   ];

This is about 30% faster than the previous result. One might assume that running totals usually involve non-negative numbers, so padding with zeros (or perhaps a large negative number) will not lead to ambiguity.

Internal`PartitionRagged uses Accumulate internally to generate a list of positions from the sub-list lengths, then MapThread and Take to extract the corresponding elements from the array. You can check the internal definition with


The reason for pointing this out is that answers which generate a list of positions and then use Differences to convert to sub-list lengths (for input to Internal`PartitionRagged) can be marginally improved by skipping the Differences and Accumulate steps.

Here's a modified version of LLlAMnYP's code. The compiled function outputs two lists: the start and end points of each sub-list in the original array. The main function just MapThreads Take over these lists.

fc = Compile[{{data, _Real, 1}},
   Block[{a = Accumulate[data], i, n = Length[data], ref = 1.0, bag = Internal`Bag[{0}]},
    Do[If[a[[i]] >= ref, Internal`StuffBag[bag, i]; ref = a[[i]] + 1.0], {i, n - 1}];
    Internal`StuffBag[bag, n];
    {1 + Most@Internal`BagPart[bag, All], Rest@Internal`BagPart[bag, All]}]];

f0[data_] := Module[{p = fc@data}, MapThread[Take[data, {#1, #2}] &, p]]

In my tests this comes out a few percent faster than LLlAMnYP's. As with all the answers, the bottleneck is the unpacking of the original packed data into a ragged list.
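As a minimal sketch of that last step, the start and end positions below are written out by hand for the sample data (they stand in for the output of the compiled function, and the names `starts`, `ends`, and `parts` are introduced only for this illustration).

```mathematica
dat = {0.71, 0.685, 0.16, 0.82, 0.73, 0.44, 0.89, 0.02, 0.47, 0.65};

starts = {1, 3, 6, 8};    (* hand-written stand-in for the first list returned by fc *)
ends   = {2, 5, 7, 10};   (* hand-written stand-in for the second list returned by fc *)

(* extract each sub-list directly from its start and end position *)
parts = MapThread[Take[dat, {#1, #2}] &, {starts, ends}]
(* {{0.71, 0.685}, {0.16, 0.82, 0.73}, {0.44, 0.89}, {0.02, 0.47, 0.65}} *)

(* the lengths-based route needs this extra Differences step,
   and accumulating the lengths only recovers the end positions again *)
lens = Differences[Prepend[ends, 0]]   (* {2, 3, 2, 3} *)
Accumulate[lens] === ends              (* True *)
```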