Safely remove items from an array table while iterating

The general case: iterating over an array and removing arbitrary items from the middle while continuing to iterate.

If you're iterating front-to-back, when you remove element N, the next element in your iteration (N+1) gets shifted down into that position. If you increment your iteration variable (as ipairs does), you'll skip that element. There are two ways we can deal with this.
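Here is a minimal sketch of the wrong approach that shows the skipping problem. It assumes a forward `while` loop that always advances `i`, which is effectively what `ipairs` does. The helper name `buggyForwardRemove` is made up for illustration. Removing `f` shifts `g` into the slot that was just checked, so `g` is never examined. The same thing happens to `o` after `n` is removed.

```lua
-- Hypothetical helper showing the BUGGY pattern: forward iteration that
-- always increments i (like ipairs), combined with table.remove().
local function buggyForwardRemove(t, remove)
    local i = 1
    while i <= #t do
        if remove[t[i]] then
            table.remove(t, i) -- shifts t[i+1..#t] down one slot
        end
        i = i + 1 -- BUG: skips the element that was just shifted into slot i
    end
    return t
end

local input = { 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p' }
local remove = { f=true, g=true, j=true, n=true, o=true, p=true }
buggyForwardRemove(input, remove)
print(table.concat(input)) --> abcdeghiklmo  ('g' and 'o' were never checked)
```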

Using this sample data:

    input = { 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p' }
    remove = { f=true, g=true, j=true, n=true, o=true, p=true }

We can remove input elements during iteration by:

  1. Iterating from back to front.

    for i=#input,1,-1 do
        if remove[input[i]] then
            table.remove(input, i)
        end
    end
    
  2. Controlling the loop variable manually, so we can skip incrementing it when removing an element:

    local i=1
    while i <= #input do
        if remove[input[i]] then
            table.remove(input, i)
        else
            i = i + 1
        end
    end
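Both loops can be wrapped as reusable functions. The function names in this sketch are illustrative and do not come from any library. Each approach runs on a fresh copy of the sample data, and both should produce the same result.

```lua
-- Approach 1: iterate back to front; table.remove() only shifts
-- elements we have already visited, so nothing is skipped.
local function removeBackward(t, remove)
    for i = #t, 1, -1 do
        if remove[t[i]] then
            table.remove(t, i)
        end
    end
    return t
end

-- Approach 2: control the index manually; after a removal the next
-- element now sits at i, so i is only advanced when nothing was removed.
local function removeManualIndex(t, remove)
    local i = 1
    while i <= #t do
        if remove[t[i]] then
            table.remove(t, i)
        else
            i = i + 1
        end
    end
    return t
end

local remove = { f=true, g=true, j=true, n=true, o=true, p=true }
local a = removeBackward({ 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p' }, remove)
local b = removeManualIndex({ 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p' }, remove)
print(table.concat(a)) --> abcdehiklm
print(table.concat(b)) --> abcdehiklm
```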
    

For non-array tables, you iterate using next or pairs (which is implemented in terms of next) and set items you want removed to nil.
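Here is a minimal sketch of the hash-table case, using made-up sample data. The Lua reference manual explicitly allows clearing existing fields while traversing with `next`, and therefore with `pairs`. What it does not allow is assigning to fields that did not exist before the traversal. Because `pairs` order is unspecified, the result is checked by membership rather than order.

```lua
-- Hypothetical sample data: a non-array (hash) table keyed by name.
local ages = { alice = 31, bob = 17, carol = 45, dave = 12 }

for name, age in pairs(ages) do
    if age < 18 then
        ages[name] = nil -- clearing an existing field is safe mid-traversal
    end
end

print(ages.alice, ages.bob, ages.carol, ages.dave) -- prints: 31, nil, 45, nil (tab-separated)
```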

Note that table.remove shifts all following elements every time it's called. Removing k elements from an array of n elements therefore costs O(k·n) time, which is quadratic when k grows with n. If you're removing a lot of elements, you should shift the items yourself, as in LHF's or Mitch's answer.


Efficiency!

WARNING: Do NOT use table.remove(). Every time you call it to remove an array entry, it shifts all of the subsequent (following) array elements down by one slot. It is therefore MUCH faster to just "compact/re-index" the table OURSELVES in a SINGLE pass instead!

The best technique is simple: Count upwards (i) through all array entries, while keeping track of the position (j) where the next "kept" value should go. Any slot whose value is not kept, or whose value is moved from i to j, is set to nil, which tells Lua that we've erased that value.

I'm sharing this, since I really don't like the other answers on this page (as of Oct 2018). They're either wrong, bug-ridden, overly simplistic or overly complicated, and most are ultra-slow. So I implemented an efficient, clean, super-fast one-pass algorithm instead. With a SINGLE loop.

Here's a fully commented example (there's a shorter, non-tutorial version at the end of this post):

    function ArrayShow(t)
        for i=1,#t do
            print('total:'..#t, 'i:'..i, 'v:'..t[i]);
        end
    end

    function ArrayRemove(t, fnKeep)
        print('before:');
        ArrayShow(t);
        print('---');
        local j, n = 1, #t;
        for i=1,n do
            print('i:'..i, 'j:'..j);
            if (fnKeep(t, i, j)) then
                if (i ~= j) then
                    print('keeping:'..i, 'moving to:'..j);
                    -- Keep i's value, move it to j's pos.
                    t[j] = t[i];
                    t[i] = nil;
                else
                    -- Keep i's value, already at j's pos.
                    print('keeping:'..i, 'already at:'..j);
                end
                j = j + 1;
            else
                t[i] = nil;
            end
        end
        print('---');
        print('after:');
        ArrayShow(t);
        return t;
    end

    local t = {
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'
    };

    ArrayRemove(t, function(t, i, j)
        -- Return true to keep the value, or false to discard it.
        local v = t[i];
        return (v == 'a' or v == 'b' or v == 'f' or v == 'h');
    end);

Output, showing the logic along the way and how it moves things around:

    before:
    total:9 i:1 v:a
    total:9 i:2 v:b
    total:9 i:3 v:c
    total:9 i:4 v:d
    total:9 i:5 v:e
    total:9 i:6 v:f
    total:9 i:7 v:g
    total:9 i:8 v:h
    total:9 i:9 v:i
    ---
    i:1 j:1
    keeping:1   already at:1
    i:2 j:2
    keeping:2   already at:2
    i:3 j:3
    i:4 j:3
    i:5 j:3
    i:6 j:3
    keeping:6   moving to:3
    i:7 j:4
    i:8 j:4
    keeping:8   moving to:4
    i:9 j:5
    ---
    after:
    total:4 i:1 v:a
    total:4 i:2 v:b
    total:4 i:3 v:f
    total:4 i:4 v:h

Finally, here's the function for use in your own code, without all of the tutorial-printing... and with just a few minimal comments to explain the final algorithm:

    function ArrayRemove(t, fnKeep)
        local j, n = 1, #t;

        for i=1,n do
            if (fnKeep(t, i, j)) then
                -- Move i's kept value to j's position, if it's not already there.
                if (i ~= j) then
                    t[j] = t[i];
                    t[i] = nil;
                end
                j = j + 1; -- Increment position of where we'll place the next kept value.
            else
                t[i] = nil;
            end
        end

        return t;
    end

That's it!

And if you don't want to use the whole "re-usable callback/function" design, you can simply copy the inner code of ArrayRemove() into your project and change the line if (fnKeep(t, i, j)) then to if (t[i] ~= 'deleteme') then. Note the ~=, because that condition decides which values to KEEP. That way you also get rid of the function call/callback overhead and speed things up even more!
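Here is a minimal sketch of that inlined variant. It assumes the string 'deleteme' marks the values to discard. The `if` in ArrayRemove() decides which values to keep, so the inlined test has to be `t[i] ~= 'deleteme'`, meaning "keep everything that isn't the marker".

```lua
-- Inlined single pass: no fnKeep callback, the keep-test is written directly.
local t = { 'a', 'deleteme', 'b', 'deleteme', 'deleteme', 'c' }

local j, n = 1, #t
for i = 1, n do
    if t[i] ~= 'deleteme' then -- keep everything that is NOT the marker
        if i ~= j then
            t[j] = t[i]        -- move the kept value down to slot j
            t[i] = nil
        end
        j = j + 1              -- next kept value goes one slot further
    else
        t[i] = nil             -- discard the marker
    end
end

print(#t, table.concat(t, ',')) -- prints 3 and a,b,c (tab-separated)
```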

Personally, I use the re-usable callback system, because even with the callback overhead it is still roughly 100-1000+ times faster than table.remove() on large arrays (see the benchmarks below).

Bonus (Advanced Users): Regular users can skip reading this bonus section. It describes how to sync multiple related tables. Note that the 3rd parameter to fnKeep(t, i, j), the j, is a bonus parameter which allows your keep-function to know what index the value will be stored at whenever fnKeep answers true (to keep that value).

Example usage: Let's say you have two "linked" tables, where one is table['Mitch'] = 1; table['Rick'] = 2; (a hash-table for quick array index lookups via named strings) and the other is array[{Mitch Data...}, {Rick Data...}] (an array with numerical indices, where Mitch's data is at pos 1 and Rick's data is at pos 2, exactly as described in the hash-table). Now you decide to loop through the array and remove Mitch Data, which thereby moves Rick Data from position 2 to position 1 instead...

Your fnKeep(t, i, j) function can then easily use the j info to update the hash-table pointers to ensure they always point at the correct array offsets:

    local hData = {['Mitch'] = 1, ['Rick'] = 2};
    local aData = {
        {['name'] = 'Mitch', ['age'] = 33}, -- [1]
        {['name'] = 'Rick', ['age'] = 45}, -- [2]
    };

    ArrayRemove(aData, function(t, i, j)
        local v = t[i];
        if (v['name'] == 'Rick') then -- Keep "Rick".
            if (i ~= j) then -- i and j differing means its data offset will be moved if kept.
                hData[v['name']] = j; -- Point Rick's hash table entry at its new array location.
            end
            return true; -- Keep.
        else
            hData[v['name']] = nil; -- Delete this name from the lookup hash-table.
            return false; -- Remove from array.
        end
    end);

This removes 'Mitch' from both the lookup hash-table and the array. It also updates Rick's hash-table entry to point to 1 (the value of j), which is where its array data is being moved to. The move happens because i and j differed.

This kind of algorithm allows your related tables to stay in perfect sync, always pointing at the correct data position thanks to the j parameter.
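The same idea extends to keeping every surviving record's lookup entry in sync, not just Rick's. Since fnKeep() learns the destination index j, it can write j into the hash-table for each kept value. In this sketch, the final ArrayRemove() is copied as a local function with its comments trimmed, so the block runs on its own. The extra 'Anna' record and the toRemove set are hypothetical additions for illustration.

```lua
-- ArrayRemove() as given above (comments trimmed), local so this block is self-contained.
local function ArrayRemove(t, fnKeep)
    local j, n = 1, #t
    for i = 1, n do
        if fnKeep(t, i, j) then
            if i ~= j then
                t[j] = t[i] -- move the kept value down to slot j
                t[i] = nil
            end
            j = j + 1
        else
            t[i] = nil -- discard
        end
    end
    return t
end

local hData = { Mitch = 1, Rick = 2, Anna = 3 }
local aData = {
    { name = 'Mitch', age = 33 }, -- [1]
    { name = 'Rick',  age = 45 }, -- [2]
    { name = 'Anna',  age = 28 }, -- [3] hypothetical extra record
}
local toRemove = { Mitch = true } -- hypothetical: names to delete

ArrayRemove(aData, function(t, i, j)
    local v = t[i]
    if toRemove[v.name] then
        hData[v.name] = nil -- drop it from the lookup hash-table
        return false        -- and from the array
    end
    hData[v.name] = j       -- j is the slot this kept record ends up in
    return true
end)

print(aData[1].name, hData.Rick, aData[2].name, hData.Anna) -- prints Rick, 1, Anna, 2 (tab-separated)
```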

It's just an advanced bonus for those who need that feature. Most people can simply ignore the j parameter in their fnKeep() functions!

Well, that's all, folks!

Enjoy! :-)

Benchmarks (aka "Let's have a good laugh...")

I decided to benchmark this algorithm against the standard "loop backwards and use table.remove()" method, which is what most Lua users use.

To do this test, I used the following test.lua file: https://pastebin.com/aCAdNXVh

Each algorithm being tested is given 10 test-arrays, containing 2 million items per array (a total of 20 million items per algorithm-test). The items in all arrays are identical (to ensure total fairness in testing): Every 5th item is the number "13" (which will be deleted), and all other items are the number "100" (which will be kept).
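The pastebin test.lua isn't reproduced here. As a rough, hypothetical stand-in, this sketch uses the same data pattern (every 5th item is 13 and gets deleted, all others are 100). It uses a single, much smaller array and times each approach with os.clock(). The function names are made up. The single-pass function follows the same structure as ArrayRemove() with the keep-test inlined. Your timings will depend on your machine and runtime.

```lua
-- Hypothetical micro-benchmark; NOT the original test.lua from pastebin.
local function makeArray(n)
    local t = {}
    for i = 1, n do
        -- every 5th item is 13 (to be deleted), all others are 100 (kept)
        t[i] = (i % 5 == 0) and 13 or 100
    end
    return t
end

-- Single pass, same structure as ArrayRemove() with the keep-test inlined.
local function singlePass(t)
    local j, n = 1, #t
    for i = 1, n do
        if t[i] ~= 13 then
            if i ~= j then
                t[j] = t[i]
                t[i] = nil
            end
            j = j + 1
        else
            t[i] = nil
        end
    end
    return t
end

-- The "loop backwards and use table.remove()" approach.
local function backwardRemove(t)
    for i = #t, 1, -1 do
        if t[i] == 13 then
            table.remove(t, i)
        end
    end
    return t
end

local function timeIt(label, fn, n)
    local t = makeArray(n)
    local start = os.clock()
    fn(t)
    print(string.format('%-14s n=%d  %.3f s  (#t=%d)', label, n, os.clock() - start, #t))
    return t
end

-- Kept small on purpose: the table.remove() version slows down quadratically.
local N = 20000
local a = timeIt('single pass', singlePass, N)
local b = timeIt('table.remove', backwardRemove, N)
```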

Well... my ArrayRemove() algorithm's test concluded in 2.8 seconds (to process the 20 million items). I'm now waiting for the table.remove() test to finish... It's been a few minutes so far and I am getting bored........ Update: Still waiting... Update: I am hungry... Update: Hello... today?! Update: Zzz... Update: Still waiting... Update: ............ Update: Okay, the table.remove() code (which is the method that most Lua users are using) is going to take a few days. I'll update the day it finishes.

Note to self: I began running the test at ~04:55 GMT on November 1st, 2018. My ArrayRemove() algorithm finished in 2.8 seconds... The built-in Lua table.remove() algorithm is still running as of now... I'll update this post later... ;-)

Update: It is now 14:55 GMT on November 1st, 2018, and the table.remove() algorithm has STILL NOT FINISHED. I'm going to abort that part of the test, because Lua has been using 100% of my CPU for the past 10 hours, and I need my computer now. And it's hot enough to make coffee on the laptop's aluminum case...

Here's the result:

  • Processing 10 arrays with 2 million items (20 million items total):
  • My ArrayRemove() function: 2.8 seconds.
  • Normal Lua table.remove(): I decided to quit the test after 10 hours of 100% CPU usage by Lua. Because I need to use my laptop now! ;-)

Here's the stack trace when I pressed Ctrl-C... which confirms what Lua function my CPU has been working on for the last 10 hours, haha:

    [     mitch] elapsed time: 2.802

    ^Clua: test.lua:4: interrupted!
    stack traceback:
        [C]: in function 'table.remove'
        test.lua:4: in function 'test_tableremove'
        test.lua:43: in function 'time_func'
        test.lua:50: in main chunk
        [C]: in ?

If I had let the table.remove() test run to completion, it might have taken a few days... Anyone who doesn't mind wasting a ton of electricity is welcome to re-run this test (the file is linked above on pastebin) and let us all know how long it took.

Why is table.remove() so insanely slow? Simply because every call to that function has to shift every table item that comes after the one we told it to remove! So to delete the 1st item in a 2 million item array, it must move ALL of the other 1,999,999 items down by 1 slot to fill the gap caused by the deletion. And then, when you remove another item, it has to move every item after that one yet again... It does this over and over...
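Here is a rough count of the work. It assumes `table.remove(t, pos)` moves each element `pos+1 .. #t` down one slot, which is what the reference implementation does. It also assumes the benchmark's pattern: an array of n items where every 5th item is removed, looping backwards. Let M = n/5 be the number of removals. When the m-th removable item (at position 5m) is removed, the M − m removable items after it are already gone. The table then holds n − (M − m) items, so the call shifts (n − M + m) − 5m = 4(M − m) items. Summing over all removals:

```latex
\text{moves}(n) = \sum_{m=1}^{M} 4(M - m) = 4 \cdot \frac{M(M-1)}{2} = 2M(M-1), \qquad M = \frac{n}{5}
```

For n = 2,000,000 (M = 400,000), that is about 3.2 × 10^11 element moves per array. A single pass like ArrayRemove() needs at most about n = 2,000,000 moves. Because the count grows with M², each 10x increase in array size means roughly 100x more moves.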

You should never, EVER use table.remove()! Its performance penalty grows rapidly. Here's an example with smaller array sizes, to demonstrate this:

  • 10 arrays of 1,000 items (10k items total): ArrayRemove(): 0.001 seconds, table.remove(): 0.018 seconds (18x slower).
  • 10 arrays of 10,000 items (100k items total): ArrayRemove(): 0.014 seconds, table.remove(): 1.573 seconds (112.4x slower).
  • 10 arrays of 100,000 items (1m items total): ArrayRemove(): 0.142 seconds, table.remove(): 3 minutes, 48 seconds (1605.6x slower).
  • 10 arrays of 2,000,000 items (20m items total): ArrayRemove(): 2.802 seconds, table.remove(): I decided to abort the test after 10 hours, so we may never know how long it takes. ;-) But at the point where I aborted it (not even finished), it had already taken 12847.9x longer than ArrayRemove()... If I had let it finish, the final table.remove() result would probably be around 30-40 thousand times slower.

As you can see, table.remove()'s growth in time is not linear. If it were, the 1 million item test would have taken only 10x as long as the 0.1 million (100k) item test, but instead we see 1.573s vs 3m48s, about 145x! So we cannot take a smaller test (such as 10k items) and simply multiply it up to 20 million items to know how long the aborted test would have taken... So if anyone is truly curious about the final result, you'll have to run the test yourselves and post a comment after a few days when table.remove() finishes...
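You can also check the non-linear growth without waiting hours. This sketch replays the benchmark's backwards loop, but instead of calling table.remove(), it only adds up how many elements each call would shift. `countMoves` is a hypothetical helper, not part of the benchmark. It assumes table.remove(t, pos) moves every element after pos down one slot, as explained above. It runs in linear time, so even the 2-million-item size returns almost immediately. Each 10x increase in n should show roughly 100x more moves.

```lua
-- Hypothetical helper: counts the element moves a backwards table.remove()
-- loop would perform, assuming table.remove(t, pos) shifts pos+1..#t down.
-- Data pattern as in the benchmark: every 5th item is removed.
local function countMoves(n)
    local size = n   -- current length of the simulated array
    local moves = 0
    for pos = n, 1, -1 do
        if pos % 5 == 0 then
            moves = moves + (size - pos) -- items after pos get shifted down
            size = size - 1
        end
    end
    return moves
end

for _, n in ipairs({ 1000, 10000, 100000, 2000000 }) do
    print(n, countMoves(n))
end
```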

But what we can do at this point, with the benchmarks we have so far, is say these words: F-ck table.remove()! ;-)

There's no reason to ever call that function. EVER. Because if you want to delete items from a table, just use t['something'] = nil;. If you want to delete items from an array (a table with numeric indices), use ArrayRemove().

By the way, the tests above were all executed using Lua 5.3.4, since that's the standard runtime most people use. I decided to do a quick run of the main "20 million items" test using LuaJIT 2.0.5 (JIT: ON CMOV SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse), which is a faster runtime than the standard Lua. The result for 20 million items with ArrayRemove() was: 2.802 seconds in Lua, and 0.092 seconds in LuaJIT. Which means that if your code/project runs on LuaJIT, you can expect even faster performance from my algorithm! :-)

I also re-ran the "100k items" test one final time using LuaJIT, so that we can see how table.remove() performs in LuaJIT instead, and to see if it's any better than regular Lua:

  • [LUAJIT] 10 arrays of 100,000 items (1m items total): ArrayRemove(): 0.005 seconds, table.remove(): 20.783 seconds (4156.6x slower than ArrayRemove()... but this LuaJIT result is actually a WORSE ratio than regular Lua, whose table.remove() was "only" 1605.6x slower than my algorithm for the same test... So if you're using LuaJIT, the performance ratio is even more in favor of my algorithm!)

Lastly, you may wonder: "Would table.remove() be faster if we only want to delete one item, since it's a native function?" If you use LuaJIT, the answer to that question is: No. In LuaJIT, ArrayRemove() is faster than table.remove() even for removing ONE ITEM. LuaJIT is generally much faster than regular Lua; in the 20-million-item test above, it ran ArrayRemove() about 30x faster (0.092 s vs 2.802 s). Here's the result: [mitch] elapsed time (deleting 1 items): 0.008, [table.remove] elapsed time (deleting 1 items): 0.011. Here's the pastebin for the "just delete 1-6 items" test: https://pastebin.com/wfM7cXtU (with full test results listed at the end of the file).

TL;DR: Don't use table.remove() anywhere, for any reason whatsoever!

Hope you enjoy ArrayRemove()... and have fun, everyone! :-)

Tags:

Lua