What is the fastest way to turn every member of an array alphanumeric?

tl;dr - Regular expressions destroy VBA implementations. If this were a code challenge, @brettj or @Slai should win it.

There are a bunch of tricks to make your AlphaNumericOnly faster.

First, you can get rid of the vast majority of the function calls by treating it as a byte array instead of a string. That removes all of the calls to Mid$ and Asc. Although these are incredibly fast functions, they still add the overhead pushing onto and popping off of the call stack. That adds up over a couple hundred thousand iterations.

The second optimization is to not use Case x To y syntax if you can avoid it. The reason has to do with how it compiles - it doesn't compile to a test like Case = Condition >= x And Condition <= y, it actually creates a loop with an early exit condition like this:

Case = False
For i = x To y
    If Condition = i Then
        Case = True
    End If
Next

Again, not a huge performance hit, but it adds up. The third optimization is to order your tests in a way that makes them sort circuit on the most likely hits in your data set. I tailored my examples below for primarily letters, with most of them upper case. You may do better with different ordering. Put it all together and you get something that looks like this:

Public Function ByteAlphaNumeric(source As Variant) As String
    Dim chars() As Byte
    Dim outVal() As Byte
    chars = CStr(source)        'Load the array up.

    Dim bound As Long
    bound = UBound(chars)       'Size the outbound array.
    ReDim outVal(bound)

    Dim i As Long, pos As Long
    For i = 0 To bound Step 2   'Wide characters, only care about the ASCII range.
        Dim temp As Byte
        temp = chars(i)         'Pointer math isn't free. Cache it.
        Select Case True        'Order is important here.
            Case temp > 64 And temp < 91
                outVal(pos) = temp
                pos = pos + 2   'Advance the output pointer.
            Case temp < 48
            Case temp > 122
            Case temp > 96
                outVal(pos) = temp
                pos = pos + 2
            Case temp < 58
                outVal(pos) = temp
                pos = pos + 2
        End Select
    Next
    'This is likely the most expensive operation.
    ReDim Preserve outVal(pos)  'Trim the output array.
    ByteAlphaNumeric = outVal
End Function

How does it do? Pretty well:

Public Sub Benchmark()
    Dim starting As Single, i As Long, dummy As String, sample As Variant

    sample = GetRandomString

    starting = Timer
    For i = 1 To 1000000
        dummy = AlphaNumericOnlyOP(sample)
    Next i
    Debug.Print "OP's AlphaNumericOnly: ", Timer - starting

    starting = Timer
    For i = 1 To 1000000
        dummy = AlphaNumericOnlyThunderframe(sample)
    Next i
    Debug.Print "ThunderFrame's AlphaNumericOnly: ", Timer - starting

    starting = Timer
    For i = 1 To 1000000
        dummy = AlphaNumeric(sample)
    Next i
    Debug.Print "CallumDA33's AlphaNumeric: ", Timer - starting

    starting = Timer
    For i = 1 To 1000000
        dummy = ByteAlphaNumeric(sample)
    Next i
    Debug.Print "ByteAlphaNumeric: ", Timer - starting

    Dim cast As String
    cast = CStr(sample)
    starting = Timer
    For i = 1 To 1000000
        dummy = ByteAlphaNumericString(cast)
    Next i
    Debug.Print "ByteAlphaNumericString: ", Timer - starting

    Set stripper = Nothing
    starting = Timer
    For i = 1 To 1000000
        dummy = OptimizedRegex(sample)
    Next i
    Debug.Print "OptimizedRegex: ", Timer - starting

End Sub

Private Function GetRandomString() As Variant
    Dim chars(30) As Byte, i As Long
    Randomize
    For i = 0 To 30 Step 2
        chars(i) = Int(96 * Rnd + 32)
    Next i
    Dim temp As String
    temp = chars
    GetRandomString = CVar(temp)
End Function

Results with a 15 character random String:

OP`s AlphaNumericOnly:                     6.565918 
ThunderFrame`s AlphaNumericOnly:           3.617188 
CallumDA33`s AlphaNumeric:                23.518070 
ByteAlphaNumeric:                          2.354980

Note, I omitted submissions that weren't trivial to convert to functions. You may notice 2 additional test - the ByteAlphaNumericString is exactly the same as the ByteAlphaNumeric function, but it takes a String as input instead of a Variant and gets rid of the cast. That's not trivial:

ByteAlphaNumericString:                    2.226074

And finally, the elusive OptimizedRegex function (basically @brettj's code in function form for comparison timing):

Private stripper As RegExp  'Module level

Function OptimizedRegex(strSource As Variant) As String
    If stripper Is Nothing Then
        Set stripper = New RegExp
        With stripper
            .Global = True
            .Pattern = "[^0-9A-Za-z]"
        End With
    End If
    OptimizedRegex = stripper.Replace(strSource, vbNullString)
End Function
OptimizedRegex:                            1.094727 

EDIT: Bonus implementation!

It occurred to me that a hash table lookup might be faster than a Select Case structure, so I built one with using a Scripting.Dictionary:

Private hash As Scripting.Dictionary  'Module level

Function HashLookups(source As Variant) As String
    Dim chars() As Byte
    Dim outVal() As Byte

    chars = CStr(source)
    Dim bound As Long
    bound = UBound(chars)
    ReDim outVal(bound)

    Dim i As Long, pos As Long
    With hash
        For i = 0 To bound Step 2
            Dim temp As Byte
            temp = chars(i)
            If .Exists(temp) Then
                outVal(pos) = temp
                pos = pos + 2
            End If
        Next
    End With
    ReDim Preserve outVal(pos)
    HashLookups = outVal
End Function

Private Sub LoadHashTable()
    Set hash = New Scripting.Dictionary
    Dim i As Long
    For i = 48 To 57
        hash.Add i, vbNull
    Next
    For i = 65 To 90
        hash.Add i, vbNull
    Next
    For i = 97 To 122
        hash.Add i, vbNull
    Next
End Sub

'Test code:
    starting = Timer
    LoadHashTable
    For i = 1 To 1000000
        dummy = HashLookups(sample)
    Next i
    Debug.Print "HashLookups: ", Timer - starting

It turned out to be not too shabby:

HashLookups:                               1.655273

Final Version

Woke up and thought I'd try a vector lookup instead of a hash lookup (just fill a byte array of values to keep and use that for tests). This seems reasonable in that it's only a 256 element array - basically a truth table:

Private lookup(255) As Boolean 'Module level

Function VectorLookup(source As Variant) As String
    Dim chars() As Byte
    Dim outVal() As Byte

    chars = CStr(source)
    Dim bound As Long
    bound = UBound(chars)
    ReDim outVal(bound)

    Dim i As Long, pos As Long
    For i = 0 To bound Step 2
        Dim temp As Byte
        temp = chars(i)
        If lookup(temp) Then
            outVal(pos) = temp
            pos = pos + 2
        End If
    Next
    ReDim Preserve outVal(pos)
    VectorLookup = outVal
End Function

Private Sub GenerateTable()
    Dim i As Long
    For i = 48 To 57
        lookup(i) = True
    Next
    For i = 65 To 90
        lookup(i) = True
    Next
    For i = 97 To 122
        lookup(i) = True
    Next
End Sub

Assuming that the lookup table is only generated once, it's clocking in somewhere around 10-15% faster than any other pure VBA method above.


Not sure if this would be faster because it depends on too many factors, but might be worth testing. Instead of Regex.Replace each value separately, you can get the copied Range text from the clipboard and replace all values at once. Note that \w matches underscore and Unicode letters too, so being more specific in the regular expression can make it faster.

'[a1:b30000] = [{"ABC123-009",""}]: Dim t As Double: t = Timer ' used for testing

Dim r As Range, s As String
Set r = ThisWorkbook.Worksheets("Data").UsedRange.Resize(, 1) ' Data!A1:A30000
With New MSForms.DataObject ' needs reference to "Microsoft Forms 2.0 Object Library" or use a bit slower late binding - With CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
   r.Copy
   .GetFromClipboard
    Application.CutCopyMode = False
    s = .GetText
    .Clear ' optional - clear the clipboard if using Range.PasteSpecial instead of Worksheet.PasteSpecial "Text"

    With New RegExp ' needs reference to "Microsoft VBScript Regular Expressions 5.5" or use a bit slower late binding - With CreateObject("VBScript.RegExp")
        .Global = True
        '.IgnoreCase = False ' .IgnoreCase is False by default
        .Pattern = "[^0-9A-Za-z\r\n]+" ' because "[^\w\r\n]+" also matches _ and Unicode letters
        s = .Replace(s, vbNullString)
    End With

    .SetText s
    .PutInClipboard
End With

' about 70% of the time is spent here in pasting the data 
r(, 2).PasteSpecial 'xlPasteValues ' paste the text from clipboard in B1

'Debug.Print Timer - t

I expect this to be slower for less values because of the clipboard overhead, and maybe slower for a lot more values because of the memory needed.

Disabling events didn't seem to make difference in my tests, but might be worth trying.

Note that there is a tiny chance of another application using the clipboard while the macro is using it.

If early binding causes issues from running the same compiled macro on different machines, you can search for macro decompiler or remove the references and switch to late binding.