Word VBA program reads Emoji character (4 bytes) as "12"

I hope this helps: building upon @SandraRossi's comments above, it seems the input from the emoji panel is not correctly translated to its surrogate code point. If you save a document containing both symbols ( one from the Emoji Panel, and the other from via the menu, as you described) as an XML doc, you notice the difference:

Emoji Input:

<w:r w:rsidR="003814F5">
  <w:rPr>
    <mc:AlternateContent>
      <mc:Choice Requires="w16se">
        <w:rFonts w:ascii="Segoe UI Emoji" w:hAnsi="Segoe UI Emoji"/>
      </mc:Choice>
      <mc:Fallback>
        <w:rFonts w:hint="eastAsia"/>
      </mc:Fallback>
    </mc:AlternateContent>
  </w:rPr>
  <mc:AlternateContent>
    <mc:Choice Requires="w16se">
      <w16se:symEx w16se:font="@SimHei" w16se:char="1F44D"/>
    </mc:Choice>
    <mc:Fallback>
      <w:t>👍</w:t>
    </mc:Fallback>
  </mc:AlternateContent>
</w:r>

Menu (symbol) input:

<w:r w:rsidR="003814F5">
  <w:rPr>
    <w:rFonts w:ascii="Segoe UI Emoji" w:hAnsi="Segoe UI Emoji"/>
  </w:rPr>
  <w:t xml:space="preserve"> is not 👍</w:t>
</w:r>

The line <w16se:symEx w16se:font="@SimHei" w16se:char="1F44D"/> is the key difference here. The normal (Menu -> Insert Symobol) emoji is used as a fallback.

It seems only Word has the issue. I tried the same emoji panel input on Excel (and PowerPoint), and I get the right values in debug ?? which translates to the Unicode code point U+1F44D both in Excel and when copied back to Word.


Here are my final believes and findings.

It's probably a bug in MS Word VBA, based on the tests performed by AAA on Excel, Powerpoint and Word. Some people don't have this bug (cf comments).

The VBA objects give an invalid value for the emoji, but the XML property is correct. The XML is too much complex to be parsed easily, so the simplest workaround was found by Florent B. in the comments, which consists in "recreating the document from itself":

ActiveDocument.Content.InsertXML ActiveDocument.Content.XML

Unfortunately, in my personal case, it may have some collateral effects like shape IDs are renumbered.

So, I extended the code above to only correct the emoji characters in the original document, the rest remains intact, by:

  • copying the XML to a new document,
  • then parsing every character whose text length is > 1 in the new document (i.e. those outside the Unicode Basic Multilingual Plane, containing the Emojis and many other characters too),
  • also parsing the original document (assuming the characters should be in the same order as in the new document and their text lengths are the same),
  • copying those characters from the new document back to the original document,
  • closing the new document.

Okay, the macros runs longer, but I couldn't find a better solution.

Here is my code, simplified (you may be surprised by the useless collection of Range objects, where each Range is one Character object, in fact I don't provide the original code for the function Split_Into_Ranges, which is much bigger but faster, but it works and demonstrates well the solution in the sub correct_emojis):

Sub test()

    Dim text As String
    Dim length As Integer
    Dim arrBytes() As Byte

    Dim zranges As Collection
    Set zranges = Split_Into_Ranges(ActiveDocument)

    Call correct_emojis(zranges) ' <=== here the important algorithm

    text = ActiveDocument.Range.Characters(1).text
    length = Len(ActiveDocument.Range.Characters(1).text)
    arrBytes = ActiveDocument.Range.Characters(1).text

End Sub

Function Split_Into_Ranges(ioDocument As Document) As Collection

    Dim zranges As Collection
    Set zranges = New Collection
    For i = 1 To ioDocument.Characters.Count
        zranges.Add ioDocument.Characters(i)
    Next
    Set Split_Into_Ranges = zranges

End Function

Sub correct_emojis(zranges As Collection)

    Dim current_emoji_zranges As Collection
    Dim temp_zranges As Collection
    Dim temp_emoji_zranges As Collection
    Dim doc_current As Document
    Dim doc_temp As Document
    Dim arrBytes() As Byte

    Set doc_current_zranges = get_emoji_zranges(zranges)
    If doc_current_zranges.Count = 0 Then
        Exit Sub
    End If

    Set doc_current = ActiveDocument
    Set doc_temp = Documents.Add()
    Call doc_temp.Content.InsertXML(doc_current.Content.XML)
    Set temp_zranges = Split_Into_Ranges(doc_temp)

    Set current_emoji_zranges = get_emoji_zranges(zranges)
    Set temp_emoji_zranges = get_emoji_zranges(temp_zranges)

    For i = 1 To current_emoji_zranges.Count
        If 0 = 1 Then
            arrBytes = current_emoji_zranges(i).Characters(1).text
            arrBytes = temp_emoji_zranges(i).Characters(1).text
        End If
        current_emoji_zranges(i).Characters(1).text = temp_emoji_zranges(i).Characters(1).text
    Next

    Call doc_temp.Close(False)

End Sub

Function get_emoji_zranges(zranges As Collection) As Collection

    Dim emoji_zranges As Collection

    Set emoji_zranges = New Collection
    For i = 1 To zranges.Count
        If Len(zranges(i).text) > zranges(i).Characters.Count Then
            For j = 1 To zranges(i).Characters.Count
                If Len(zranges(i).Characters(j).text) > 1 Then
                    emoji_zranges.Add (zranges(i))
                End If
            Next
        End If
    Next

    Set get_emoji_zranges = emoji_zranges

End Function

Tags:

Ms Word

Vba

Emoji