Are there benefits to using XeTeX or LuaTeX if one writes documents mainly in English?

Some of the advantages include:

  • Being able to use unicode-math and copy, paste and search for math symbols
  • Not being limited to sixteen math alphabets
  • Not having to juggle 8-bit, or even 7-bit, text encodings
  • Being able to type symbols into your source code and have them work without a lot of set-up to declare them active
  • You can use any font on your machine without a complicated conversion to Type 1 format
  • Certain LaTeX3 interfaces only function properly if the engine supports Unicode natively
  • Even in English, you will often use non-ASCII characters, such as opening and closing curly quotes, dashes, ligatures and the occasional accent. You could theoretically make these copyable and searchable in pdfLaTeX with the mmap or cmap package. But I never see anyone do that, and I’ve frequently seen papers with typos like “di cult” because someone used a font with no ffi ligature.
  • You can use the extensions of the engines, such as XDV output (useful for document conversion) and Lua scripting.

A major application of this is accessibility. If a screen reader can identify a symbol, it can pronounce it for a visually impaired user, and the symbol can also be converted to another format.
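
To make a few of these points concrete, here is a minimal sketch (my own illustration, to be compiled with xelatex or lualatex rather than pdflatex). It assumes the fontspec and unicode-math packages are installed and relies on their default Latin Modern fonts; the commented-out font name is only a placeholder.

```latex
% Compile with xelatex or lualatex, not pdflatex.
\documentclass{article}
\usepackage{fontspec}      % OpenType/TrueType fonts, no Type 1 conversion
\usepackage{unicode-math}  % Unicode math fonts and direct symbol input
% Placeholder (assumption): replace with any font installed on your system.
% \setmainfont{Some Installed Font}
\begin{document}
% Curly quotes, accents and an ffi ligature, typed directly as UTF-8.
“Difficult” is not so difficult: naïve, café, 1–2 pages.

% Math symbols typed directly in the source instead of as macros.
\[ ∀ε > 0 \; ∃δ > 0 : |x - a| < δ ⇒ |f(x) - f(a)| < ε \]
\end{document}
```

No inputenc or fontenc line is needed here, and the symbols in the source are the same characters you would search for in the PDF.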


To answer how to get rid of the U+200B character when using pdflatex:

The 'ZERO WIDTH SPACE' (U+200B), as the name suggests, is a space with no width. You can still tell that the character is there, because you need to press the cursor key twice to move past it to the next or previous character.

This causes problems because pdflatex, unlike xelatex and lualatex, does not know what to do with it.

To clean it up, you can use any text tool that can search for and replace this character throughout the document. As an example, TeXworks or Gummi on Linux let you type the character with:

Ctrl+Shift+U 200B Enter

Then you can copy it, paste it into the search tool, and replace it with nothing, or with some other character to see where it was. If you have problems with this, another solution is to tell pdflatex what to do. Consider this example:

\documentclass{article}
\usepackage{xcolor}
\DeclareUnicodeCharacter{200B}{ \colorbox{yellow}{\sffamily\bfseries u+200B}
\typeout{}\typeout{WARNING: Bad character U+200B in the line \the\inputlineno}\typeout{}}
\begin{document}
a​b

cd

e​f

asasa
\end{document}

This will show these warnings in the log file:

WARNING: Bad character U+200B in the line 6

WARNING: Bad character U+200B in the line 10

The PDF will also show where they are: each occurrence appears as a yellow box containing “u+200B” in bold sans-serif.

But it is probably better to define it as what it really is, a zero-width space, and forget about it:

\DeclareUnicodeCharacter{200B}{\hspace{0pt}}
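
Put together, a complete test file might look like the sketch below. Because U+200B is invisible in most editors, the sketch writes it with TeX's ^^ notation as its three UTF-8 bytes (E2 80 8B). That choice is mine, for readability only; in a real document the character would simply be sitting in the source.

```latex
% Compile with pdflatex.
\documentclass{article}
\usepackage[utf8]{inputenc}  % the default in recent LaTeX; harmless to keep
% Treat U+200B as a zero-width space that still allows a line break.
\DeclareUnicodeCharacter{200B}{\hspace{0pt}}
\begin{document}
% ^^e2^^80^^8b is the UTF-8 byte sequence for U+200B, written visibly.
a^^e2^^80^^8bb
\end{document}
```

This should compile without a Unicode character error and typeset “ab” with nothing visible in between.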