Unicode char {U+200B}

U+200B is a zero-width space, which you have here:

elim‌​}

from the end of this line

\DeclareMultiCiteCommand{\parencites}[\mkbibparens]{\parencite}{\multicitedelim‌​}

Broken down with a Unicode converter, it is:

  U+0065 LATIN SMALL LETTER E     e
  U+006c LATIN SMALL LETTER L     l
  U+0069 LATIN SMALL LETTER I     i
  U+006d LATIN SMALL LETTER M     m
  U+200c ZERO WIDTH NON-JOINER     ‌
  U+200b ZERO WIDTH SPACE     ​ 
  U+007d RIGHT CURLY BRACKET     } } \rbrace

Delete that line and retype it without the invisible format characters between the m and the }.


This is one of the top hits for U+200B and LaTeX, so I’ll post solutions here.

Take the following example:

\tracinglostchars=2
\documentclass{article}
\pagestyle{empty}
\begin{document}f​l f‌l fl\end{document}

In LuaLaTeX, it compiles to:

[Image: Latin Modern font sample]

The first fl has no ligature because I inserted U+200B, a zero-width space. The second has no ligature because I inserted U+200C, a zero-width non-joiner. These might have been put into the source you copied from intentionally: a zero-width space can mark a potential line break, such as after a slash, and a zero-width non-joiner disables a ligature. For example, the fi in Elf‌in or the fl in Half‌ling is (according to pedants like me) not supposed to be ligated, since the two letters belong to different pieces of a compound word. Almost no one bothers to actually do that in English, but it is much more common in some other languages.

If you try to compile it in pdfLaTeX, you will get the error message that brought you here:

! Package inputenc Error: Unicode character ​ (U+200B)
(inputenc)                not set up for use with LaTeX.

There are several ways to fix it.

Clean Your Source by Hand

This is what most people on this site recommend. Your editor might have a way to display special characters so you can delete them all. But really, isn’t this a job for a computer?
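
Even a small script can at least tell you where to look. Here is a minimal sketch in Python (my own illustration, not something the LaTeX tooling provides; the name find_invisible is made up) that lists every character in Unicode general category Cf, the "format" characters, with its line and column. That category includes U+200B and U+200C, but also things you may want to keep, such as the soft hyphen, so treat the output as a list of suspects rather than a list of errors.

```python
import unicodedata


def find_invisible(text):
    """Return (line, column, code point, name) for each format character.

    Lines and columns are 1-based. "Format character" here means Unicode
    general category Cf, which covers U+200B and U+200C among others.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((line_no, col_no, f"U+{ord(ch):04X}", name))
    return hits


# Demo on the example line, with the invisible characters written as escapes.
sample = "f\u200bl f\u200cl fl"
for line_no, col_no, code, name in find_invisible(sample):
    print(f"{line_no}:{col_no}: {code} {name}")
```

To check a real file, read it with encoding="utf-8" and pass the contents to find_invisible.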

Clean Your Source in Your Editor

This is harder with an invisible, zero-width character, but you might be able to copy a zero-width space from a character map, open your search-and-replace dialogue box, and paste the character into the search field. You can then replace it with something like ZWS or {\hskip 0pt}.

Clean Your Source with Perl

The following one-line Perl script will create a new source file with all zero-width spaces removed:

perl -CSD -pe "s/\N{U+200B}//gu" < U200B.tex > noU200B.tex

If it’s easier to remember, you could also write this as

perl -CSD -pe "s/\N{ZERO WIDTH SPACE}//gu" < U200B.tex > noU200B.tex

The -CSD option selects UTF-8 for the standard streams and for any files the script opens, even if UTF-8 is not your default locale. The -p option runs the script on each line of input and prints the result, and -e supplies the script on the command line. The s command does substitution, \N{...} matches the zero-width space by code point or by name, the empty field between // means replace with nothing, g replaces every instance on the line rather than only the first, and u selects Unicode semantics for the pattern. Finally, the shell's < and > operators select the input and output files.

Either of these produces a file that compiles to:

[Image: Computer Modern font sample]

It’s also possible to automatically remove all characters outside a given subset. The script

perl -CSD -pe "s/[^\p{Word}\p{Punct}\p{Symbol}\p{Mark}\p{PerlSpace}]//gu"

allows only the following: Unicode “word” characters, punctuation, symbols, accents and a few kinds of spaces. It erases most invisible characters. An even more restrictive version would be

perl -CSD -pe "s/[^\p{ASCII}]//gu"

This removes every character except the ASCII that TeX originally allowed. Note that it also deletes typographic quotes such as “, which classic TeX expects you to type as a double backtick.

And yes, we could replace the zero-width space with something instead of nothing. The script

perl -CSD -pe "s/\N{ZERO WIDTH SPACE}/{\\\\hskip 0pt}/gu; s/\N{ZERO WIDTH NON-JOINER}/{}/gu"

given the above MWE as input, produces the following output:

\tracinglostchars=2
\documentclass{article}
\pagestyle{empty}
\begin{document}f{\hskip 0pt}l f{}l fl\end{document}
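
If Perl is not available, the same substitution is easy to sketch in Python. This is only an illustration under my own naming (REPLACEMENTS and make_tex_safe are made up); it maps the two characters to the same TeX code as the Perl script above.

```python
# Map each invisible character to the TeX code that should replace it.
REPLACEMENTS = str.maketrans({
    "\u200b": r"{\hskip 0pt}",  # ZERO WIDTH SPACE -> explicit zero-width skip
    "\u200c": "{}",             # ZERO WIDTH NON-JOINER -> empty group
})


def make_tex_safe(source):
    """Replace U+200B and U+200C with the TeX code the Perl script uses."""
    return source.translate(REPLACEMENTS)


# The last line of the MWE, with the invisible characters written as escapes.
sample = "\\begin{document}f\u200bl f\u200cl fl\\end{document}"
print(make_tex_safe(sample))
```

Run on the last line of the MWE, this should print the same line as the Perl output above. For a whole file, read it with encoding="utf-8", pass the text through make_tex_safe, and write the result to a new file.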

Teach LaTeX to Understand Zero-Width Space

The error says that U+200B is “not set up for use with LaTeX,” but the character has a TeX equivalent: \hskip 0pt or \hspace{0pt} inserts a zero-width space that prevents ligatures and allows a line break. So we can set the character up to use that command.

\tracinglostchars=2
\documentclass{article}
\usepackage{iftex}
\pagestyle{empty}

\ifTUTeX
  \usepackage{fontspec}
\else
  \usepackage[T1]{fontenc}
  \usepackage[utf8]{inputenc} % The default since 2018
  \DeclareUnicodeCharacter{200B}{{\hskip 0pt}}
\fi

\begin{document}f​l f‌l fl\end{document}

Although the \DeclareUnicodeCharacter command comes from inputenc, the LaTeX kernel has provided UTF-8 input support by default since 2018. So, we could have omitted the \usepackage[utf8]{inputenc} line and kept only the declaration.
