Solve "Unicode char is not set up for use with LaTeX" without special handling of every new interesting UTF-8 character

You can think about it this way: for LaTeX (or any system) to draw “⌀” (or any character) onto the page/screen, the “shape” (visual appearance) of that character must be available to it somewhere.

Outside the *TeX world

In most “modern” applications, text editors, browsers, and so on, this is implemented via a font-fallback mechanism:

  1. The application (say your browser) first tries to find a glyph for that character in the current font. (As I type these letters on the tex.stackexchange.com website, they are being displayed in the “Lucida Grande” font, specified by the CSS on this site.)
  2. If the current font does not support the character, the application tries to find some font installed on the system that contains a glyph for that character, then picks up the glyph from that font. (In this case, with the fonts installed on my computer, my browser shows the “⌀” character from the “Cambria” font, as the character is not present in the “Lucida Grande” font.)

  3. If there is no such font, then you may see a square box, a question mark, a blank space, or some such depiction that the character is missing (possibly from a “Last Resort” fallback font). For example, in the following texts showing the digits 0 to 9, for the odd digits I use the digits first from the “Brahmi” Unicode block (for which I have a font installed) and then from the “Masaram Gondi” Unicode block (added to Unicode in version 10.0.0, released only a few days ago in June 2017), for which I (and probably you too) don't have a font installed:

    0𑁧2𑁩4𑁫6𑁭8𑁯 0𑵑2𑵓4𑵕6𑵗8𑵙

    On my system the Brahmi digits render correctly, while the Masaram Gondi digits show up as missing-character placeholders. What do you see?

In the *TeX world

The above 1–2–3 description of the character rendering process does not hold in the TeX world, for two reasons:

  • [Solved with XeTeX/LuaTeX] Partly because the roots of (La)TeX are very old, it is not straightforward to load a modern OpenType font from your system into a non-Unicode LaTeX engine (e.g. pdflatex). So even if you knew a font that contains the character you want, it is not straightforward in the non-Unicode engine to say “show the ⌀ character from the Cambria font”. Instead you can draw on a large number of (La)TeX-specific fonts, symbols available in many packages (see The Comprehensive LaTeX Symbol List), and so on.

    • This is solved with XeTeX/LuaTeX: they are Unicode-aware by default, can use standard (OpenType) system fonts, etc. See the examples below.
  • [Not solved] Partly because of the goal of careful and precise typesetting, TeX-based engines so far don't automatically load random fonts: you have to specify the fonts you want. (And even if you know which fonts you want, there's no straightforward way to specify fallback logic.) Personally I think this wouldn't be such a terrible idea to implement: for example the engine could stop, show you all the font options, and ask you which one you wanted. Instead, it does the ridiculous thing of silently printing a warning message, which by default is even buried in the log file.
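Newer versions of luaotfload (the font loader used by LuaLaTeX) have since added an explicit fallback mechanism. This softens the second point for LuaLaTeX users, although you still have to list the fallback fonts yourself. The following is a sketch, not a tested recipe. It assumes a recent TeX distribution and that Cambria and Chandas (the fonts used in the examples in this answer) are installed. The chain name "myfallback" is arbitrary. Check the luaotfload manual for the syntax your version supports:

    % Compile with lualatex (assumption: a recent version with HarfBuzz support).
    \documentclass{article}
    \usepackage{fontspec}
    % Declare a fallback chain; the name "myfallback" is arbitrary.
    % For a character missing from the main font, these fonts are tried in order.
    \directlua{luaotfload.add_fallback("myfallback", {
      "Cambria:mode=harf;",
      "Chandas:mode=harf;script=dev2;",
    })}
    \setmainfont{Latin Modern Roman}[RawFeature={fallback=myfallback}]
    \begin{document}
    Diameter ⌀ and Devanagari आ, with no per-character definitions.
    \end{document}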

Summary

Ways to get specific Unicode characters in LaTeX:

  1. With pdflatex, load a particular package that can produce the shape you want (or even draw it yourself, or load an image), then write definitions for specific characters:

    \documentclass{article}
    \usepackage[utf8]{inputenc} % UTF-8 input (the default since LaTeX 2018; harmless to keep)
    \usepackage{fdsymbol}       % Contains a symbol for ⌀ -- use \diameter
    \usepackage{newunicodechar} % To write the definition on the next line
    \newunicodechar{⌀}{\ensuremath{\diameter}}
    \begin{document}
    ⌀
    \end{document}
    

    Note that if you load a different package that defines \diameter differently, you will get a different visual appearance.

  2. With xelatex/lualatex, you can use the approach above or, more simply, a font that contains that Unicode character. For ⌀ in particular, the default font (LMRoman10) already has it:

    \documentclass{article}
    \begin{document}
    ⌀
    \end{document}
    

    (Just compile with xelatex or lualatex and it will produce a PDF file.) But in general (for characters not contained in the default font, or for which you want a different appearance) you need to load the font explicitly:

    \documentclass{article}
    \usepackage{fontspec}
    \newfontfamily\cambria{Cambria}
    \newfontfamily\deva{Chandas}
    
    \begin{document}
    The following is in Cambria: {\cambria ⌀} and the following in Chandas: {\deva आ}.
    \end{document}
    


    Just remember to set \tracinglostchars=2 and search for "Missing character" warnings in the output.
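Here is a minimal sketch of that check for xelatex or lualatex. The Devanagari letter आ is only an example of a character that the default Latin Modern Roman font lacks. The note about the value 3 assumes a recent engine:

    % Compile with xelatex or lualatex.
    \documentclass{article}
    % Report characters the current font cannot show as "Missing character"
    % messages. In recent engines (TeX Live 2021 and later), the value 3
    % turns them into errors instead, so they cannot be overlooked.
    \tracinglostchars=2
    \begin{document}
    ⌀ % in the default font, so nothing should be reported
    आ % not in Latin Modern Roman: expect a "Missing character" message
    \end{document}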


You get an error telling you that there is no default definition for ⌀ (U+2300), which is not present in any of the standard fonts.

The Comprehensive LaTeX Symbol List tells us that \diameter is provided by

  • mathabx
  • MnSymbol
  • wasysym

and it would be very debatable for the LaTeX kernel to load one of these packages just to make a single symbol available.

The error message tells you the precise details. Make a choice, for instance wasysym, and change your code to

\documentclass{article}
\usepackage[a4paper, total={6.5in, 10.5in}]{geometry}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{cprotect}
\usepackage{fvextra}
\usepackage{textcomp}

\usepackage{wasysym} % for \diameter

% make TOC work correctly
\makeatletter
\def\l@section{\@dottedtocline{1}{1em}{2em}}
\makeatother
\usepackage{index}
\makeindex

%\DeclareUnicodeCharacter{00A0}{~} % not needed in recent versions of LaTeX
\DeclareUnicodeCharacter{2300}{\diameter}

\DefineVerbatimEnvironment{escape}
  {Verbatim}
  {fontfamily=\rmdefault,breaklines,breaksymbolleft={}}
\CustomVerbatimCommand\escapeinline{Verb}{fontfamily=\rmdefault,breaklines,breaksymbolleft={}}

\begin{document}
⌀
\end{document}
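Most of that preamble is unrelated to the ⌀ problem. The following stripped-down sketch keeps only the lines that matter. It assumes pdflatex and a LaTeX release from 2018 or later, where UTF-8 input is the default and inputenc can be omitted:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{wasysym} % provides \diameter
% Map U+2300 DIAMETER SIGN to wasysym's \diameter (for pdflatex)
\DeclareUnicodeCharacter{2300}{\diameter}
\begin{document}
⌀
\end{document}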

You don't need inputenc with LuaTeX but you do need a font that supports the glyph.

Note that other software might substitute fonts automatically when a glyph is not available in the current font. LuaLaTeX and XeLaTeX don't, because they aim at good typography and an automatic substitution may produce a clash of font styles. The same reasoning applies to pdflatex: the kernel cannot provide a definition for every Unicode character, because this would require choosing among alternative fonts.
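With XeLaTeX or LuaLaTeX, the per-character approach still works. The difference is that the character maps to a font switch instead of a symbol macro. This is a sketch using the newunicodechar package. It assumes DejaVu Sans is installed and has a glyph for U+2300; any font that does will do, and the command name \diameterfont is made up for this example:

% Compile with xelatex or lualatex.
\documentclass{article}
\usepackage{fontspec}
\usepackage{newunicodechar}
% Assumption: DejaVu Sans is installed and contains U+2300.
\newfontfamily\diameterfont{DejaVu Sans}
% \symbol{"2300} typesets the glyph by code point, so the definition
% does not contain the (now redefined) character itself.
\newunicodechar{⌀}{{\diameterfont\symbol{"2300}}}
\begin{document}
A pipe of ⌀ 20\,mm.
\end{document}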


It is possible, though, that the user typed in U+2300 in order to get the \varnothing symbol, which is quite likely if the document is about math and not a technical topic. In this case, the package to load is amssymb:

\usepackage{amssymb}
% \varnothing is a math-mode symbol, hence \ensuremath
\DeclareUnicodeCharacter{2300}{\ensuremath{\varnothing}}
