How do web browsers implement font fallback?

On Windows:

Firefox font fallback

Firefox uses different algorithms for CJK and non-CJK glyphs:

non-CJK

The non-CJK algorithm is very simple: try all the fonts configured for the given HTML language. These include both the single font in the config entry font.name.{generic}.{language} and the list of fonts in the config entry font.name-list.{generic}.{language}.
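As a rough sketch of how those two config entries could combine into one search list (this is not Firefox's code; the function name, the dict standing in for the pref store, the duplicate removal, and the example pref values are all assumptions made for illustration):

```python
def configured_fonts(prefs, generic, language):
    """Return the configured font names for a generic family and language, in order.

    `prefs` is a plain dict standing in for the browser's pref store (assumption).
    """
    fonts = []
    # The single-font entry comes first.
    single = prefs.get(f"font.name.{generic}.{language}")
    if single:
        fonts.append(single)
    # The list entry is treated here as a comma-separated string.
    name_list = prefs.get(f"font.name-list.{generic}.{language}", "")
    for name in name_list.split(","):
        name = name.strip()
        if name and name not in fonts:  # skip blanks and duplicates
            fonts.append(name)
    return fonts


# Hypothetical pref values, for illustration only.
prefs = {
    "font.name.serif.x-western": "Times New Roman",
    "font.name-list.serif.x-western": "Times New Roman, Georgia, Cambria",
}
print(configured_fonts(prefs, "serif", "x-western"))
```

Each font in the returned list would then be tried in order until one has a glyph for the character.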

CJK

CJK is complicated by nature due to the sheer number of glyphs, encodings and language variations. Firefox uses a dynamic search algorithm to resolve the glyphs, trying each of the following in order:

  1. Use the configured fonts for the given html language.
  2. Use the configured Japanese (ja) fonts.
  3. Use the configured Korean (ko) fonts.
  4. Use the configured Simplified Chinese (zh-CN) fonts.
  5. Use the configured Traditional Chinese (Hong Kong) (zh-HK) fonts.
  6. Use the configured Traditional Chinese (Taiwan) (zh-TW) fonts.

The algorithm is currently implemented in GetLangPrefs(). In both the CJK and non-CJK cases, at most 32 fonts are searched. The script search order is hard coded and thus cannot be configured by the user at the moment.
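The CJK search order and the 32-font cap can be sketched as follows. This is an illustration of the steps described above, not a port of GetLangPrefs(); the function name, the dict of per-language font lists, the duplicate removal, and the example font names are assumptions.

```python
# Hard-coded language order tried after the page's own language.
CJK_ORDER = ["ja", "ko", "zh-CN", "zh-HK", "zh-TW"]
MAX_FONTS = 32  # cap on how many fonts are searched


def cjk_search_list(page_lang, fonts_by_lang):
    """Build the ordered font list for a CJK glyph on a page tagged `page_lang`.

    `fonts_by_lang` maps a language tag to its configured fonts (assumption).
    """
    # Page language first, then the fixed order, without repeating a language.
    langs = [page_lang] + [lang for lang in CJK_ORDER if lang != page_lang]
    result = []
    for lang in langs:
        for font in fonts_by_lang.get(lang, []):
            if font not in result:
                result.append(font)
            if len(result) == MAX_FONTS:
                return result
    return result


# Hypothetical configuration, for illustration only.
fonts_by_lang = {
    "ko": ["Malgun Gothic"],
    "ja": ["Meiryo", "MS Gothic"],
    "zh-CN": ["Microsoft YaHei"],
}
print(cjk_search_list("ko", fonts_by_lang))
```

Because the order after the page language is fixed, a page with no usable language tag would reach the Japanese fonts before the Korean ones in this sketch.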

The advantage of Firefox's fallback algorithm is that, thanks to its dynamic nature, more fonts are searched, which minimizes the chance of the user encountering missing glyphs. Additionally, users who understand the search order can adjust the configuration to choose the fonts they want for missing glyphs.

The disadvantage is inconsistency: because the search order is hard coded, fonts from certain languages are prioritized on all webpages. For instance, fonts optimized for Japanese might be used on Korean webpages that lack a language tag. Also, since more fonts are tried, performance might deteriorate.

Chromium font fallback

Unlike Firefox, Chromium takes a more static approach to searching fonts. Instead of treating CJK separately and walking through a font list, Chromium hard codes several "core" fonts for each script. It assumes these fonts are always available and thus searches only them. The mapping of script to font can be found in InitializeScriptFontMap(). This mapping cannot be configured by the user at the moment.
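A static script-to-font table of this kind might look like the sketch below. The entries are placeholder examples, not values copied from InitializeScriptFontMap(), and the function name and the check against installed fonts are assumptions.

```python
# Illustrative script -> "core" fonts table. Placeholder entries only;
# the real table lives in Chromium's source and may differ.
SCRIPT_FONT_MAP = {
    "Hangul": ["Malgun Gothic"],
    "Hiragana": ["Meiryo", "MS Gothic"],
    "Thai": ["Tahoma"],
}


def fallback_font_for_script(script, installed_fonts):
    """Return the first hard-coded font for `script` that is installed, else None."""
    for name in SCRIPT_FONT_MAP.get(script, []):
        if name in installed_fonts:
            return name
    return None


installed = {"Arial", "Malgun Gothic", "MS Gothic"}
print(fallback_font_for_script("Hangul", installed))
print(fallback_font_for_script("Hiragana", installed))
```

The lookup is a single table access per script, which is where the simplicity and performance come from; nothing outside the table is ever considered.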

The advantage of this algorithm is simplicity, consistency and performance, at the cost of flexibility and configurability.

The implementation may change in the future. More detail is available at https://gist.github.com/CrendKing/c162f5a16507d2163d58ee0cf542e695.


Font fallback in browsers (as opposed to, say, in an OS) is based on two things:

  1. The CSS specification, which gives the fonts that are to be used for fallback, and
  2. The text engine, which does text shaping.

The CSS spec is fairly trivial in this respect: it simply gives the list of fonts by their system names, plus several possible "catch all" fonts that are in no way guaranteed to be the same from computer to computer (there is no reason to assume that serif maps to Times or Times New Roman, for instance).

The fallback algorithm used by text engines is entirely up to the engine, but it usually kicks in during the glyph lookup step: the text engine sees a string of code points and tries to use a font to shape that string. For each point in the sequence, it checks whether the font has a matching glyph (by consulting the CMAP table and subtables), or a rule that tells the engine that there may be a glyph to use only if more code points follow, through the GSUB mechanism. For instance, a font might have no glyphs for the individual letters e, t and c, but have a glyph for & and a GSUB rule that says the sequence e+t+c should be replaced in-text with the single glyph &. When the engine has finished accumulating this kind of "unit of points", it shapes the text and hands the result back to whatever asked it to shape text.
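The lookup step can be illustrated with a deliberately simplified sketch. Real GSUB rules operate on glyph IDs inside a shaping engine, not on raw strings; here a dict from code point to glyph name stands in for the CMAP data, and a dict from character sequence to glyph name stands in for the substitution rules. All names are hypothetical.

```python
def lookup_glyphs(text, cmap, substitutions):
    """Return one glyph name per shaped unit, or None where the font has nothing.

    `cmap` maps code points to glyph names; `substitutions` maps a sequence of
    characters to a single replacement glyph (simplified stand-in for GSUB).
    """
    glyphs = []
    i = 0
    # Try longer sequences first so the most specific rule wins.
    rules = sorted(substitutions, key=len, reverse=True)
    while i < len(text):
        match = next((seq for seq in rules if text.startswith(seq, i)), None)
        if match is not None:
            glyphs.append(substitutions[match])
            i += len(match)
        else:
            glyphs.append(cmap.get(ord(text[i])))  # None means "no glyph"
            i += 1
    return glyphs


# A font with a glyph for "a" and "&", none for e, t or c, plus one rule.
cmap = {ord("a"): "a", ord("&"): "ampersand"}
substitutions = {"etc": "ampersand"}
print(lookup_glyphs("aetc", cmap, substitutions))
print(lookup_glyphs("et", cmap, substitutions))
```

Any None in the result marks a code point the font cannot shape, which is the point at which an engine has to decide what to do next.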

If, during glyph lookup, it turns out the font doesn't contain anything that lets the engine shape a particular code point (i.e. running through the CMAP data as well as the GSUB rules still shows "there is no glyph") then the text engine can do two things:

  1. Give up. There is no glyph, so it uses the .notdef outline defined as glyph id 0 instead, and generally gives you text with lovely empty boxes (lovingly called "tofu" by font folks) or question marks.
  2. Attempt font fallback, where it tries another font to find a glyph for the unsupported code point.

When using fallback, an engine can go down a list of alternative fonts until either: (a) a glyph is found, or (b) the list is exhausted, at which point the engine has to give up and will use the .notdef glyph. Whether the engine grabs the .notdef glyph from the original font or from the last font in the list is entirely up to the engine (although usually it'll go with the first font, for legibility).
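A minimal sketch of that loop, assuming each font is modeled as a (name, cmap) pair where cmap is a dict from code point to glyph name. The function names and the toy fonts are hypothetical, and taking .notdef from the first font is just one of the choices an engine could make.

```python
NOTDEF = ".notdef"  # glyph id 0 in a real font


def resolve_glyph(char, fonts):
    """Walk the fallback list and return (font name, glyph name) for `char`."""
    for name, cmap in fonts:
        glyph = cmap.get(ord(char))
        if glyph is not None:
            return name, glyph  # (a) a glyph was found
    # (b) list exhausted: fall back to .notdef from the first font.
    return fonts[0][0], NOTDEF


def resolve_text(text, fonts):
    """Resolve every character of `text` independently."""
    return [resolve_glyph(ch, fonts) for ch in text]


# Two toy fonts: one Latin-only, one with a single CJK glyph.
fonts = [
    ("LatinFont", {0x41: "A"}),
    ("CJKFont", {0x4E2D: "uni4E2D"}),
]
print(resolve_text("A\u4e2d?", fonts))
```

In this sketch "A" comes from the first font, U+4E2D from the second, and "?" ends up as .notdef because neither toy font covers it.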

There is no "standard" algorithm for this defined anywhere; font fallback is basically a convenience mechanism offered by text engine authors, like how browsers come with bookmark managers (handy, and not part of any spec). As far as OpenType is concerned, there are no requirements on whether an engine should just serve up .notdef when a glyph is not found, or whether it should serve up the part it could shape, then find the missing glyph somewhere else, and render text that way. CSS implies that your text engine should have at least some form of font fallback, but it doesn't specify how it should work, or when it should kick in.