Fill a Lua table with lowercase/uppercase pairs.

Both MiKTeX and TeX Live include the file UnicodeData.txt, which contains all the necessary information. It contains lines of the following form:

0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041

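There are several fields delimited by semicolons. The important fields are the first, which is the codepoint of the current character; the third, which is the character's general category; and the thirteenth, which contains the codepoint of the corresponding uppercase character. The fifteenth field holds the titlecase mapping, which is identical to the uppercase mapping for most letters.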
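
To make the field positions concrete, here is a minimal sketch in plain Lua that splits the sample line and prints the relevant fields. The helper `split_fields` is a hypothetical stand-in written for this illustration, since LuaTeX's `string.explode` is not available in a standalone Lua interpreter. In the UnicodeData.txt layout, field 13 is the simple uppercase mapping and field 15 the simple titlecase mapping; for this letter both are 0041.

```lua
-- Hypothetical helper: split a line on ";" and keep empty fields,
-- so that field numbers match the positions in UnicodeData.txt.
local function split_fields(line)
  local fields = {}
  -- append a trailing ";" so that the last field is captured as well
  for field in (line .. ";"):gmatch("([^;]*);") do
    fields[#fields + 1] = field
  end
  return fields
end

local sample = "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041"
local fields = split_fields(sample)

print(#fields)     -- 15
print(fields[1])   -- 0061 (codepoint of the character)
print(fields[3])   -- Ll   (general category: lowercase letter)
print(fields[13])  -- 0041 (simple uppercase mapping)
print(fields[15])  -- 0041 (simple titlecase mapping)
```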

We can write a simple Lua library, makelowercases.lua, which parses the file and returns a table with the necessary information:

local unicode_data = kpse.find_file("UnicodeData.txt")

local characters = {}
for line in io.lines(unicode_data) do
  local fields = line:explode ";"
  -- we want to process only lowercase letters (general category Ll)
  if fields[3] == "Ll" then
    local lowercase = tonumber(fields[1],16)
    -- the simple uppercase codepoint is in field 13 (field 15 is the titlecase mapping)
    -- some lowercase letters don't have uppercase versions; tonumber then returns nil
    local uppercase = tonumber(fields[13],16)
    characters[lowercase] = uppercase
  end
end

return characters

We test for the Ll category, which denotes lowercase letters, and construct a table that maps each lowercase codepoint to its uppercase codepoint. Note that some lowercase characters don't have corresponding uppercase forms, but that's OK: they will simply not be included in the table.
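
Why no explicit check is needed can be shown with a small sketch in plain Lua: `tonumber` returns nil for an empty field, and assigning nil to a table key stores nothing. The codepoint 0x00DF (ß) serves here only as an illustrative character whose uppercase field is empty.

```lua
local characters = {}

-- a field with a mapping: the hexadecimal string converts to a number
characters[0x0061] = tonumber("0041", 16)
-- an empty field: tonumber returns nil, so no entry is stored
characters[0x00DF] = tonumber("", 16)

print(characters[0x0061])  -- 65
print(characters[0x00DF])  -- nil

-- count the entries that were actually stored
local count = 0
for _ in pairs(characters) do count = count + 1 end
print(count)               -- 1
```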

It can be used in the following way:

\documentclass{article}
\directlua
{
  local lowercases = require "makelowercases"
  lowercases["ß"] = {"S","S"}
  fonts.handlers.otf.addfeature
  {
    name = "vircase",
    type = "multiple",
    data = lowercases
  }
} 

\usepackage{fontspec}
  \setmainfont{OpenSans-Regular.ttf}%
   [
    RawFeature=+vircase,
   ]


\begin{document}
AAAA aaaa ü ß ɒ e o 

Hallo Welt!
\end{document}

It will produce the following result:

[image: typeset output of the document above]


I would use the included unicode Lua module and fill the uppercase table with a loop, like this:

\documentclass{article}
\directlua
{
local upper = unicode.utf8.upper
local char = unicode.utf8.char

local data = {}
for c = 0x20, 0x0500 do
    data[char(c)] = {upper(char(c))}
end

data["ß"] = {"S","S"}

fonts.handlers.otf.addfeature {
    name = "vircase",
    {
        type = "multiple",
        data = data,
    }
}
} 

\usepackage{fontspec}
\setmainfont{CMU Serif}%
   [
    RawFeature=+vircase,
   ]

\begin{document}
AAAA aaaa ü ß ɒ e o 

Hallo Welt!

Привет, Мир!
\end{document}

Since I use TeX Live 2016, the syntax of the fonts.handlers.otf.addfeature argument is a bit different; you can adjust it for your version. I've limited the loop to codepoints up to 0x0500, which covers the Latin, Greek and Cyrillic scripts. A Cyrillic example is also added (and it works!).
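
To see what the bound 0x0500 includes, a small sketch in plain Lua can test a few codepoints against the loop range. The sample characters and the helper `in_loop_range` are chosen for illustration only. Letters above 0x0500, such as those in the Latin Extended Additional block, fall outside the loop and would need a larger upper bound.

```lua
-- the bounds used by the loop in the example above
local first, last = 0x20, 0x0500

-- hypothetical helper: is this codepoint visited by the loop?
local function in_loop_range(codepoint)
  return codepoint >= first and codepoint <= last
end

-- illustrative sample characters: codepoint and Unicode name
local samples = {
  { 0x0061, "LATIN SMALL LETTER A" },
  { 0x00FC, "LATIN SMALL LETTER U WITH DIAERESIS" },
  { 0x0252, "LATIN SMALL LETTER TURNED ALPHA" },
  { 0x03B1, "GREEK SMALL LETTER ALPHA" },
  { 0x044F, "CYRILLIC SMALL LETTER YA" },
  { 0x0561, "ARMENIAN SMALL LETTER AYB" },
  { 0x1EA1, "LATIN SMALL LETTER A WITH DOT BELOW" },
}

for _, sample in ipairs(samples) do
  local codepoint, name = sample[1], sample[2]
  print(string.format("U+%04X %-38s %s",
    codepoint, name, tostring(in_loop_range(codepoint))))
end
```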
