Syntax and semantics of XDV commands (XeTeX)

This is not much better than the source you gave, but XeTeX's output is read by xdvipdfmx, and its source in the TeX Live svn repository includes dvicodes.h, which has:

                    /* XeTeX ".xdv" codes */
#define XDV_NATIVE_FONT_DEF 252 /* fontdef for native platform font */
#define XDV_GLYPHS          253 /* string of glyph IDs with X and Y positions */
#define XDV_TEXT_AND_GLYPHS 254 /* like XDV_GLYPHS plus original Unicode text */

#define PTEXDIR             255 /* Ascii pTeX DIR command */

These are handled by dvi.c in the same directory, but I guess your C is better than mine:-)

The C code has some comments on the expected byte layout, e.g.:

case XDV_GLYPHS:
  need_XeTeX(opcode);
  get_and_buffer_bytes(fp, 4);            /* width */
  len = get_and_buffer_unsigned_pair(fp); /* glyph count */
  get_and_buffer_bytes(fp, len * 10);     /* 2 bytes ID + 8 bytes x,y-location per glyph */
  break;
case XDV_TEXT_AND_GLYPHS:
  need_XeTeX(opcode);
  len = get_and_buffer_unsigned_pair(fp); /* utf16 code unit count */
  get_and_buffer_bytes(fp, len * 2);      /* 2 bytes per code unit */
  get_and_buffer_bytes(fp, 4);            /* width */
  len = get_and_buffer_unsigned_pair(fp); /* glyph count */
  get_and_buffer_bytes(fp, len * 10);     /* 2 bytes ID + 8 bytes x,y-location per glyph */
  break;
case XDV_NATIVE_FONT_DEF:
  need_XeTeX(opcode);
  do_native_font_def(get_signed_quad(dvi_file));
  break;

dviasm (despite its name) can show xdv files. I had to change the font loading so that it finds the font, but on your test file it reports:

[preamble]
id: 7
numerator: 25400000
denominator: 473628672
magnification: 1000
comment: ' XeTeX output 2019.06.16:2127'

[postamble]
maxv: 633.947250pt
maxh: 407pt
maxs: 3
pages: 1

[font definitions]
fntdef: "/usr/local/texlive/2018/texmf-dist/fonts/truetype/public/amiri/amiri-regular.ttf" at 10pt

[page 1 0 0 0 0 0 0 0 0 0]
xxx: 'pdf:pagesize default'
down: 633pt
push:
  down: -605pt
  down: 575pt
  push:
    down: -540pt
    push:
      right: 300.325195pt
      xxx: 'pdf:docinfo<</BIDI.Fullbanner(This is the bidi package, Version 35.8, Released May 1, 2019. )>>'
      w: 2.929688pt
      fnt: "/usr/local/texlive/2018/texmf-dist/fonts/truetype/public/amiri/amiri-regular.ttf" at 10pt
      setglyphs: 31.865234pt gid37(0pt) gid82(5.820312pt) gid81(10.595703pt) gid77(15.791016pt) gid82(18.125000pt) gid88(23.095703pt) gid85(28.125000pt)
      w0:
      setglyphs: 19.155273pt gid2388(0pt) gid4497(10.791016pt) gid4466(13.232422pt) gid2021(14.956055pt) gid2083(17.250977pt)
      w0:
      xxx: 'ligne x'
      setglyphs: 31.865234pt gid37(0pt) gid82(5.820312pt) gid81(10.595703pt) gid77(15.791016pt) gid82(18.125000pt) gid88(23.095703pt) gid85(28.125000pt)
    pop:
  pop:
  down: 30pt
  push:
    right: 231.570312pt
    setglyphs: 5.859375pt gid447(0pt)
  pop:
pop:

That is with the TeX file you gave, apart from changing the font line to:

\setmainfont[Script=Arabic]{Amiri}

Based on the dvisvgm and dvipdfm-x sources, there are only three truly XDV-specific opcodes (as of the current version 7):

  • 252 (fc): This defines a font (the code refers to it as XDV_NATIVE_FONT_DEF or XFontDef), and is the most complicated of the three. Its parameters are:

    fontnum[4] ptsize[4] flags[2] psname_len[1] fontname[psname_len] fontIndex[4]

    followed by up to (2 + 4 * 65535) more bytes depending on the flags.

  • 253 (fd): This is a “string of glyph IDs with X and Y positions”, referred to in the code as XDV_GLYPHS or XGlyphArray. Its parameters are:

    w[4] n[2] xy[(4+4)n] g[2n]

    where w is the total width of the glyph array, n is the number of glyphs, xy is a sequence of (dx, dy) pairs (the relative horizontal and vertical positions of each glyph), and g contains the “FreeType indices of the glyphs to typeset”.

  • 254 (fe): This is similar, except that it includes “a leading array of UTF-16 characters that specify the "actual text" represented by the glyphs to be printed. It usually contains the text with special characters (like ligatures) expanded so that it can be used for text search, plain text copy & paste etc. This XDV command was introduced with XeTeX 0.99995 and can be triggered by \XeTeXgenerateactualtext1”. So its parameters are:

    l[2] t[2l] w[4] n[2] xy[8n] g[2n]
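
To make the two glyph layouts above concrete, here is a minimal C sketch that works out how many parameter bytes follow opcode 253 or 254, much like the skipping logic quoted from dvi.c. The names xdv_u16 and xdv_glyph_cmd_len are my own (not from dvipdfm-x or dvisvgm), and I am assuming big-endian integers, as in ordinary DVI.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian unsigned 16-bit value (hypothetical helper). */
uint16_t xdv_u16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Number of parameter bytes that follow XDV opcode 253 or 254.
 * `p` points at the first byte after the opcode, `avail` is the number
 * of readable bytes there. Returns 0 for any other opcode or if the
 * buffer is too short. (Hypothetical function, for illustration only.)
 */
size_t xdv_glyph_cmd_len(uint8_t opcode, const uint8_t *p, size_t avail) {
    size_t pos = 0;
    if (opcode == 254) {                    /* l[2] t[2l] comes first */
        if (avail < 2) return 0;
        pos = 2 + 2 * (size_t)xdv_u16(p);
    } else if (opcode != 253) {
        return 0;
    }
    if (avail < pos + 6) return 0;          /* w[4] n[2] */
    size_t n = xdv_u16(p + pos + 4);
    pos += 6 + 10 * n;                      /* xy[8n] g[2n] */
    return pos <= avail ? pos : 0;
}
```

For a 253 command with 7 glyphs this gives 4 + 2 + 56 + 14 = 76 bytes after the opcode.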

I don't think either the TeX-XeT commands 250–251 or the pTeX command 255 is used by XeTeX, which is consistent with your not seeing them in the file.

The hexdump in the question starts with f7 = 247, the DVI “pre” command, and the next byte is the DVI version, which here is 07. So we're looking at (XDV) version 7, as expected.
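
If you want to check this programmatically, a rough C sketch for reading the fixed part of the preamble could look like this. It assumes the standard DVI layout pre i[1] num[4] den[4] mag[4] k[1] x[k]; the struct and function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the fixed-size preamble fields. */
struct dvi_preamble {
    uint8_t  id;            /* format version byte, 7 in the file discussed here */
    uint32_t num, den, mag; /* unit fraction and magnification */
    uint8_t  comment_len;   /* length k of the comment string that follows */
};

/* Read a big-endian unsigned 32-bit value (hypothetical helper). */
uint32_t dvi_u32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse the preamble at the start of buf; returns 0 on success, -1 otherwise. */
int dvi_read_preamble(const uint8_t *buf, size_t len, struct dvi_preamble *out) {
    if (len < 15 || buf[0] != 247)   /* 247 = 0xf7 = pre */
        return -1;
    out->id          = buf[1];
    out->num         = dvi_u32(buf + 2);
    out->den         = dvi_u32(buf + 6);
    out->mag         = dvi_u32(buf + 10);
    out->comment_len = buf[14];
    return 0;
}
```

A caller would then test out->id == 7 before trying to interpret opcodes 252 to 254.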

So when you see bytes like fd 00 1f dd 80 00 07 00 00 00 and so on in your file (at either byte offset 278 or 423), the parameters are not just two bytes. Rather, 00 1f dd 80 is w, then 00 07 is n (the number of glyphs), then the next 56 bytes are xy, the (dx, dy) offsets for each of these 7 glyphs (8 bytes each), and the next 14 bytes are g, the 7 glyph IDs (2 bytes each). Needless to say, these 7 glyphs in your example spell Bonjour:

    00 25 00 52 00 51 00 4d 00 52 00 58 00 55
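
As a worked example, the following C sketch decodes the w bytes and the glyph bytes quoted above. Only the byte values come from your file; the helper names are mine, and I am assuming that w is a big-endian value in scaled points (65536 sp = 1 pt), as in ordinary DVI.

```c
#include <stddef.h>
#include <stdint.h>

/* w from the quoted command: 00 1f dd 80 */
const uint8_t example_w[4] = {0x00, 0x1f, 0xdd, 0x80};
/* g from the quoted command: seven 2-byte glyph IDs */
const uint8_t example_g[14] = {0x00, 0x25, 0x00, 0x52, 0x00, 0x51, 0x00, 0x4d,
                               0x00, 0x52, 0x00, 0x58, 0x00, 0x55};

/* Read a big-endian signed 32-bit value (hypothetical helper). */
int32_t xdv_s32(const uint8_t *p) {
    uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return (int32_t)u;
}

/* Convert scaled points to points. */
double sp_to_pt(int32_t sp) {
    return sp / 65536.0;
}

/* Unpack n big-endian 16-bit glyph IDs from g into out. */
void xdv_glyph_ids(const uint8_t *g, size_t n, uint16_t *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = (uint16_t)((g[2 * i] << 8) | g[2 * i + 1]);
}
```

Here xdv_s32(example_w) is 2088320 sp, which is 31.865234375pt, and the glyph IDs come out as 37 82 81 77 82 88 85, the same numbers dviasm prints for the first setglyphs line.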

As you observe, these are not ASCII codes, so where does the mapping of 00 25 to B, etc. come from? It's the same as with the regular DVI format: these are the positions of the glyphs in the font, and the font can put any glyph at any position. You can confirm this by opening the font and looking up the positions. FontForge can probably show it, but I couldn't find it in the UI; I did find it with fonttools:

$ ttx amiri-regular.ttf 
Dumping "amiri-regular.ttf" to "amiri-regular.ttx"...

and the file contains:

 <GlyphID id="37" name="B"/>

where 37 is 0x25, etc.


The definitive source is the XeTeX source tree, and specifically the xetex.web file. Quoting from it:

\yskip\noindent Commands 250--255 are undefined in normal \.{DVI} files, but the following commands are used in \.{XDV} files.

\yskip\hang\vbox{\halign{#&#\hfil\cr
|define_native_font| 252 & |k[4]| |s[4]| |flags[2]| |l[1]| |n[l]| |i[4]|\cr
& |if (flags and COLORED) then| |rgba[4]|\cr
& |if (flags and EXTEND) then| |extend[4]|\cr
& |if (flags and SLANT) then| |slant[4]|\cr
& |if (flags and EMBOLDEN) then| |embolden[4]|\cr
}}

\yskip\hang|set_glyphs| 253 |w[4]| |k[2]| |xy[8k]| |g[2k]|.

\yskip\hang|set_text_and_glyphs| 254 |l[2]| |t[2l]| |w[4]| |k[2]| |xy[8k]| |g[2k]|.

\yskip\noindent Commands 250 and 255 are undefined in normal \.{XDV} files.
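
Following the define_native_font layout quoted above, the total parameter length depends on flags and on the name length l. Here is a small C sketch of that calculation. Note that the numeric flag values are an assumption on my part (as I recall them from dvipdfm-x's dvi.c), since the quoted text names the flags but not their bits, so please check them against the source. The function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: bit values as recalled from dvipdfm-x's dvi.c; verify before use. */
#define XDV_FLAG_COLORED  0x0200
#define XDV_FLAG_EXTEND   0x1000
#define XDV_FLAG_SLANT    0x2000
#define XDV_FLAG_EMBOLDEN 0x4000

/*
 * Number of parameter bytes after opcode 252, given the flags word and
 * the name length l (hypothetical function, for illustration only).
 */
size_t xdv_native_font_def_len(uint16_t flags, uint8_t name_len) {
    /* k[4] s[4] flags[2] l[1] n[l] i[4] */
    size_t len = 4 + 4 + 2 + 1 + (size_t)name_len + 4;
    if (flags & XDV_FLAG_COLORED)  len += 4;   /* rgba[4] */
    if (flags & XDV_FLAG_EXTEND)   len += 4;   /* extend[4] */
    if (flags & XDV_FLAG_SLANT)    len += 4;   /* slant[4] */
    if (flags & XDV_FLAG_EMBOLDEN) len += 4;   /* embolden[4] */
    return len;
}
```

With no flags set, the result is simply 15 + l.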

