Use xindy to sort by bible book instead of ABC – or: How to add custom letter groups?

The easiest solution is to use Korean: change the line

\makeindex[options = -M index-style -C utf8]

in your TeX file to

\makeindex[options = -M index-style -C utf8 -L korean]

This produces a PDF file whose index page is typeset as:

Properly typeset index

which I believe is what you wanted.


Long answer:

This one was, for me, a good lesson in software testing: I had to treat the xindy program almost as a black box, because its documentation is terrible and even actively misleading. In fact, of all the time I spent trying to figure this out, the biggest leap towards a solution came the moment I decided to stop trusting the documentation completely.

The following is not exactly how I arrived at the solution/understanding, but a lightly “fictionalized account” of how one might have.

To recap the question: After saving the LaTeX source from the question as question.tex, changing filecontents to filecontents* (so that it doesn't write %% comments to the file), and running xelatex -shell-escape question.tex (note: shell-escape is in general dangerous; you should not use that option without being absolutely sure the file is safe to run), we get a typeset PDF whose index page looks like the following:

bad index

So the problem is that it seems to have completely ignored the (define-letter-groups ... in the index style.

If you look in the log file (question.log), it mentions the program that was invoked:

runsystem(texindy -M index-style -C utf8 question.idx)...executed.

And you can verify for yourself that it is this command (texindy -M index-style -C utf8 question.idx) that turns question.idx:

\indexentry{Ps 1}{1}
\indexentry{Ps 10}{1}
\indexentry{Ps 2}{1}
\indexentry{Ps 1}{1}
\indexentry{Ps 3}{1}
\indexentry{Ps 56}{1}
\indexentry{Ps 5}{1}
\indexentry{Ps 34,1--2}{1}
\indexentry{Ps 34,7}{1}
\indexentry{Ps 34,1}{1}
\indexentry{Ps 34|full}{1}
\indexentry{Ps 34}{1}
\indexentry{Dtn 3,7.9}{1}
\indexentry{Dtn 3,5}{1}
\indexentry{Dtn 3,8}{1}
\indexentry{Num 122,3}{1}
\indexentry{Num 121}{1}

into question.ind:

\begin{theindex}
  \providecommand*\lettergroupDefault[1]{}
  \providecommand*\lettergroup[1]{%
      \par\textbf{#1}\par
      \nopagebreak
  }

  \lettergroup{D}
  \item Dtn 3,5: 1
  \item Dtn 3,7.9: 1
  \item Dtn 3,8: 1

  \indexspace

  \lettergroup{N}
  \item Num 121: 1
  \item Num 122,3: 1

  \indexspace

  \lettergroup{P}
  \item Ps 1: 1
  \item Ps 2: 1
  \item Ps 3: 1
  \item Ps 5: 1
  \item Ps 10: 1
  \item Ps 34: 1
  \item Ps 34,1: 1
  \item Ps 34,1--2: 1
  \item Ps 34,7: 1
  \item Ps 56: 1

\end{theindex}

The bug already manifests here. So at this point we can forget about the TeX stuff and focus on getting the desired question.ind out of the question.idx using the texindy (or another) command.
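Since the rest of the investigation regenerates question.ind many times, it can help to check the letter groups mechanically rather than by eye. The following is only a sketch: the function name letter_groups is made up for illustration, and the sample text is an abbreviated copy of the .ind output shown above.

```python
import re

def letter_groups(ind_text):
    # Collect the argument of every \lettergroup{...} heading, in file order.
    # Requiring "{" right after the name skips \lettergroupDefault[1] and
    # the \providecommand*\lettergroup[1] definition line.
    return re.findall(r"\\lettergroup\{([^}]*)\}", ind_text)

sample = r"""
\begin{theindex}
  \providecommand*\lettergroupDefault[1]{}
  \providecommand*\lettergroup[1]{%
      \par\textbf{#1}\par
  }
  \lettergroup{D}
  \item Dtn 3,5: 1
  \lettergroup{N}
  \item Num 121: 1
  \lettergroup{P}
  \item Ps 1: 1
\end{theindex}
"""

print(letter_groups(sample))
```

For the "bad" output the groups come back as D, N, P; the goal is a file where they come back as Num, Dtn, Ps.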

The texindy command has some debug options. For example, adding --debug script shows that texindy is essentially calling xindy as xindy -d script -L general -C utf8 -M tex/inputenc/utf8 -M texindy -M page-ranges -M word-order -M index-style.xdy -I latex question.idx. We can run that directly, and see that it produces the same output (the “bad” question.ind).

This list of modules automatically included by texindy also suggests that some of the lines of our index-style.xdy are redundant: we can boil it down to just

(markup-locclass-list :open ": ")
(define-letter-groups ("Num"  "Dtn" "Ps"))

and keep the output the same. We can go further and also reduce the xindy command to xindy -M texindy -M index-style question.idx. Now we can run xindy directly, instead of texindy.

To continue exploring the debug options, the first thing is to have xindy log to a file instead of throwing it away (to /dev/null). This is done with -t question.ilg. This log file has output like

Forming letter-groups:
Letter-group: "?? 0000001" -> "P"
Letter-group: "?? 0000010" -> "P"

and so on, which clearly correspond one-to-one with the lines from the input (question.idx). (?? is what it displays as in my terminal, but I can open it up in an editor to see the actual bytes: e.g. less shows the second line as Letter-group: "<C8><D0> 0000010" -> "P" with those two characters highlighted.)

So, for some reason, it has converted an input entry like \indexentry{Ps 10}{1} into "<C8><D0> 0000010", which then gets mapped to letter group P.

The --debug level=1 (or higher) option to xindy has something relevant in the log: lines like

Add sort rule to run 0: #<ordrule: '<E4>' => '<80>' :again NIL>
Add sort rule to run 0: #<ordrule: 'A' => '<80>' :again NIL>
...
Add sort rule to run 0: #<ordrule: 'p' => '<C8>' :again NIL>
Add sort rule to run 0: #<ordrule: 'P' => '<C8>' :again NIL>

and so on. But this still doesn't explain where those sort rules are coming from. With the remaining debug options --debug script --debug keep_tmpfiles though, we can find out: First, the command uses a filter called tex2xindy to convert our .idx file into a file that has lines like:

(indexentry :tkey (("Ps 1")) :locref "1")
(indexentry :tkey (("Ps 10")) :locref "1")

(entirely in Lisp syntax). This file, we will soon learn, is called the "rawindex". Then, the “core” of xindy runs with a command like:

xindy.run -M /Library/TeX/texbin/xindy.mem -E iso-8859-1 -x (progn
  (searchpath ".:/usr/local/texlive/2015/texmf-dist/xindy/modules:/usr/local/texlive/2015/texmf-dist/xindy/modules/base")
  (xindy:startup
    :idxstyle "<tmp file 1>"
    :rawindex "<above tmp file>"
    :output "./question.ind"
    :logfile "question.ilg"

    )
  (exit))

We can run this command directly, and see that it (still) produces the same .ind file. We're getting close: looking at the tmp file above (the attribute :idxstyle), it looks like:

(require "lang/general/latin9-lang.xdy")
(require "texindy.xdy")
(require "index-style.xdy")

The last two (of the three) lines were specified manually by our commandline xindy -M texindy -M index-style; the first one was inserted automatically by xindy. (The latin9 does not matter here; if we had used -C utf8 we'd get lang/general/utf8-lang.xdy here in which the problems are the same.) Looking inside that file (texmf-dist/xindy/modules/lang/general/latin9-lang.xdy), it starts with

(require "lang/general/latin9.xdy")

and that file starts with lines like:

(define-letter-group "A" :prefixes ("<80>"))
(define-letter-group "B" :after "A" :prefixes ("<84>"))
(define-letter-group "C" :after "B" :prefixes ("<86>"))
(define-letter-group "D" :after "C" :prefixes ("<8D>"))

which give a hint as to what's going on: by the time define-letter-group is involved, the characters have already been transformed into bytes like "<80>" (recall the log lines like Add sort rule to run 0: #<ordrule: 'A' => '<80>' :again NIL> that we saw earlier). So these define-letter-group commands operate not on the raw input in the index file, but on these transformed bytes.
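The effect described above can be imitated in a few lines of Python. This is a rough model, not xindy's actual implementation: the name sort_key and the RULES table are made up for illustration, the byte values are the ones visible in the log lines and in the letter-group prefixes used in this answer, and the seven-digit zero padding of numbers is inferred from keys like "0000010" in the log.

```python
import re

# Letter -> byte, as seen in the log ('P' => <C8>) and in the prefixes
# used in this answer. Only the letters needed for Num, Dtn and Ps.
RULES = {"p": 0xC8, "s": 0xD0, "n": 0xBC, "u": 0xE0,
         "m": 0xBB, "d": 0x8D, "t": 0xDA}

def sort_key(entry):
    # Rough model of the transformed key (ASCII input only).
    out = bytearray()
    for token in re.findall(r"\d+|.", entry):
        if token.isdigit():
            # Assumption from the log: numbers are zero-padded to 7 digits.
            out += token.zfill(7).encode("ascii")
        else:
            # Upper and lower case map to the same byte; other
            # characters are passed through unchanged.
            out.append(RULES.get(token.lower(), ord(token)))
    return bytes(out)

key = sort_key("Ps 10")
print(key)
print(key.startswith(b"Ps"))        # prefix as written in the style file
print(key.startswith(b"\xc8\xd0"))  # prefix after transformation
```

In this model the plain prefix "Ps" no longer matches the key, while the transformed prefix does, which is the behaviour seen in the log.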

At this point, we can just use the same byte replacements as in this file, to get Solution 1: Create the file index-style.xdy whose contents are lightly modified from your original, using the following Python script:

# Bytes literal: the file must contain the raw bytes, not their UTF-8 encoding.
s = b'''
(markup-locclass-list :open ": ")

(define-letter-group "Num" :prefixes ("\xbc\xe0\xbb"))
(define-letter-group "Dtn" :prefixes ("\x8d\xda\xbc") :after "Num")
(define-letter-group "Ps" :prefixes ("\xc8\xd0") :after "Dtn")
'''

# Binary mode writes the bytes unchanged (text mode in Python 3 would re-encode them).
with open('index-style.xdy', 'wb') as f:
    f.write(s)

(This Python script just assigns the file contents as a byte string to the variable s and writes it out to the file index-style.xdy: you can use any language or even a shell script; the goal is just to get binary data into the file.)

Then using this index-style.xdy gives the index you desired (and which is in the screenshot above).

The explanation, for why these transformations happen, is lower in the same file (now I'm using utf8.xdy instead of latin9.xdy):

(define-rule-set "xy-alphabetize"

  :rules  (("À" "<80>" :string)
           ("Ă" "<80>" :string)
           ("â" "<80>" :string)
...
           ("A" "<80>" :string)
...
           ))

and this rule-set is used in utf8-lang.xdy:

(use-rule-set :run 0
              :rule-set ("xy-alphabetize" "xy-ignore-special"))

Because of this, all the characters have already been transformed in the very first (0th) run, which is why our (define-letter-groups ("Num" "Dtn" "Ps")) did not work. The documentation is correct about (define-letter-groups ...), in the sense that the feature really exists and the core of xindy supports it. However, the default “general” configuration (which, by the way, is documented in the file comments as “A general sorting order for Western European languages”) explicitly performs transformations that get in the way, and the documentation says nothing about dealing with this.

If they had provided instead of “general” a “bare” “language” that is even more minimal than “general” (and does not do the hackery with translating characters to bytes outside the ascii range), then using (define-letter-groups with english-alphabet characters would have just worked (as simple as giving -L bare or whatever).

To confirm that this is the case, we can pursue an alternative approach (Solution 2): give xindy an explicit idxstyle file instead of the temporary one. In this file we remove the first line, (require "lang/general/latin9-lang.xdy"), and keep only the requires of texindy and our index-style.xdy. When I put those two lines into a file called forced-idxstyle and changed the xindy.run command to just

/Library/TeX/texbin/xindy.run -M /Library/TeX/texbin/xindy.mem -E iso-8859-1 -x '(progn (searchpath ".:/usr/local/texlive/2015/texmf-dist/xindy/modules:/usr/local/texlive/2015/texmf-dist/xindy/modules/base") (xindy:startup :idxstyle "./forced-idxstyle" :rawindex "<same tmp file as before>" :output "./question.ind" :logfile "question.ilg") (exit))'

it worked: our plain index-style.xdy file which had just the two lines

(markup-locclass-list :open ": ")
(define-letter-groups ("Num"  "Dtn" "Ps"))

now starts giving the exact question.ind file we want (and we can run xelatex and we'll get the desired typeset output in the index).

This confirms that define-letter-groups works as documented, and would have been usable by the end user (us) if not for all the “helpful” transformations done by all of xindy's default languages, for treating accented letters the same as unaccented letters.

Actually not all: among the languages that come with xindy there are a few that don't work with Latin-script characters, and in particular don't transform "A" (for example) into anything else:

belarusian bulgarian georgian greek hebrew klingon korean macedonian russian serbian ukrainian

Of these, if you really want a language that will never get in the way when using English, then “korean” is the most promising, because the others (even Klingon which isn't even in Unicode and uses private-area characters!) do transform some “special” characters like hyphens and semicolons. So this gives Solution 3 that I put at the top of this answer: keep your original index style, keep your original TeX file, just append -L korean to the indexing command, and you get the index you desired. (The other languages like Klingon etc. also work.)


I solved this problem for my work without using xindy. My alternative was to use sort keys. Thus: \index{19@Psalms!0901 @9:1}. 19 is Psalms' order in the Hebrew Bible, so this number before the @ sign sorts the books in correct order. The exclamation point (!) makes the book itself, Psalms, the item, and verses will be indexed as subitems below it. You can just remove this if you want the book title before every citation. 0901 is the sort code for 9:1, with zeroes preventing chapters from 10 onwards (example: 11:1) from being listed first. The space prevents a verse range from appearing before single verses, so 0901_ (where _ stands for the space) will sort ahead of 0901-07. (This is a genius insight of Barbara Beeton's from another post.)

If you have \index{05@Deuteronomy!1212 @12:12} \index{02@Exodus!0101-12@1:1-12} \index{19@Psalms!0901 @9:1} this should now result in correct biblical order. So:

Exodus

1:1-12

Deuteronomy

12:12

Psalms

9:1
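If there are many citations, the sort keys can be generated rather than typed by hand. Here is a minimal sketch under the conventions described above (two-digit book number, two-digit chapter and verse, trailing space for single verses); the function name index_entry and its parameters are made up for illustration.

```python
def index_entry(order, book, chapter, verse, end_verse=None):
    # Zero-padded chapter+verse code, e.g. 9:1 -> "0901".
    code = f"{chapter:02d}{verse:02d}"
    if end_verse is None:
        # The trailing space sorts before "-", so a single verse
        # comes ahead of a range that starts at the same verse.
        key = code + " "
        label = f"{chapter}:{verse}"
    else:
        key = f"{code}-{end_verse:02d}"
        label = f"{chapter}:{verse}-{end_verse}"
    # Doubled braces produce literal { and } in the f-string.
    return f"\\index{{{order:02d}@{book}!{key}@{label}}}"

print(index_entry(19, "Psalms", 9, 1))
print(index_entry(2, "Exodus", 1, 1, 12))
print(index_entry(5, "Deuteronomy", 12, 12))
```

Note that two-digit padding only works up to chapter and verse 99; a book such as Psalms, with chapters above 99, would need three digits throughout.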

I preferred this solution myself. You may also want to check out the bibleref package. I haven't used it, but I believe it can work with indexes.
