Efficiency and speed of expl3 "datatypes"

There are three things to bear in mind here. First, at a fundamental level expl3 cannot extend the raw data types available in TeX itself: macros, registers and a few other special cases (case-changing codes, font dimensions, etc.). Second, the implementation details of expl3 can change and have changed: only the documented interfaces are stable, so any answer here could become out of date. Finally, 'efficiency' can mean different things: one can worry about efficiency in csname usage, speed of random access, speed of mapping, etc., and the answers may depend on the likely scale of the data.

All that said, we can examine the data types as they are now. The most basic is the tl, which is a wrapper around storage in macros. This can hold 'anything' but has no pre-defined structure. Essentially these are as fast as one can store tokens. (There is a wrinkle here: in primitive terms, tl data is stored using \edef<macro>{\unexpanded{<content>}}, which allows # tokens but is very slightly slower than \toks<number>={<content>}.)
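
To make the wrinkle concrete, here is a minimal sketch that sets the same content through the expl3 interface, through the primitive form described above, and in a token register. The names \l_demo_tl and \demoprimitive are made up for illustration, and the comparison relies on the current storage scheme, which as noted is an implementation detail.

```latex
\documentclass{article}
\usepackage{expl3} % harmless on recent kernels, needed on older ones
\ExplSyntaxOn
% expl3 interface: tokens are stored unexpanded in a macro-backed variable
\tl_new:N \l_demo_tl
\tl_set:Nn \l_demo_tl { a # b }
% Roughly the same storage written with primitives (needs e-TeX)
\edef \demoprimitive { \unexpanded { a # b } }
% The token-register alternative mentioned above (scratch register 0)
\toks 0 = { a # b }
\ExplSyntaxOff
\begin{document}
\ExplSyntaxOn
% Compare the two macros by meaning, without typesetting the raw '#'
\tl_if_eq:NNTF \l_demo_tl \demoprimitive { same } { different }
\ExplSyntaxOff
\end{document}
```

With the storage scheme described above I would expect the two macros to compare as equal, but that is a property of the current implementation rather than a documented guarantee.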

Other data types holding tokens are (currently) implemented as single tl holders with internal structure. This is most obvious for clist (which is largely a shortcut for user input manipulation) but is also true for seq and prop. (Note: the latter has had at least three different implementations over the years.) The prop case illustrates well the question 'What does efficiency mean?': using one csname per prop means you can have lots of props, but makes each of them slower for large numbers of key-value pairs than the alternatives. This optimisation in terms of name usage was the original reason for having the data type (in the pre-e-TeX days), but it also means that props are fast to copy, etc.
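
As a small illustration of the trade-off for prop, the sketch below uses two hypothetical variables, \l_demo_a_prop and \l_demo_b_prop. Because the whole property list lives in one csname, copying is a single assignment, while looking up a key has to work through the stored pairs.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical variable names, for illustration only
\prop_new:N \l_demo_a_prop
\prop_new:N \l_demo_b_prop
\prop_put:Nnn \l_demo_a_prop { colour } { red }
\prop_put:Nnn \l_demo_a_prop { shape }  { round }
% One assignment copies everything: the prop is stored in a single csname
\prop_set_eq:NN \l_demo_b_prop \l_demo_a_prop
\ExplSyntaxOff
\begin{document}
\ExplSyntaxOn
% Lookup by key; the cost grows with the number of stored pairs
\prop_item:Nn \l_demo_b_prop { shape }
\ExplSyntaxOff
\end{document}
```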

For specialist applications there are often clever tricks that can speed up data access. For example, Bruno has recently added a fast integer array structure for supporting l3regex, which uses fontdimen data at the core. Such tricks, though, can be tricky to explain fully to others.
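
For reference, a minimal sketch of using the integer array, assuming a reasonably recent expl3 in which the structure is exposed through public \intarray_... functions (when first added it was internal only). The array name is made up for the example.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical name; intarrays are global and have a fixed size
\intarray_new:Nn \g_demo_squares_intarray { 5 }
% Fill positions 1 to 5 with the square of the position
\int_step_inline:nn { 5 }
  { \intarray_gset:Nnn \g_demo_squares_intarray {#1} { #1 * #1 } }
\ExplSyntaxOff
\begin{document}
\ExplSyntaxOn
% Random access by position: retrieves the value stored at position 4
\intarray_item:Nn \g_demo_squares_intarray { 4 }
\ExplSyntaxOff
\end{document}
```

The point of the structure is that access by position does not require walking through the earlier entries, unlike item access in a seq or tl.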

The team have worked on more flexible data structures, including working out the balance between different forms of efficiency. If you want implementation advice, you are probably best to ask on LaTeX-L or, of course, to consult source3.


Generally you'd expect seq to be more efficient than clist, as it's set up internally for processing sequences. Conversely, clist is a comma-separated list, so it's set up for convenient user input, with comma separators that then need to be parsed and removed to iterate over the list. str and tl aren't really comparable, as they do not implement any kind of list data structure.
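
A common pattern that follows from this is to accept user input as a clist and convert it once to a seq for processing. A minimal sketch, with hypothetical variable names:

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical variable names, for illustration only
\clist_new:N \l_demo_input_clist
\seq_new:N   \l_demo_items_seq
% User-facing input: commas, with spaces around items trimmed by clist
\clist_set:Nn \l_demo_input_clist { alpha , beta , gamma }
% Parse the commas once, then work with the seq from here on
\seq_set_from_clist:NN \l_demo_items_seq \l_demo_input_clist
\ExplSyntaxOff
\begin{document}
\ExplSyntaxOn
% Map over the sequence; #1 is the current item
\seq_map_inline:Nn \l_demo_items_seq { [#1] }
\ExplSyntaxOff
\end{document}
```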

In some sense, of course, everything is really a tl, as token lists are TeX's only real structure.

But if you just need a datatype of pairs to build a binary tree, whether to use a two-item tl or a two-item seq is a matter of choice. tl gives you essentially nothing, and you would need to build your binary datatype up yourself. seq would give you more functions for iterating over the sequence and selecting items, but may be overkill if all your sequences are of length two.
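
To show what the two choices look like side by side, here is a small sketch with made-up variable names. The tl version is just two brace groups, accessed by position; the seq version uses the sequence interface.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical names. A pair as a two-item tl: two brace groups, nothing more
\tl_new:N \l_demo_pair_tl
\tl_set:Nn \l_demo_pair_tl { { left~part } { right~part } }
% The same pair as a two-item seq
\seq_new:N \l_demo_pair_seq
\seq_put_right:Nn \l_demo_pair_seq { left~part }
\seq_put_right:Nn \l_demo_pair_seq { right~part }
\ExplSyntaxOff
\begin{document}
\ExplSyntaxOn
% Select the second item of the tl pair by position
\tl_item:Nn \l_demo_pair_tl { 2 }
\par
% Select the first item of the seq pair
\seq_item:Nn \l_demo_pair_seq { 1 }
\ExplSyntaxOff
\end{document}
```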

You might also want to consider modelling a tree using prop with left and right properties, which may be more readable.
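
One way this could look, purely as a sketch: each node is a prop, and the left and right entries of a parent hold the names of the child props. All variable names and the value key are assumptions made for the example, not an established convention.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical node variables: one prop per tree node
\prop_new:N \l_demo_root_prop
\prop_new:N \l_demo_leaf_a_prop
\prop_new:N \l_demo_leaf_b_prop
\prop_put:Nnn \l_demo_leaf_a_prop { value } { A }
\prop_put:Nnn \l_demo_leaf_b_prop { value } { B }
% The parent stores the child variables themselves under 'left' and 'right'
\prop_put:Nnn \l_demo_root_prop { left }  { \l_demo_leaf_a_prop }
\prop_put:Nnn \l_demo_root_prop { right } { \l_demo_leaf_b_prop }
\ExplSyntaxOff
\begin{document}
\ExplSyntaxOn
% Fetch the left child (a prop variable) into a scratch tl ...
\prop_get:NnN \l_demo_root_prop { left } \l_tmpa_tl
% ... then pass the content of that tl as the prop argument
\exp_args:NV \prop_item:Nn \l_tmpa_tl { value }
\ExplSyntaxOff
\end{document}
```

The readability gain is that each access names the branch (left, right) rather than a position, at the cost of one csname per node.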

Tags:

Expl3