Nested brace expansion mystery in Bash

Well, it is unravelled one layer at a time:

X{{a..c},{1..3}}Y

is documented as being expanded to X{a..c}Y X{1..3}Y (that's X{A,B}Y expanded to XA XB with A being {a..c} and B being {1..3}), themselves documented as being expanded to XaY XbY XcY X1Y X2Y X3Y.
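A quick way to check that reading in a Bash session is to compare the nested expression with the same expression after the outer layer has been unravelled by hand. The variable names below are only there for illustration:

```bash
# The nested expression exactly as written above.
nested=$(echo X{{a..c},{1..3}}Y)

# The same thing after unravelling only the outer {A,B} layer by hand.
outer_done=$(echo X{a..c}Y X{1..3}Y)

echo "$nested"       # XaY XbY XcY X1Y X2Y X3Y
echo "$outer_done"   # identical list
```

Both commands print the same six words, which is what the "one layer at a time" description predicts.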

What may be worth documenting is that they can be nested (that the first } does not close the first { in there for instance).

I suppose shells could have chosen to resolve the inner braces first, like by acting upon each closing } in turn:

X{{a..c},{1..3}}Y
X{a,{1..3}}Y X{b,{1..3}}Y X{c,{1..3}}Y

(that is A{a..c}B expanded to AaB AbB AcB, where A is X{ and B is ,{1..3}Y)
X{a,1}Y X{a,2}Y X{a,3}Y X{b,1}Y X{b,2}Y X{b,3}Y X{c,1}Y X{c,2}Y X{c,3}Y
XaY X1Y XaY X2Y...

But I don't find that any more intuitive or useful (see Kevin's example in comments for instance). There would still be some ambiguity as to the order in which the expansions would be done, and that's not how csh did it. (csh is the shell that introduced brace expansion in the late 70s; the {1..3} form came later (1995) from zsh, and {a..c} later still (2004) from bash.)
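To make the comparison concrete, the third line of that hypothetical inner-first derivation can be fed to Bash as-is and set against what Bash really produces for the original expression. The array names are made up for this sketch:

```bash
# Third line of the hypothetical derivation, expanded by Bash as ordinary
# (non-nested) brace expressions.
hypothetical=( X{a,1}Y X{a,2}Y X{a,3}Y X{b,1}Y X{b,2}Y X{b,3}Y X{c,1}Y X{c,2}Y X{c,3}Y )

# What Bash actually does with the original nested expression.
actual=( X{{a..c},{1..3}}Y )

echo "${#hypothetical[@]} words: ${hypothetical[*]}"
echo "${#actual[@]} words: ${actual[*]}"
```

The hypothetical scheme ends up with 18 words, many of them duplicates, against 6 for the real expansion.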

Note that csh did, from the start (see the 2BSD (1979) man page), document the fact that brace expansions could be nested, though it did not explicitly say how nested brace expansions would be expanded. But you can look at the csh code from 1979 to see how it was done then: it explicitly handles nesting, and resolves it starting from the outer braces.

In any case, I don't really see how the expansion of {a..c},{1..3} could have any bearing. In there, the , is not an operator of a brace expansion (as it's not inside braces), so is treated like any ordinary character.

Here's the short answer. In the first expression the comma is used as a separator, so the brace expansion is just the concatenation of the two nested subexpressions. In the second expression the comma is itself treated as a single-character subexpression, so product expressions are formed.

What you were missing was the definition of how brace-expansions are performed. Here are three references:

The bash source code
The Bash Hackers Wiki
The Bash Beginner's Guide

A more detailed explanation follows.

You compared the result of this expression:

$ echo {{a..c},{1..3}}
a b c 1 2 3

to the result of this expression:

$ echo {a..c},{1..3}
a,1 a,2 a,3 b,1 b,2 b,3 c,1 c,2 c,3

You say that this is hard to explain, i.e. that this is counter-intuitive. What's missing is a formal definition of how brace-expansions are processed. You note that the Bash Manual does not give a full definition.

I searched a bit but I couldn't find the missing (complete, formal) definition either. So I went to the source code:

braces.c

The source contains a couple of useful comments. First is a high-level overview of the brace expansion algorithm:

Basic idea:

Segregate the text into 3 sections: preamble (stuff before an open brace),
postamble (stuff after the matching close brace) and amble (stuff after
preamble, and before postamble).  Expand amble, and then tack on the
expansions to preamble.  Expand postamble, and tack on the expansions to
the result so far.

So the format of a brace-expansion token is the following:

<PREAMBLE><AMBLE><POSTAMBLE>

The main entry-point to expansion is a function called brace_expand which is described as follows:

Return an array of strings; the brace expansion of TEXT.

So the brace_expand function takes a string representing a brace expansion expression and returns the array of expanded strings.

Combining these two observations we see that the amble is expanded to a list of strings, each of which is concatenated onto the preamble. The postamble is then expanded into a list of strings, and each string in the postamble list is concatenated onto each string in the preamble/amble list (i.e. the product of the two lists is formed). But this doesn't describe how the amble and postamble are processed. Luckily there is a comment describing that as well. The amble is processed by a function called expand_amble whose definition is preceded by the following comment:

Expand the text found inside of braces.  We simply try to split the
text at BRACE_ARG_SEPARATORs into separate strings.  We then brace
expand each slot which needs it, until there are no more slots which
need it.

Elsewhere in the code we see that BRACE_ARG_SEPARATOR is defined to be a comma. This makes it clear that the amble is a comma-separated list of strings, some of which may themselves be brace-expansion expressions. The expansions of these strings then form a single array. Finally, we can also see that after expand_amble is called, the brace_expand function is called recursively on the postamble. This gives us a complete description of the algorithm.
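The description above can be turned into a rough sketch in Bash itself. This is not a transcription of braces.c: the function name brace_sketch and everything inside it are made up for illustration. It assumes Bash 4 or later (for mapfile), only handles comma lists and simple a..c or 1..3 ranges (which it hands back to Bash via eval after validating them), and ignores quoting, escapes, and range steps.

```bash
#!/usr/bin/env bash
# brace_sketch TEXT: print the expansion of TEXT, one string per line.
# Hypothetical helper, not the real brace_expand from braces.c.
brace_sketch() {
    local text=$1 i ch depth=0 open=-1 close=-1

    # Find the first "{" and the "}" that matches it.
    for ((i = 0; i < ${#text}; i++)); do
        ch=${text:i:1}
        if [[ $ch == '{' ]]; then
            if ((depth == 0)); then open=$i; fi
            depth=$((depth + 1))
        elif [[ $ch == '}' ]] && ((depth > 0)); then
            depth=$((depth - 1))
            if ((depth == 0)); then close=$i; break; fi
        fi
    done

    # No complete brace pair: TEXT expands to itself.
    if ((close < 0)); then printf '%s\n' "$text"; return; fi

    local preamble=${text:0:open}
    local amble=${text:open+1:close-open-1}
    local postamble=${text:close+1}

    # Split the amble at commas that are not inside nested braces.
    local -a slots=() middle=() tail=() sub=()
    local cur="" slot m t
    depth=0
    for ((i = 0; i < ${#amble}; i++)); do
        ch=${amble:i:1}
        case $ch in
            '{') depth=$((depth + 1)) ;;
            '}') depth=$((depth - 1)) ;;
        esac
        if [[ $ch == ',' ]] && ((depth == 0)); then
            slots+=("$cur"); cur=""
        else
            cur+=$ch
        fi
    done
    slots+=("$cur")

    local range_re='^([0-9]+\.\.[0-9]+|[a-z]\.\.[a-z])$'
    if ((${#slots[@]} > 1)); then
        # Comma list: expand each slot and concatenate the resulting lists.
        for slot in "${slots[@]}"; do
            mapfile -t sub < <(brace_sketch "$slot")
            middle+=("${sub[@]}")
        done
    elif [[ $amble =~ $range_re ]]; then
        # Leaf range: validated above, then delegated to Bash itself.
        eval "middle=({$amble})"
    else
        # Neither a list nor a supported range: keep the braces literally.
        middle=("{$amble}")
    fi

    # Expand the postamble recursively, then form the product.
    mapfile -t tail < <(brace_sketch "$postamble")
    for m in "${middle[@]}"; do
        for t in "${tail[@]}"; do
            printf '%s\n' "$preamble$m$t"
        done
    done
}

mapfile -t nested < <(brace_sketch '{{a..c},{1..3}}')
mapfile -t flat < <(brace_sketch '{a..c},{1..3}')
echo "${nested[*]}"   # a b c 1 2 3
echo "${flat[*]}"     # a,1 a,2 a,3 b,1 b,2 b,3 c,1 c,2 c,3
```

In the first call the comma sits inside the amble and so separates slots; in the second it ends up in the postamble, where it is just the preamble of the recursive call.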

There are some other (unofficial) references that corroborate this finding.

For one reference, check out the Bash Hackers Wiki. The section on combining and nesting doesn't quite address your issue, but the page does give the syntax/grammar of brace expansion, which I think does answer your question. The syntax is given by the following patterns:

{string1,string2,...,stringN}

{<START>..<END>}

<PREAMBLE>{........}

{........}<POSTSCRIPT>

<PREAMBLE>{........}<POSTSCRIPT>

And the parsing is described as follows:

Brace expansion is used to generate arbitrary strings. The specified strings are used to generate all possible combinations with the optional surrounding preambles and postscripts.

For another reference, take a look at the Bash Beginner's Guide, which has the following to say:

Brace expansion is a mechanism by which arbitrary strings may be generated. Patterns to be brace-expanded take the form of an optional PREAMBLE, followed by a series of comma-separated strings between a pair of braces, followed by an optional POSTSCRIPT. The preamble is prefixed to each string contained within the braces, and the postscript is then appended to each resulting string, expanding left to right.

So to parse brace-expansion expressions we go left-to-right, expanding each expression and forming successive products (with respect to the operation of string-concatenation).
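A small illustration of those successive products, using made-up strings rather than anything from the question:

```bash
# Two adjacent brace expressions: every left-hand string is paired with
# every right-hand string, and the leftmost expression varies slowest.
pairs=$(echo {a,b}{1,2})
echo "$pairs"      # a1 a2 b1 b2

# A preamble and a postscript are attached to every generated string.
wrapped=$(echo pre-{x,y}-post)
echo "$wrapped"    # pre-x-post pre-y-post
```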

Now let's consider your first expression:

{{a..c},{1..3}}

In the language of the Bash Hackers Wiki, this matches the first form:

{string1,string2,...,stringN}

Where N=2, string1={a..c} and string2={1..3} - the inside brace expansions being performed first and each of them being of the form {<START>..<END>}. Alternatively, we can say that this is a brace-expansion expression which consists only of an amble (no preamble or postamble). The amble is a comma-separated list, so we go through the list one slot at a time, and perform additional expansions where required. No product is formed because there are no adjacent expressions (the comma is used as a separator).

Next let's look at your second expression:

{a..c},{1..3}

In the language of the Bash Hackers Wiki, this expression matches the form:

{........}<POSTSCRIPT>

where the postscript is the sub-expression ,{1..3}. Alternatively, we can say that this expression has an amble ({a..c}) and a postamble (,{1..3}). The amble is expanded to the list a b c and then each of these is concatenated with each of the strings in the expansion of the postamble. The postamble is processed recursively: it has a preamble of , and an amble of {1..3}. This is expanded to the list ,1 ,2 ,3. The two lists a b c and ,1 ,2 ,3 are then combined to form the product list a,1 a,2 a,3 b,1 b,2 b,3 c,1 c,2 c,3.

It might help to give a pseudo-algebraic description of how these expressions are parsed, where brackets "[]" denote arrays, "+" denotes array concatenation, and "*" denotes the Cartesian product (with respect to concatenation).

Here is how the first expression is expanded (one step per line):

{{a..c},{1..3}}
{a..c} + {1..3}
[a b c] + [1 2 3]
a b c 1 2 3

And here is how the second expression is expanded:

{a..c},{1..3}
{a..c} * ,{1..3}
[a b c] * [,1 ,2 ,3]
a,1 a,2 a,3 b,1 b,2 b,3 c,1 c,2 c,3
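The two operations in that notation can be spelled out with Bash arrays. This is only a sketch of the "+" and "*" used above, with array names invented for the purpose, not of how Bash implements them:

```bash
# "+" : array concatenation, as in [a b c] + [1 2 3]
first=( {a..c} )
second=( {1..3} )
sum=( "${first[@]}" "${second[@]}" )

# "*" : Cartesian product under string concatenation,
#       as in [a b c] * [,1 ,2 ,3]
left=( {a..c} )
right=( ,{1..3} )
product=()
for l in "${left[@]}"; do
    for r in "${right[@]}"; do
        product+=( "$l$r" )
    done
done

echo "${sum[*]}"       # a b c 1 2 3
echo "${product[*]}"   # a,1 a,2 a,3 b,1 b,2 b,3 c,1 c,2 c,3
```

The two echoed lines match the output of echo {{a..c},{1..3}} and echo {a..c},{1..3} respectively.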

My understanding is this:

The inner braces are resolved first (as always) which turns

{{a..c},{1..3}}

into

{a,b,c,1,2,3}

Because the , is within braces it just separates brace elements.

But in the case of

{a..c},{1..3}

the , is not within braces, i.e. it is an ordinary character, so every string from the brace expansion on its left is combined with every string from the one on its right.
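Both points can be checked directly. The strings here are arbitrary, picked only for the demonstration:

```bash
# Inside braces the comma separates elements, so the flattened list
# gives the same words as the nested form.
flattened=$(echo {a,b,c,1,2,3})
nested=$(echo {{a..c},{1..3}})

# Outside braces the comma is an ordinary character and is simply
# carried along into every generated word.
literal_comma=$(echo {a..c},x)

echo "$flattened"       # a b c 1 2 3
echo "$nested"          # a b c 1 2 3
echo "$literal_comma"   # a,x b,x c,x
```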
