What is the data structure of $@ in shell?

That started as a hack in the Bourne shell. In the Bourne shell, IFS word splitting was done (after tokenisation) on all words in list context (command line arguments or the words the for loops loop on). If you had:

IFS=i var=file2.txt
edit file.txt $var

That second line would be tokenised into 3 words, $var would be expanded, and split+glob would be done on all three words, so you would end up running ed with t, f, le.txt, f, le2.txt as arguments.

Quoting parts of that would prevent the split+glob. The Bourne shell initially remembered which characters were quoted by setting the 8th bit on them internally (that changed later when Unix became 8bit clean, but the shell still did something similar to remember which byte was quoted).

Both $* and $@ were the concatenation of the positional parameters with space in-between. But there was a special processing of $@ when inside double-quotes. If $1 contained foo bar and $2 contained baz, "$@" would expand to:

foo bar baz
^^^^^^^ ^^^

(with the ^s above indicating which of the characters have the 8th bit set). Here the first space was quoted (had the 8th bit set) but not the second one (the one added in-between words).

And it's the IFS splitting that takes care of separating the arguments (assuming the space character is in $IFS as it is by default). That's similar to how $* was expanded in its predecessor the Mashey shell (itself based on the Thompson shell, while the Bourne shell was written from scratch).

That explains why in the Bourne shell initially "$@" would expand to the empty string instead of nothing at all when the list of positional parameters was empty (you had to work around it with ${1+"$@"}), why it didn't keep the empty positional parameters and why "$@" didn't work when $IFS didn't contain the space character.

The intention was to be able to pass the list of arguments verbatim to another command, but that didn't work properly for the empty list, for empty elements or when $IFS didn't contain space (the first two issues were eventually fixed in later versions).
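As a rough sketch of the cases listed above, here is how a current POSIX-style shell (bash or dash assumed) treats the empty list and empty elements. The helper name count_args is made up for illustration; it only reports how many arguments it received.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

set --                                # no positional parameters at all
n_quoted=$(count_args "$@")           # modern shells pass zero arguments here
n_guarded=$(count_args ${1+"$@"})     # the old Bourne-shell workaround, same result

set -- "foo bar" "" baz               # three parameters, one of them empty
n_kept=$(count_args "$@")             # the empty element is preserved

echo "$n_quoted $n_guarded $n_kept"
```

In bash or dash this prints 0 0 3. An early Bourne shell would have passed one empty argument for the first case, which is what ${1+"$@"} was guarding against.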

The Korn shell (on which the POSIX spec is based) changed that behaviour in a few ways:

  • IFS splitting is only done on the result of unquoted expansions (not on literal words like edit or file.txt in the example above)
  • $* and $@ are joined with the first character of $IFS or space when $IFS is empty except that for a quoted "$@", that joiner is unquoted like in the Bourne shell, and for a quoted "$*" when IFS is empty, the positional parameters are appended without separator.
  • it added support for arrays, with ${array[*]} and ${array[@]} reminiscent of Bourne's $* and $@, but starting at index 0 instead of 1, and sparse (more like associative arrays), which means $@ cannot really be treated as a ksh array (compare with csh/rc/zsh/fish/yash where $argv/$* are normal arrays).
  • The empty elements are preserved.
  • "$@" when $# is 0 now expands to nothing instead of the empty string, "$@" works when $IFS doesn't contain spaces except when IFS is empty. An unquoted $* without wildcards expands to one argument (where the positional parameters are joined with space) when $IFS is empty.
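To illustrate the first point in the list above, here is the earlier IFS=i example replayed in a Korn/POSIX-style shell (bash or dash assumed). It uses set -- instead of actually running an editor, so the resulting words can be inspected. The variable names old_ifs and n_words are only for the sketch.

```shell
old_ifs=$IFS
IFS=i
var=file2.txt

# Only the result of the unquoted expansion $var is split on "i";
# the literal words "edit" and "file.txt" are left alone.
set -- edit file.txt $var

IFS=$old_ifs
n_words=$#

echo "$n_words"
printf '<%s>\n' "$@"
```

This gives four words: edit, file.txt, f and le2.txt, where the Bourne shell would have produced six.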

ksh93 fixed the remaining few problems above. In ksh93, $* and $@ expand to the list of positional parameters, separated regardless of the value of $IFS, and then further split+globbed+brace-expanded in list contexts. $* is joined with the first byte (not character) of $IFS, and "$@" in list contexts expands to the list of positional parameters regardless of the value of $IFS. In non-list contexts, like in var=$@, $@ is joined with space regardless of the value of $IFS.

bash's arrays are designed after the ksh ones. The differences are:

  • no brace-expand upon unquoted expansion
  • first character of $IFS instead of first byte
  • some corner case differences like the expansion of $* when non-quoted in non-list context when $IFS is empty.

While the POSIX spec used to be pretty vague, it now more or less specifies the bash behaviour.
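A short sketch of the quoted forms as bash and dash handle them today: "$*" is joined with the first character of $IFS (or with nothing when $IFS is empty), while "$@" stays a list of separate words either way. The helper count_args and the variable names are invented for the example.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

set -- "foo bar" baz
old_ifs=$IFS

IFS=:-
star_colon="$*"                     # joined with ":", the first character of $IFS
at_count=$(count_args "$@")         # still two separate words

IFS=
star_empty="$*"                     # empty IFS: joined without any separator
at_empty_count=$(count_args "$@")   # still two separate words

IFS=$old_ifs

echo "$star_colon / $at_count / $star_empty / $at_empty_count"
```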

$@ is different from normal arrays in ksh or bash in that:

  • Indices start at 1 instead of 0 (except in "${@:0}" which includes $0 (not a positional parameter, and in functions gives you the name of the function or not depending on the shell and how the function was defined)).
  • You can't assign elements individually
  • it's not sparse, you can't unset elements individually
  • shift can be used.
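The differences listed above can be seen in a few lines (a sketch for any POSIX-style shell; the variable names are only illustrative):

```shell
set -- a b c
first=$1          # indices start at 1, so this is "a"

shift             # drop $1; the remaining parameters are renumbered
new_first=$1      # now "b"
remaining=$#      # 2

# There is no assignment syntax for a single positional parameter;
# the list can only be replaced as a whole with "set --".
set -- x "$@"
rebuilt="$*"      # joined with a space under the default IFS

echo "$first $new_first $remaining: $rebuilt"
```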

In zsh or yash where arrays are normal arrays (not sparse, indices start at one like in all other shells but ksh/bash), $* is treated as a normal array. zsh has $argv as an alias for it (for compatibility with csh). $* is the same as $argv or ${argv[*]} (arguments joined with the first character of $IFS but still separated out in list contexts). "$@", like "${argv[@]}" or "${*[@]}", undergoes the Korn-style special processing.


However, I don't know what data structure $@ is.

It's a special parameter that expands to the values of the positional parameters... But that's nitpicking about the terminology.

We can view the positional parameters as parts of $@, so it has a number of distinct elements ($1, $2...), that can be accessed independently and are named by consecutive natural numbers. That makes it something that is usually called an array.

The syntax is a bit weird, though, and even limited. There's no way to modify a single element of the array individually. Instead, the whole thing has to be set at once. (You can use set -- "$@" foo to append a value, or set -- "${@:1:2}" foo "${@:3}" to add a value in the middle. But in both cases you have to write out the whole resulting list.)
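Spelled out, the two set -- forms mentioned above look like this. It is a sketch that assumes bash (the "${@:offset:length}" slicing also exists in ksh93 and zsh, but is not in POSIX sh); the variable names are only for inspection.

```shell
set -- a b c d

set -- "$@" e                    # append: the whole list is written out again
appended="$*"

set -- "${@:1:2}" X "${@:3}"     # insert X after the second element
inserted="$*"

echo "$appended"
echo "$inserted"
```

With the default IFS, the first echo shows a b c d e and the second a b X c d e.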

Why does it behave differently from $* when enclosed in double quotes?

Because they're defined to behave differently.

However, it can also be echoed entirely with a simple echo $@; if it were an array, only the first element would be shown.

If you mean the fact that a=(foo bar asdf); echo $a will output just foo, then this is mostly a quirk of the shell syntax, and of the fact that ksh-style named arrays were created later than the positional parameters and $@. Plain $a is the same as ${a[0]}, so it keeps the backward-compatible meaning of a single scalar value, regardless of whether a is an array or a simple scalar variable.

The @ sign referring to the whole list was reused with named arrays in that "${a[@]}" is the way to get the whole list. Compared to named arrays, with $@, the unnecessary braces and brackets and the name are just skipped.
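A side-by-side sketch of the two syntaxes, assuming bash (ksh behaves the same way here, while zsh expands plain $a to the whole array). The helper count_args is made up for the example.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

a=(foo bar asdf)
plain=$a                            # same as ${a[0]}: just the first element
n_named=$(count_args "${a[@]}")     # the whole named array, one word per element

set -- foo bar asdf
n_positional=$(count_args "$@")     # the same idea without name, braces or brackets

echo "$plain $n_named $n_positional"
```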

Or in other words, I want to know how $@ is stored in computer memory.

That depends on the implementation; you'll have to look at the source code of any particular shell you care about.

Is it a string, a multi-line string or a array?

An array, mostly. It is different from the ksh-style named arrays, though, since those can have arbitrary nonnegative integers as indexes, not just consecutive ones as with $@. (That is, a named array can be sparse, and have e.g. the indexes 1, 3 and 4, with 0 and 2 missing. That's not possible with the positional parameters.)
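The sparse case from the parenthesis above, as a bash sketch ("${!a[*]}" lists the indexes that are actually set; the variable names are illustrative):

```shell
a=()
a[1]=x
a[3]=y
a[4]=z

indexes="${!a[*]}"           # only the indexes that exist
n_elements=${#a[@]}          # number of elements, not highest index + 1

unset 'a[3]'                 # individual elements can be removed
indexes_after="${!a[*]}"

echo "$indexes ($n_elements elements), then $indexes_after"
```

Nothing comparable exists for the positional parameters: after a shift or a set --, they are always numbered 1 to $# without gaps.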

It's not a single string, since it can be expanded to distinct elements, and calling the elements lines is also not right, since any regular variable, or one of the positional parameters (elements of $@) can also contain newlines.

If it is a unique data type, is it possible to define a custom variable as an instance of this type?

No. But named arrays are probably more useful anyway.
