How to use call-by-reference on an argument in a bash function

Description

Understanding this will take some effort. Be patient. The solution works correctly in bash, but some "bashisms" are needed.

First: We need to use the "Indirect" access to a variable ${!variable}. If $variable contains the string animal_name, the "Parameter Expansion": ${!variable} will expand to the contents of $animal_name.
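A minimal sketch of indirect expansion on its own, before it is used inside a function. The variable names here (animal_name, variable) are only illustrative:

```shell
#!/bin/bash
# Illustrative names only: "variable" holds the NAME of another variable.
animal_name="bison"
variable="animal_name"

echo "$variable"      # plain expansion: the stored name
echo "${!variable}"   # indirect expansion: the value of the variable with that name
```

The first echo prints the name (animal_name), the second prints the value stored under that name (bison).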

Let's see that idea in action. I have retained the names and values you used where possible, to make it easier to follow:

#!/bin/bash

function delim_to_array() {
    local VarName=$1

    local IFS="$2";
    printf "inside  IFS=<%s>\n" "$IFS"

    echo "inside  var    $VarName"
    echo "inside  list = ${!VarName}"

    echo a\=\(${!VarName}\)
    eval a\=\(${!VarName}\)
    printf "in  <%s> " "${a[@]}"; echo

    eval $VarName\=\(${!VarName}\)
}

animal_list="anaconda, bison, cougar, dingo"
delim_to_array "animal_list" ","

printf "out <%s> " "${animal_list[@]}"; echo
printf "outside IFS=<%s>\n" "$IFS"

# Now we can use animal_list as an array
for animal in "${animal_list[@]}"; do
    echo "NAME: $animal"
done

If that complete script is executed (let's assume it's named so-setvar.sh), you should see:

$ ./so-setvar.sh
inside  IFS=<,>
inside  var    animal_list
inside  list = anaconda, bison, cougar, dingo
a=(anaconda  bison  cougar  dingo)
in  <anaconda> in  <bison> in  <cougar> in  <dingo> 
out <anaconda> out <bison> out <cougar> out <dingo> 
outside IFS=< 
>
NAME: anaconda
NAME: bison
NAME: cougar
NAME: dingo

Understand that "inside" means "inside the function", and "outside" the opposite.

The value inside $VarName is the name of the var: animal_list, as a string.

The value of ${!VarName} is shown to be the list: anaconda, bison, cougar, dingo

Now, to show how the solution is constructed, there is a line with echo:

echo a\=\(${!VarName}\)

which shows what the following line with eval executes:

a=(anaconda  bison  cougar  dingo)

Once that is evaluated, the variable a is an array with the animal list. In this instance, the var a is used to show exactly how the eval affects it.

Then the value of each element of a is printed as in <val>.
The same is done outside the function, on the caller's variable, as out <val>.
That is shown in these two lines:

in  <anaconda> in  <bison> in  <cougar> in  <dingo>
out <anaconda> out <bison> out <cougar> out <dingo>

Note that the real change was executed in the last eval of the function.
That's it, done. The var now has an array of values.

In fact, the core of the function is one line: eval $VarName\=\(${!VarName}\)

Also, the value of IFS is set as local to the function which makes it return to the value it had before executing the function without any additional work. Thanks to Peter Cordes for the comment on the original idea.
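The effect of declaring IFS local can be checked in isolation. This is only a sketch; show_ifs is a hypothetical helper that is not part of the solution:

```shell
#!/bin/bash
# show_ifs is a hypothetical helper used only for this demonstration.
show_ifs() {
    local IFS=","                 # this IFS exists only while the function runs
    printf 'inside:  <%s>\n' "$IFS"
}

before=$IFS
show_ifs
printf 'outside: %q\n' "$IFS"     # %q makes the space, tab and newline visible
[ "$IFS" = "$before" ] && echo "IFS is unchanged after the call"
```

No save-and-restore code is needed: the caller's IFS is never touched.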

That ends the explanation. I hope it's clear.


Real Function

If we remove all the unneeded lines, keeping only the core eval and the local IFS, the function is reduced to its minimal expression:

delim_to_array() {
    local IFS="${2:-$' :|'}"
    eval $1\=\(${!1}\);
}

Setting the value of IFS as a local variable also allows us to give the function a "default" value. Whenever the second argument is missing (or empty), the local IFS takes the "default" value. I felt that the default should be the space ( ) (which is always a useful splitting value), the colon (:), and the vertical line (|). Any of those three will split the values. Of course, the default could be set to any other values that fit your needs.
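To see which delimiter set the ${2:-...} expansion selects, here is a small sketch. pick_ifs is a hypothetical helper that only reports the value; it does no splitting:

```shell
#!/bin/bash
# pick_ifs is a hypothetical helper: it prints the IFS the function would use.
pick_ifs() {
    local IFS="${2:-$' :|'}"   # second argument if set and non-empty, else the default
    printf '%s' "$IFS"
}

printf '<%s>\n' "$(pick_ifs anything ",")"   # explicit delimiter
printf '<%s>\n' "$(pick_ifs anything)"       # default: space, colon, vertical line
```

This prints <,> for the first call and < :|> for the second. Note that the :- form also falls back to the default when the second argument is an empty string.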

Edit to use read:

To reduce the risk of unquoted values in eval, we can use:

delim_to_array() {
    local IFS="${2:-$' :|'}"
    # eval $1\=\(${!1}\);
    read -ra "$1" <<<"${!1}"
}

test="fail-test"; a="fail-test"

animal_list='bison, a space, {1..3},~/,${a},$a,$((2+2)),$(echo "fail"),./*,*,*'

delim_to_array "animal_list" ","
printf "<%s>" "${animal_list[@]}"; echo

$ so-setvar.sh
<bison>< a space>< {1..3}><~/><${a}><$a><$((2+2))><$(echo "fail")><./*><*><*>

Most of the values set above for the var animal_list fail with eval (they get expanded instead of being kept as text),
but they pass through read unchanged.

  • Note: It is perfectly safe to try the eval option in this code as the values of the vars have been set to plain text values just before calling the function. Even if really executed, they are just text. Not even a problem with ill-named files, as pathname expansion is the last expansion, there will be no variable expansion re-executed over the pathname expansion. Again, with the code as is, this is in no way a validation for general use of eval.
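A harmless way to compare the two variants side by side is sketched below. The function names to_array_eval and to_array_read are made up here to tell them apart, and the payload is a benign stand-in for a hostile value:

```shell
#!/bin/bash
# Both function names are hypothetical, chosen only to distinguish the variants.
to_array_eval() {
    local IFS="${2:-$' :|'}"
    eval $1\=\(${!1}\)           # the value is re-parsed by the shell
}
to_array_read() {
    local IFS="${2:-$' :|'}"
    read -ra "$1" <<<"${!1}"     # the value is only split, never re-parsed
}

payload='x,$(echo INJECTED)'     # benign stand-in for untrusted input

list_a=$payload; to_array_eval list_a ","
list_b=$payload; to_array_read list_b ","

printf 'eval: <%s>\n' "${list_a[@]}"
printf 'read: <%s>\n' "${list_b[@]}"
```

With the eval variant, the command substitution is executed and its output becomes the second element. With read, the second element stays as the literal text.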

Example

To really understand what, and how this function works, I re-wrote the code you posted using this function:

#!/bin/bash

delim_to_array() {
        local IFS="${2:-$' :|'}"
        # printf "inside  IFS=<%s>\n" "$IFS"
        # eval $1\=\(${!1}\);
        read -ra "$1" <<<"${!1}";
}

animal_list="anaconda, bison, cougar, dingo"
delim_to_array "animal_list" ","
printf "NAME: %s\t " "${animal_list[@]}"; echo

people_list="alvin|baron|caleb|doug"
delim_to_array "people_list"
printf "NAME: %s\t " "${people_list[@]}"; echo

$ ./so-setvar.sh
NAME: anaconda   NAME:  bison    NAME:  cougar   NAME:  dingo    
NAME: alvin      NAME: baron     NAME: caleb     NAME: doug      

As you can see, the IFS is set only inside the function, it is not changed permanently, and therefore it does not need to be re-set to its old value. Additionally, the second call "people_list" to the function takes advantage of the default value of IFS, there is no need to set a second argument.


« Here be Dragons » ¯\_(ツ)_/¯


Warnings 01:

As the (eval) function was constructed, there is one place in which the var is exposed unquoted to the shell parsing. That allows us to get the "word splitting" done using the IFS value. But it also exposes the values of the vars (unless some quoting prevents that) to "brace expansion", "tilde expansion", "parameter, variable and arithmetic expansion", "command substitution", and "pathname expansion", in that order, and to process substitution <() >() on systems that support it.

An example of each (except last) is contained in this simple echo (be careful):

 a=failed; echo {1..3} ~/ ${a} $a $((2+2)) $(ls) ./*

That is, any string that starts with {~$`<> or could match a file name, or contains ?*[] is a potential problem.

If you are sure that the variables do not contain such problematic values, then you are safe. If there is the potential to have such values, the ways to answer your question are more complex and need more (even longer) descriptions and explanations. Using read is an alternative.

Warnings 02:

Yes, read comes with its own share of «dragons».

  • Always use the -r option, it is very hard for me to think of a condition where it is not needed.
  • The read command reads only one line. Multi-line input, even when setting the -d option, needs special care, or the whole input will be assigned to one variable.
  • If the IFS value contains a space, leading and trailing spaces will be removed. Well, the complete description should include some detail about the tab, but I'll skip it.
  • Do not pipe | data to read. If you do, read will be in a sub-shell. All variables set in a sub-shell do not persist upon returning to the parent shell. Well, there are some workarounds, but, again, I'll skip the detail.
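The pipe problem from the last point can be sketched as follows, assuming bash's default settings (the lastpipe option not enabled). The variable names piped and direct are illustrative:

```shell
#!/bin/bash
# Pitfall: with default settings, read at the end of a pipeline runs in a subshell.
unset piped
echo "a,b,c" | IFS=, read -ra piped
echo "after pipe: ${#piped[@]} element(s)"          # the array did not survive

# A here-string keeps read in the current shell, so the array persists.
IFS=, read -ra direct <<<"a,b,c"
echo "after here-string: ${#direct[@]} element(s)"
```

The here-string form is the one used by the function in this answer, which is why the caller's variable is actually changed.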

I didn't mean to include the warnings and problems of read, but by popular request, I had to include them, sorry.


The Bash FAQ has a whole entry about calling by reference / indirection.

In the simple case, printf -v is a better alternative to the eval suggested by other answers, because it makes the quoting much easier.

func() {  # set the caller's simple non-array variable
    local retvar=$1
    printf -v "$retvar"  '%s ' "${@:2}"  # concat all the remaining args
}

Bash-completion (the code that runs when you hit tab) has switched over to printf -v instead of eval for its internal functions, because it's more readable and probably faster.
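A short usage sketch of the same idea, showing that the value is stored as-is. set_var is a hypothetical helper, not something taken from bash-completion:

```shell
#!/bin/bash
# set_var is a hypothetical helper: store $2 in the variable whose name is $1.
set_var() {
    local retvar=$1
    printf -v "$retvar" '%s' "$2"   # assignment without eval; the value is not re-parsed
}

set_var greeting 'literal $(echo not-run) and *'
printf '<%s>\n' "$greeting"
```

The command substitution and the glob inside the value are kept as plain text. One caveat of this pattern: if the caller's variable were itself named retvar, the local variable would shadow it.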

For returning arrays, the Bash FAQ suggests using read -a to read into sequential array indices of an array variable:

# Bash
aref=realarray
IFS=' ' read -d '' -ra "$aref" <<<'words go into array elements'

Bash 4.3 introduced a feature that makes call-by-reference massively more convenient. Bash 4.3 is still new-ish (2014).

func () { # return an array in a var named by the caller
    typeset -n ref1=$1   # ref1 is a nameref variable.
    shift   # remove the var name from the positional parameters
    echo "${!ref1} = $ref1"  # prints the name and contents of the real variable
    ref1=( "foo" "bar" "$@" )  # sets the caller's variable.
}

Note that the wording of the bash man page is slightly confusing. It says the -n attribute can't be applied to array variables. This means you can't have an array of references, but you can have a reference to an array.
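Applied to the delimiter-splitting task from the question, a nameref version might look like the sketch below. It assumes bash 4.3 or later; the names _dta_ref and _dta_parts are arbitrary choices made here to reduce the chance of clashing with the caller's variable name:

```shell
#!/bin/bash
# Requires bash 4.3+. _dta_ref and _dta_parts are arbitrary, hypothetical names.
delim_to_array() {
    local -n _dta_ref=$1             # _dta_ref aliases the caller's variable
    local IFS="${2:-$' :|'}"
    local -a _dta_parts
    read -ra _dta_parts <<<"$_dta_ref"
    _dta_ref=("${_dta_parts[@]}")    # replace the caller's string with the array
}

animal_list="anaconda,bison,cougar,dingo"
delim_to_array animal_list ","
printf 'NAME: %s\n' "${animal_list[@]}"
```

No eval and no ${!var} indirection are needed here: reading from and assigning to the nameref acts on the caller's variable directly.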


You cannot change the variable (or array in this case) inside the function because you pass only its content: the function doesn't know which variable has been passed.

As a workaround, you can pass the name of the variable and evaluate it inside the function to get the content.

#!/bin/bash 

function delim_to_array() {
  local list=$1
  local delim=$2
  local oifs=$IFS;

  IFS="$delim"
  temp_array=($(eval echo '"${'"$list"'}"'))
  IFS=$oifs;

  eval "$list=("${temp_array[@]}")"            
}                                             

animal_list="anaconda, bison, cougar, dingo"
delim_to_array "animal_list" ","
printf "NAME: %s\n" "${animal_list[@]}"

people_list="alvin|baron|caleb|doug"
delim_to_array "people_list" "|"
printf "NAME: %s\n" "${people_list[@]}"

Pay close attention to the quotes in the lines where eval is used. Part of the expression needs to be in single quotes, the other part in double quotes. Additionally, I've replaced the for loop with the simpler printf command in the final printing.

Output:

NAME: anaconda
NAME: bison
NAME: cougar
NAME: dingo
NAME: alvin
NAME: baron
NAME: caleb
NAME: doug