Inverting an associative array

zsh

to reverse keys <=> values

In zsh, where the primary syntax for defining a hash is hash=(k1 v1 k2 v2...) like in perl (newer versions also support the awkward ksh93/bash syntax for compatibility, though with variations when it comes to quoting the keys), you can do:

keys=("${(@k)hash}")
values=("${(@v)hash}")

typeset -A reversed
reversed=("${(@)values:^keys}") # array zipping operator

or using a loop:

for k v ("${(@kv)hash}") reversed[$v]=$k

The @ and the double quotes are there to preserve empty keys and values (note that bash associative arrays don't support empty keys). Because the elements of an associative array are expanded in no particular order, if several elements of $hash have the same value (which will end up being a key in $reversed), you can't tell which of their keys will end up as the value in $reversed.
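For comparison, bash (4.0 or later) has neither the zip operator nor multi-variable for loops, so a rough equivalent of the loop version iterates over the keys and looks each value up. This is a minimal sketch with made-up sample data. As above, which of several keys sharing a value wins is unspecified, and an empty value would fail here because bash rejects empty keys.

```shell
#!/usr/bin/env bash
# Rough bash (4.0+) counterpart of the zsh loop above.
# The sample data is made up for illustration.
declare -A hash=([a]=1 [b]=2 [c]=2)

declare -A reversed
for k in "${!hash[@]}"; do
    # A later key with the same value overwrites an earlier one, and the
    # iteration order of "${!hash[@]}" is unspecified, so reversed[2]
    # may end up as either b or c.
    reversed[${hash[$k]}]=$k
done

declare -p reversed
```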

for your loop

You'd use the R hash subscript flag to get elements based on value instead of key, combined with e for exact (as opposed to wildcard) match, and then get the keys for those elements with the k parameter expansion flag:

for value ("${(@u)hash}")
  print -r "elements with '$value' as value: ${(@k)hash[(Re)$value]}"

your perl approach

zsh (unlike ksh93) doesn't support arrays of arrays, but its variables can contain NUL bytes, so you could use NUL to separate elements provided the elements themselves don't contain any. Alternatively, use ${(q)var} / ${(Q)${(z)var}} to encode/decode a list using quoting:

typeset -A seen
for k v ("${(@kv)hash}")
  seen[$v]+=" ${(q)k}"

for k v ("${(@kv)seen}")
  print -r "elements with '$k' as value: ${(Q@)${(z)v}}"

ksh93

ksh93 was the first shell to introduce associative arrays, in 1993. Its syntax for assigning an associative array as a whole makes it very difficult to build one programmatically (unlike zsh's plain key/value list), but at least that is somewhat justified in ksh93, since ksh93 supports complex nested data structures.

In particular, ksh93 supports arrays as values of hash elements, so you can do:

typeset -A seen
for k in "${!hash[@]}"; do
  seen[${hash[$k]}]+=("$k")
done

for k in "${!seen[@]}"; do
  print -r "elements with '$k' as value: ${seen[$k][@]}"
done

bash

bash added support for associative arrays much later (in version 4.0, released in 2009). It copied the ksh93 syntax but not ksh93's other advanced data structures, and it has none of zsh's advanced parameter expansion operators.

In bash, you could use the quoted-list approach mentioned in the zsh section, with printf %q or, in newer versions (4.4+), ${var@Q}:
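To see why the quoting makes the later eval safe, here is a small round-trip sketch with made-up strings (including one with a space, one with a quote, a glob character and an empty string). Each item is encoded with printf %q, and again with ${var@Q} (which assumes bash 4.4+), into one space-separated list, and eval turns each list back into an array.

```shell
#!/usr/bin/env bash
# Round trip: encode awkward strings into a single space-separated
# quoted list, then decode it back into an array with eval.
# The sample strings are made up for illustration.
items=('two words' "it's" '*' '')

list=
for item in "${items[@]}"; do
    printf -v quoted %q "$item"   # shell-quoted form of $item
    list+=" $quoted"
done
eval "decoded=($list)"            # quoting prevents splitting and globbing

list2=
for item in "${items[@]}"; do
    list2+=" ${item@Q}"           # same idea with the @Q operator (bash 4.4+)
done
eval "decoded2=($list2)"

printf '<%s>\n' "${decoded[@]}"
```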

typeset -A seen
for k in "${!hash[@]}"; do
  printf -v quoted_k %q "$k"
  seen[${hash[$k]}]+=" $quoted_k"
done

for k in "${!seen[@]}"; do
  eval "elements=(${seen[$k]})"
  echo -E "elements with '$k' as value: ${elements[*]}"
done

As noted earlier, however, bash associative arrays don't support the empty string as a key, so this won't work if some of $hash's values are empty. You could replace the empty string with some placeholder like <EMPTY>, or prefix every value with some character before using it as a key and strip that character again for display.
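A minimal sketch of the prefix workaround, built on the loop above. The "x" prefix is an arbitrary choice and the sample data is made up. Every value gets the prefix before being used as a key of seen, so an empty value becomes the key "x", and ${k#x} removes the prefix again for display.

```shell
#!/usr/bin/env bash
# Prefix workaround for empty values: "x" + value is never empty.
# The "x" prefix and the sample data are arbitrary choices.
declare -A hash=([a]=1 [b]='' [c]=1 [d]='')

declare -A seen
for k in "${!hash[@]}"; do
    printf -v quoted_k %q "$k"
    seen[x${hash[$k]}]+=" $quoted_k"    # key is "x" for an empty value
done

for k in "${!seen[@]}"; do
    eval "elements=(${seen[$k]})"
    # ${k#x} strips the prefix again before displaying the value
    echo -E "elements with '${k#x}' as value: ${elements[*]}"
done
```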


The stumbling block, as I'm sure you know, is getting the whole value of an indexed array when its name is stored as the value of another variable. I couldn't do it without an intermediate variable whose value has the form ${v[@]}, and then using eval on that. So, here's that approach:

declare -A keys
N=0 # counter for the index variables IX1, IX2, IX3, ...
for key in "${!hash[@]}"; do
    value="${hash[$key]}"
    if [ -z "${keys[$value]}" ] ; then N=$((N+1)) ; keys[$value]=IX$N ; fi
    index="${keys[$value]}" # 'index' is now name of index variable
    X="\${$index[@]}" # e.g. the string ${IX2[@]}
    eval "$index=( \"$X\" \"\$key\" )" # append $key, keeping existing elements intact
done

for value in "${!keys[@]}" ; do
    index=${keys[$value]}
    X="\${$index[@]}"
    printf "Value %s is present with the following keys: %s\n" \
       "$value" "$(eval "echo \"$X\"")"
done

This is for bash 4.0 or later (the first version with associative arrays). It creates indexed arrays IX1, IX2, etc., for the various values it encounters, and stores those array names in the keys associative array, indexed by value. Thus, ${keys[$value]} is the name of the indexed array that holds the keys for that value. X is then set up to be the "access phrase" for the collection of keys in that array, so that eval can expand it into those keys, separated by spaces. For example, if a value has indexed array IX2, then X will be the string ${IX2[@]}.
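If bash 4.3 or later can be assumed, namerefs (declare -n) are one way around the eval step. This is a swapped-in technique, not the approach above: a nameref set to the name of an IX array can be appended to and read directly. The minimal sketch below keeps the same keys/IXn layout, and its sample data is made up.

```shell
#!/usr/bin/env bash
# Same grouping as above, but using namerefs (bash 4.3+) instead of eval.
# The sample data is made up for illustration.
declare -A hash=([apple]=fruit [carrot]=vegetable [pear]=fruit [leek]=vegetable [salt]=mineral)

declare -A keys   # value -> name of the indexed array holding its keys
N=0
for key in "${!hash[@]}"; do
    value=${hash[$key]}
    if [[ -z ${keys[$value]} ]]; then
        N=$((N+1))
        keys[$value]=IX$N
    fi
    unset -n ref                     # drop the previous reference itself
    declare -n ref=${keys[$value]}   # ref now stands for IX1, IX2, ...
    ref+=("$key")                    # appends to (or creates) that array
done
unset -n ref

for value in "${!keys[@]}"; do
    declare -n ref=${keys[$value]}
    printf 'Value %s is present with the following keys: %s\n' "$value" "${ref[*]}"
    unset -n ref
done
```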

I believe zsh is similar in not supporting arrays of arrays, so it'd probably require a similar solution. IMHO though, the access phrases in zsh are slightly clearer.