Getting average of values in a field based on variable in another field

You're not too far off. Try arrays indexed by $1:

awk '{ total[$1] += $3; count[$1]++ } END {for (t in total) print t, total[t]/count[t]}' file
a 2.5
b 9.66667

Or, if you want a maximum of two decimal points, as you show in your question:

$ awk '{ total[$1] += $3; count[$1]++ } END {for (t in total) printf "%s %.2f\n", t, total[t]/count[t]}' file
a 2.50
b 9.67

With GNU datamash:

$ datamash -R2 -W -s -g 1 mean 3 <file
a       2.50
b       9.67

$ datamash -W -s -g 1 mean 3 <file
a       2.5
b       9.6666666666667
  • -R2 round to two decimal places
  • -W use spaces and/or tabs as field separators
  • -s sort before grouping
  • -g 1 group on the first field
  • mean 3 mean of the values, third field

You can remove -W if you have tabs as separators add --output-delimiter=' ' if you want to replace tabs with space characters in the output.


Miller is also handy for tasks like this ex.

$ mlr --nidx stats1 -a mean -f 3 -g 1 file.txt
a 2.500000
b 9.666667

or (with a more recent version that has the format-values verb)

$ mlr --nidx stats1 -a mean -f 3 -g 1 then format-values -f '%.2f' file.txt
a 2.50
b 9.67