Sum duplicate row values with awk

Use awk as below:

$ awk '{ seen[$1] += $2 } END { for (i in seen) print i, seen[i] }' file1
1486113768 9936
1486113769 6160736
1486113770 5122176
1486113772 4096832
1486113773 9229920
1486113774 8568888

{ seen[$1] += $2 } builds an associative array (a hash map) keyed on $1: each time a key repeats, that line's $2 is added to the running total for that key, so duplicate rows end up summed. The END block then prints every key with its total.
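Note that for (i in seen) visits the keys in an unspecified order, so sorted-looking output is not guaranteed. If you need the keys in numeric order, a minimal sketch is to pipe through sort:

$ awk '{ seen[$1] += $2 } END { for (i in seen) print i, seen[i] }' file1 | sort -n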


$ awk '$1!=p{ if (NR>1) print p, s; p=$1; s=0} {s+=$2} END{print p, s}' file
1486113768 9936
1486113769 6160736
1486113770 5122176
1486113772 4096832
1486113773 9229920
1486113774 8568888

The above uses almost no memory (just one string and one numeric variable) and prints the totals in the order the keys appear in your input. Note that it assumes lines sharing the same first field are adjacent (e.g. the input is sorted on field 1); otherwise a key's total is printed once per run of adjacent lines.
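If your input is not already grouped by the first field, a minimal sketch (assuming the keys in field 1 are numeric) is to sort first:

$ sort -k1,1n file | awk '$1!=p{ if (NR>1) print p, s; p=$1; s=0} {s+=$2} END{print p, s}'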

I highly recommend you read the book Effective Awk Programming, 4th Edition, by Arnold Robbins if you're going to be using awk, both so you can learn how to write your own scripts and, while you're learning, so you can understand other people's scripts well enough to separate the right approaches from the wrong ones when two scripts both produce the expected output for some specific sample input.


If datamash is okay

$ datamash -t' ' -g 1 sum 2 < ip.txt 
1486113768 9936
1486113769 6160736
1486113770 5122176
1486113772 4096832
1486113773 9229920
1486113774 8568888
  • -t' ' sets space as the field delimiter
  • -g 1 groups by the 1st field
  • sum 2 sums the 2nd field's values
  • if the input file is not sorted, use datamash -st' ' -g 1 sum 2, where the -s option sorts the input first (see the sketch below)
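As a quick sanity check of the -s variant, here's a minimal sketch with made-up, unsorted input (the values are purely illustrative):

$ printf '1486113769 100\n1486113768 1\n1486113768 2\n' | datamash -st' ' -g 1 sum 2
1486113768 3
1486113769 100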
