Cumulative sum of values in a column with same ID

Print the running sum first, then update it with the current line's value.

awk '{print $1, $2, sum[$1]; sum[$1] += $2}' file
1
1 2 0
1 2 2
1 4 4
1 6 8
2
2 1 0
2 2 1
2 3 3
2 4 6
3
3 1 0
3 5 1
3 9 6
3 11 15

This takes advantage of awk treating undefined variables as the empty string, or (in numeric context) as zero.
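As a small illustration of that behaviour, the sketch below checks a variable that is never assigned. The function name `uninit_demo` and the variable `x` are made up for the demonstration.

```shell
# Hypothetical demo function; x is never assigned anywhere.
uninit_demo() {
    awk 'BEGIN {
        if (x == "") print "x equals the empty string"   # string context
        if (x == 0)  print "x equals zero"               # numeric context
        print "x + 5 =", x + 5                           # arithmetic treats x as 0
    }'
}
uninit_demo
```

Both comparisons succeed, which is why `sum[$1]` can be used in arithmetic without being initialised first.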

If you don't want the initial sum of 0 printed, replace the unconditional update with

if ($2 != "") sum[$1] += $2
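Put together, the full command might look like the sketch below. The wrapper name `cumsum_guarded` and the sample input are assumptions made for illustration. Because `print` still emits three comma-separated items, lines without a sum keep trailing separators.

```shell
# Hypothetical wrapper around the first command plus the guard.
cumsum_guarded() {
    awk '{
        print $1, $2, sum[$1]            # sum[$1] is empty until a value was added
        if ($2 != "") sum[$1] += $2      # skip lines that only carry an ID
    }' "$@"
}

# Sample input: one ID-only line followed by three value lines.
printf '1\n1 2\n1 2\n1 4\n' | cumsum_guarded
```

With this input, the first value line for an ID should show an empty third column rather than 0.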

That seems like a needlessly complicated approach. At least for the example you show, which is nicely sorted, it is enough to do:

$ awk '{ if($1 in a){print $0,a[$1]}else{print} if($2){a[$1]+=$2;}}' file 
1     
1 2 
1 2   2
1 4   4
1 6   8
2     
2 1 
2 2   1
2 3  3
2 4   6
3     
3 1 
3 5    1
3 9   6
3 11 15

If you want to print a 0 the second time an ID appears (your desired output isn't clear on this, since it does so for IDs 2 and 3 but not for ID 1), you can do:

$ awk '{ if($1 in a){print $0,a[$1]}else{a[$1]=0; print} if($2){a[$1]+=$2;}}' file
1     
1 2  0
1 2   2
1 4   4
1 6   8
2     
2 1  0
2 2   1
2 3  3
2 4   6
3     
3 1  0
3 5    1
3 9   6
3 11 15
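Both commands above depend on `$1 in a`, which tests whether an index exists without creating it, whereas merely reading `a[$1]` does create the element. A minimal sketch of that difference follows; the function name `in_demo` and the key `k` are made up for the example.

```shell
# Hypothetical demo function showing membership test versus reference.
in_demo() {
    awk 'BEGIN {
        if (!("k" in a)) print "k absent before reference"
        v = a["k"]                       # reading the element creates it
        if ("k" in a)    print "k present after reference"
    }'
}
in_demo
```

This is why the first variant only starts printing a sum once a value has actually been added for that ID, while the second variant creates the entry explicitly with `a[$1]=0`.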

$ awk 'NF == 1 { sum = 0 } NF > 1 { $(NF+1) = sum; sum += $2 }; 1' file
1
1 2 0
1 2 2
1 4 4
1 6 8
2
2 1 0
2 2 1
2 3 3
2 4 6
3
3 1 0
3 5 1
3 9 6
3 11 15

This resets the cumulative sum whenever a line has only a single column. When there is more than one column, it appends the current sum as an extra column before updating the sum. The current record, with or without the extra column, is then output unconditionally (this is what the lone 1 at the end does).

This assumes that the file is sorted in such a way that each line with a single column precedes all lines over which a distinct cumulative sum should be computed. This is the way the data in the question is presented.
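If the input were not grouped like that, one possible variation is to keep a separate sum per ID in an array, as the first command does. The sketch below is an assumption-laden adaptation, not part of the original answer: the wrapper name `cumsum_by_id` and the interleaved sample input are invented.

```shell
# Hypothetical variant: per-ID sums, so lines for different IDs may interleave.
cumsum_by_id() {
    # "+ 0" forces a numeric 0 instead of an empty string for the first value line.
    awk 'NF > 1 { $(NF+1) = sum[$1] + 0; sum[$1] += $2 } 1' "$@"
}

# Interleaved sample input for IDs 1 and 2.
printf '1\n2\n1 2\n2 1\n1 4\n2 3\n' | cumsum_by_id
```

The single-column lines no longer reset anything here; they are simply printed unchanged.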
