How to count the occurrence of a pattern in a line

You simply want to prepend a column holding the number of columns in each line. This can be done with awk:

$ awk -F ',' '{ printf("%d,%s\n", NF, $0) }' data.in
3,Rv0729,Rv0993,Rv1408
4,Rv0162c,Rv0761c,Rv1862,Rv3086
1,Rv2790c

NF is an awk variable containing the number of fields (columns) in the current record (row). We print this number followed by a comma and the rest of the row, for each row.

An alternative (same result, but may look a bit cleaner):

$ awk -F ',' 'BEGIN { OFS=FS } { print NF, $0 }' data.in

FS is the field separator which awk uses to split each record into fields, and we set that to a comma with -F ',' on the command line (as in the first solution). OFS is the output field separator, and we set that to be the same as FS before reading the first line of input.
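
To see what OFS changes, here is a small sketch that runs the same print statement with and without the BEGIN block. The sample lines are reconstructed from the output shown earlier, and the temporary file and the variable names (sample, default_ofs, comma_ofs) are only illustrative:

```shell
# Sample input reconstructed from the output shown earlier, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'Rv0729,Rv0993,Rv1408' 'Rv0162c,Rv0761c,Rv1862,Rv3086' 'Rv2790c' > "$sample"

# Default OFS: print joins its arguments with a single space.
default_ofs=$(awk -F ',' '{ print NF, $0 }' "$sample")

# OFS copied from FS: the same print statement joins them with a comma.
comma_ofs=$(awk -F ',' 'BEGIN { OFS=FS } { print NF, $0 }' "$sample")

printf '%s\n' "$default_ofs" "$comma_ofs"
rm -f "$sample"
```

In both runs $0 is printed exactly as it was read, because no field is modified; OFS only affects what print puts between NF and $0.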

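If you wanted to count the number of occurrences of the Rv[0-9]{4}c? pattern, as the subject of your question suggests, rather than the number of comma-delimited fields, you could use gsub(), which returns the number of substitutions it made; replacing each match with & (the matched text itself) leaves the line unchanged: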

$ awk '{ print gsub(/Rv[0-9]{4}c?/, "&"), $0 }' data.in
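
The two counts only differ when a field does not match the pattern. A minimal sketch, assuming a hypothetical input line whose middle field (foo) is not an Rv identifier. The {4} interval is spelled out as four bracket expressions here in case your awk lacks interval support, and OFS is set to a comma so the output looks like the earlier examples:

```shell
# Hypothetical input: the second field of the first line does not match the pattern.
counts=$(printf '%s\n' 'Rv0729,foo,Rv0993' 'Rv2790c' |
  awk -v OFS=',' '{ print gsub(/Rv[0-9][0-9][0-9][0-9]c?/, "&"), $0 }')

# The first line has 3 comma-delimited fields but only 2 pattern matches.
printf '%s\n' "$counts"
```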

A Perl approach:

$ perl -F, -pae 's/^/$#F+1 . ","/e' file
3,Rv0729,Rv0993,Rv1408
4,Rv0162c,Rv0761c,Rv1862,Rv3086
1,Rv2790c

The -a makes perl behave like awk: it splits each input line on the string given by -F and saves the resulting fields in the array @F. $#F is therefore the highest index in @F and, since array indices start at 0, $#F+1 is the total number of elements in the array. The -p means "print every input line after applying the script given by -e". The s/// is the substitution operator; here we are replacing the beginning of the line (^) with the number of fields and a comma ($#F+1 . ","). The trailing e flag makes perl evaluate the replacement as an expression instead of treating it as a literal string.
