Count total number of occurrences using grep

grep's -o option prints only the matched parts, each on its own line, instead of whole matching lines; wc -l can then count them:

grep -o 'needle' file | wc -l
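To see the difference between counting lines and counting matches, here is a small sketch on made-up input (the sample text is only an assumption for illustration):

```shell
# Made-up sample: three occurrences of "needle" spread over two lines.
input='needle and needle
no match here
one more needle'

# grep -c counts matching LINES: 2
printf '%s\n' "$input" | grep -c 'needle'

# grep -o prints each match on its own line, so wc -l counts MATCHES: 3
# (some wc implementations pad the number with spaces)
printf '%s\n' "$input" | grep -o 'needle' | wc -l
```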

This also counts needle when it is part of a longer word, such as 'needles' or 'multineedle'.

To count only whole-word matches, use one of the following commands:

grep -ow 'needle' file | wc -l
grep -o '\bneedle\b' file | wc -l
grep -o '\<needle\>' file | wc -l
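A quick check on a made-up line shows how the whole-word variants differ from plain -o. Note that -w, \b and \<...\> are not POSIX; this sketch assumes GNU grep.

```shell
line='needle needles multineedle needle'

# Plain -o also counts "needle" inside "needles" and "multineedle": 4
printf '%s\n' "$line" | grep -o 'needle' | wc -l

# The whole-word variants count only the two standalone words: 2 each
printf '%s\n' "$line" | grep -ow 'needle' | wc -l
printf '%s\n' "$line" | grep -o '\bneedle\b' | wc -l
printf '%s\n' "$line" | grep -o '\<needle\>' | wc -l
```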

Note that -o is not a POSIX option. If you have GNU grep (standard on most Linux distributions and on Cygwin, occasionally found elsewhere), you can count the output lines from grep -o: grep -o needle file | wc -l.

With Perl, here are a few concise ways to do it:

perl -lne 'END {print $c+0} map ++$c, /needle/g' file
perl -lne 'END {print $c+0} $c += s/needle//g' file
perl -lne 'END {print $c+0} ++$c while /needle/g' file

With only POSIX tools, one approach, where possible, is to split the input so that each line contains at most one match, and then count the matching lines with grep -c. For example, if you're looking for whole words, first turn every non-word character into a newline.

# roughly equivalent to grep -ow 'needle' file | wc -l
# (_ is kept because grep -w counts it as a word character)
tr -c '[:alnum:]_' '[\n*]' < file | grep -c '^needle$'
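Here is that approach on a made-up line, compared with grep -ow (not POSIX, but widely available). Since grep -w treats letters, digits and underscores as word characters, this sketch keeps _ in the tr set; without it, a_needle would be split and its needle counted.

```shell
text='needle, needles; a_needle needle'

# Keep letters, digits and "_"; turn everything else into a newline: 2
printf '%s\n' "$text" | tr -c '[:alnum:]_' '[\n*]' | grep -c '^needle$'

# Same count with grep -ow: 2
printf '%s\n' "$text" | grep -ow 'needle' | wc -l

# Without "_" in the set, "a_needle" yields a standalone "needle": 3
printf '%s\n' "$text" | tr -c '[:alnum:]' '[\n*]' | grep -c '^needle$'
```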

Otherwise, there's no standard command to do this particular bit of text processing, so you need to turn to sed (if you're a masochist) or awk.

awk '{while (match($0, /needle/)) {++c; $0=substr($0, RSTART+RLENGTH)}}
     END {print c+0}' file
sed -n -e 's/needle/\n&\n/g' -e 's/^/\n/' -e 's/$/\n/' \
       -e 's/\n[^\n]*\n/\n/g' -e 's/^\n//' -e 's/\n$//' \
       -e '/./p' file | wc -l
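If you need this repeatedly, the awk loop can be wrapped in a shell function. This is a sketch: the name count_matches is hypothetical, the pattern is an ERE passed in with -v (so awk processes backslashes in it once more), and empty matches are skipped because they would otherwise make the loop run forever. It shares the anchored-pattern corner case of the substring approach, since ^ can match again at the start of each remaining substring.

```shell
# count_matches PATTERN [FILE...]: hypothetical helper that prints the total
# number of non-empty, non-overlapping matches of the ERE PATTERN.
# The body runs in a subshell ( ... ) so "pat" does not leak to the caller.
count_matches() (
  pat=$1
  shift
  awk -v pat="$pat" '
    {
      s = $0
      while (s != "" && match(s, pat)) {
        if (RLENGTH > 0) {
          c++                               # count this match
          s = substr(s, RSTART + RLENGTH)   # continue right after it
        } else {
          s = substr(s, RSTART + 1)         # empty match: skip one character
        }
      }
    }
    END { print c + 0 }
  ' "$@"
)

# Usage: prints 4
printf 'needle needleneedle\nno\nneedle\n' | count_matches 'needle'
```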

Here's a simpler solution using sed and grep. It works for fixed strings and for ordinary regular expressions, but fails in a few corner cases with anchored patterns (e.g. it finds two occurrences of ^needle or \bneedle in needleneedle).

sed 's/needle/\n&\n/g' file | grep -cx 'needle'

Note that in the sed commands above, \n means a newline. That is standard in the pattern part, but in the replacement text it is a GNU sed extension; for portability, write a backslash followed by a literal newline instead of \n.
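For example, a portable spelling of the simpler sed and grep pipeline (a sketch on made-up input) puts a backslash and a real newline on each side of & in the replacement:

```shell
# Each "needle" ends up alone on its own line; grep -cx counts those lines: 4
printf 'needle needleneedle\nno\nneedle\n' |
  sed 's/needle/\
&\
/g' | grep -cx 'needle'
```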


If, like me, you actually wanted "both, each exactly once" (strictly speaking, this checks "either, on exactly two lines"), then it's simple:

grep -cE 'thing1|thing2' file

and check for the output 2.

The benefit of this approach (if exactly once is what you want) is that it scales easily: add more alternatives to the pattern and compare the count with the number of alternatives.
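A sketch of the scaled-up check as a shell function. The name lines_match_once_each is hypothetical; it reads standard input and, instead of building an alternation, passes each pattern with its own -e, which grep treats the same way (a line is counted if it matches any of them). The same caveat applies: two lines matching thing1 also pass.

```shell
# lines_match_once_each PATTERN...: hypothetical helper. Succeeds when the
# number of input lines matching any PATTERN equals the number of patterns.
# Needs at least one pattern. Runs in a subshell so "set --" stays local.
lines_match_once_each() (
  n=$#
  # Rewrite the arguments "p1 p2 ..." as "-e p1 -e p2 ..."
  for p do
    set -- "$@" -e "$p"
    shift
  done
  [ "$(grep -c "$@")" -eq "$n" ]
)

printf 'thing1 here\nsomething else\nthing2 there\n' |
  lines_match_once_each thing1 thing2 && echo "both found"

# Caveat: this also succeeds, because only matching lines are counted
printf 'thing1\nthing1\n' | lines_match_once_each thing1 thing2 && echo "passes anyway"
```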
