Bash Script: count unique lines in a file

You can use the uniq command to count repeated lines, but its input must be sorted first:

sort ips.txt | uniq -c
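
For example, with a small ips.txt like this (the contents are made up for illustration), uniq -c prefixes each distinct line with its count:

$ cat ips.txt
10.0.0.1
10.0.0.2
10.0.0.1
10.0.0.1
$ sort ips.txt | uniq -c
      3 10.0.0.1
      1 10.0.0.2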

To get the most frequent results at the top (thanks to Peter Jaric):

sort ips.txt | uniq -c | sort -bgr
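
The same pipeline with the flags spelled out (behavior as in GNU sort):

sort ips.txt |     # uniq needs its input sorted
  uniq -c |        # prefix each distinct line with its count
  sort -b -g -r    # -b: skip leading blanks, -g: general numeric sort, -r: descending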

This is the fastest way to count the repeated lines and have them nicely printed, sorted from the least frequent to the most frequent:

awk '{seen[$0]++} END {for (i in seen) print seen[i], i}' ips.txt | sort -n
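
Spelled out with comments, the one-liner is equivalent to this sketch:

awk '
  { seen[$0]++ }          # count occurrences of each distinct line
  END {
    for (i in seen)       # iteration order is unspecified,
      print seen[i], i    # hence the sort -n afterwards
  }
' ips.txt | sort -n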

If you don't care about performance and you want something easier to remember, then simply run:

sort ips.txt | uniq -c | sort -n

PS:

sort -n parses the field as a number, which is correct since we're sorting by the counts.
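
Without -n the counts would be compared as strings, so 10 would sort before 9. A quick way to see the difference (made-up values):

$ printf '9 a\n10 b\n' | sort
10 b
9 a
$ printf '9 a\n10 b\n' | sort -n
9 a
10 b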


To count the total number of unique lines (i.e. counting each distinct line only once, regardless of how often it repeats) we can use uniq or Awk with wc:

sort ips.txt | uniq | wc -l
awk '!seen[$0]++' ips.txt | wc -l

Awk's arrays are associative (hash tables), so it avoids the sort entirely and may run a little faster.
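
The !seen[$0]++ idiom works because the pattern is true only the first time a line appears; awk then applies its default action, printing the line. The same program with comments:

awk '
  !seen[$0]++   # seen[$0] starts at 0 (false), so !0 is true on first sight;
                # the post-increment then marks the line as seen, and a true
                # pattern with no action prints the line
' ips.txt | wc -l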

Generating a test file:

$ for i in {1..100000}; do echo $RANDOM; done > random.txt
$ time sort random.txt | uniq | wc -l
31175

real    0m1.193s
user    0m0.701s
sys     0m0.388s

$ time awk '!seen[$0]++' random.txt | wc -l
31175

real    0m0.675s
user    0m0.108s
sys     0m0.171s
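
If you need this often, the awk variant is easy to wrap in a function (count_distinct is just a made-up name):

count_distinct() {
  # print the number of distinct lines in the file given as "$1"
  awk '!seen[$0]++' "$1" | wc -l
}

$ count_distinct random.txt
31175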

Tags:

Bash