Count number of similar lines in a file

Your file's "structure" is a bit lacking in the structure department, so expect some noise in the results.

Assuming you have all that in a file called input, try:

tr '[:upper:]' '[:lower:]' < input | \
     grep -Ev "^ *(join date|age|posts|location|re):" | \
     sort | \
     uniq -c

The first line lowercases everything, the second strips out the lines that look like email headers in your sample, and the last two sort the lines (uniq -c only merges adjacent duplicates) and count each unique one.
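A minimal sketch of the same pipeline wrapped in a shell function, assuming a POSIX shell. The function name count_similar and the sample lines are made up for illustration. It uses POSIX character classes for tr and grep -E in place of egrep, which recent GNU grep reports as obsolescent.

```shell
#!/bin/sh
# count_similar is a hypothetical helper name: it runs the pipeline above
# on whatever text arrives on standard input.
# 1. lowercase everything (POSIX classes instead of '[A-Z]' '[a-z]')
# 2. drop header-like lines (grep -E replaces the obsolescent egrep)
# 3. sort, because uniq only merges adjacent duplicates
# 4. count each distinct line
count_similar() {
    tr '[:upper:]' '[:lower:]' |
        grep -Ev '^ *(join date|age|posts|location|re):' |
        sort |
        uniq -c
}

# Made-up sample input, not the original poster's file
count_similar <<'EOF'
Join Date: Jan 2010
Location: somewhere
Aqualung
aqualung
Re: Jethro Tull
Rover
EOF
```

With the real file you would run count_similar < input.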


This command lists each distinct line along with the number of times it appears:

sort nameFile | uniq -c 
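The sort step matters because uniq -c only collapses adjacent identical lines. This sketch uses a made-up three-line sample to show the difference.

```shell
#!/bin/sh
# Made-up sample where the duplicate "rover" lines are not adjacent
sample='rover
aqualung
rover'

# Without sort, uniq -c only merges adjacent repeats,
# so "rover" is reported twice, each with a count of 1.
printf '%s\n' "$sample" | uniq -c

# Sorting first brings identical lines together,
# so each distinct line is reported once with its full count.
printf '%s\n' "$sample" | sort | uniq -c
```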

How about using awk for this:

awk '
/:/||/^$/{next}{a[toupper($0)]++}
END{for(i in a) print i,a[i]}' INPUT_FILE

Explanation:

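First we identify lines that contain a : or are blank, and skip them. Every other line is converted to upper case and used as a key in an array, whose value counts how many times that line was seen. In the END block we print each key in the array along with the number of times it was found. Note that for (i in a) visits the keys in no particular order, which is why the output below is unsorted.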
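Here is the same awk program spread out with comments, as a sketch assuming a POSIX awk. The wrapper name count_titles is hypothetical; it reads the files given as arguments, or standard input if there are none.

```shell
#!/bin/sh
# count_titles is a hypothetical helper wrapping the awk one-liner above.
count_titles() {
    awk '
        # Skip header-like lines (anything containing a colon) and blank lines
        /:/ || /^$/ { next }
        # Upper-case the whole line so Rover and ROVER share one key,
        # then increment the counter for that key
        { a[toupper($0)]++ }
        # for (i in a) visits the keys in an unspecified order
        END { for (i in a) print i, a[i] }
    ' "$@"
}
```

Because the END loop has no defined order, piping the result through sort (for example count_titles file1 | sort) gives alphabetical, repeatable output.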

Test:

awk '
/:/||/^$/{next}{a[toupper($0)]++}
END{for(i in a) print i,a[i]}' file1
SOX 1
CHRISTMAS SONG 1
CUP OF WONDER 1
SOSSITY YER A WOMAN 1
FAT MAN 1
PUSSY WILLOW 1
VELVET GREEN 1
WITH YOU THERE TO HELP ME 1
ELEGY 1
WE USED TO KNOW 1
TEACHER 1
MY SUNDAY FEELING 1
SWEET DREAM 1
JACK-A-LYNN 1
SOMETHING'S ON THE MOVE 1
ROVER 1
DUN RINGILL 2
AVOIDING THE SWAN SONG 1
JACK FROST AND THE HOODED CROW 1
WITCHES PROMISE 1
LIFE'S A LONG SONG 2
LIVING IN THE PAST 1
WITCH'S PROMISE 1
WOW !!!! WHERE DO I START ? 1
SKATING AWAY ON THE THIN ICE OF A NEW DAY 1
MINSTRAL IN THE GALLERY 1
RAINBOW BLUES 1
MOTHER GOOSE 1
HEAVY HORSES 1
AQUALUNG 1
LOCOMOTIVE BREATH 1
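If you would rather see the most frequent titles first, one possible variation is to print the count before the title, so that sort -rn can rank the lines numerically even though the titles contain spaces. The name rank_titles is hypothetical.

```shell
#!/bin/sh
# rank_titles is a hypothetical variation on the awk answer above:
# same filtering and counting, but the count is printed first
# so that sort -rn orders lines from most to least frequent.
rank_titles() {
    awk '
        /:/ || /^$/ { next }
        { a[toupper($0)]++ }
        END { for (i in a) print a[i], i }
    ' "$@" | sort -rn
}
```

Run against file1, the two titles counted twice (DUN RINGILL and LIFE'S A LONG SONG) would then be listed before everything else.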
