Unix command to check if any two lines in a file are the same?

Here is one way to get the exact output you're looking for:

$ grep -nFx "$(sort sentences.txt | uniq -d)" sentences.txt 
1:This is sentence X
4:This is sentence X

Explanation:

The inner $(sort sentences.txt | uniq -d) lists each line that occurs more than once. The outer grep then searches sentences.txt again for those lines, treating them as fixed strings (-F), matching only whole lines (-x), and prepending the line number of each match (-n).
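A quick way to sanity-check the whole pipeline is to recreate the input file first (the file contents below are assumed from the question's example):

```shell
# Recreate a sample sentences.txt with one duplicated line.
cat > sentences.txt <<'EOF'
This is sentence X
This is sentence Y
This is sentence Z
This is sentence X
EOF

# -F: treat patterns as fixed strings, -x: match whole lines, -n: show line numbers.
grep -nFx "$(sort sentences.txt | uniq -d)" sentences.txt
# prints:
# 1:This is sentence X
# 4:This is sentence X
```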


Not exactly what you want, but you can try combining sort and uniq -c -d:

aularon@aularon-laptop:~$ cat input
This is sentence X
This is sentence Y
This is sentence Z
This is sentence X
This is sentence A
This is sentence B

aularon@aularon-laptop:~$ sort input | uniq -cd
      2 This is sentence X
aularon@aularon-laptop:~$ 

The 2 here is the number of times the line occurs in the file; from man uniq:

   -c, --count
          prefix lines by the number of occurrences

   -d, --repeated
          only print duplicate lines

If the file contents fit in memory, awk is good for this. The standard one-liner in comp.lang.awk (I can't search for an instance from this machine, but several come up every month) to just detect that there is duplication is awk 'n[$0]++', which counts the occurrences of each line value and prints every occurrence after the first, because the default action is print $0.
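For example (a throwaway sketch, feeding the input via printf rather than a file):

```shell
# n[$0]++ evaluates to 0 (false) the first time a line is seen and is
# truthy afterwards, so awk's default action (print $0) fires on the
# 2nd, 3rd, ... occurrences only.
printf 'X\nY\nX\nX\n' | awk 'n[$0]++'
# prints:
# X
# X
```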

Showing all occurrences including the first, in your format (though possibly in mixed order when more than one value is duplicated), gets a little more finicky:

awk <sentences.txt '!($0 in n) {n[$0]=NR; next} \
    n[$0] {print "Line "n[$0]":"$0; n[$0]=0} \
    {print "Line "NR":"$0}'

Shown on multiple lines for clarity; in real use you would usually run it together on one line. If you do this often you can put the awk script in a file and use awk -f, or of course wrap the whole thing in a shell script. Like most simple awk, this can be done very similarly with perl -n[a].
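As a sanity check, here is this approach run against the six-line sample input from the earlier answer (note that the stored first-occurrence line number must be printed before it is zeroed):

```shell
# Recreate the six-line sample input from the earlier answer.
printf '%s\n' 'This is sentence X' 'This is sentence Y' 'This is sentence Z' \
    'This is sentence X' 'This is sentence A' 'This is sentence B' > sentences.txt

awk '!($0 in n) {n[$0]=NR; next}                   # first sighting: remember its line number
     n[$0]      {print "Line "n[$0]":"$0; n[$0]=0} # second sighting: print the first occurrence, once
                {print "Line "NR":"$0}             # print this repeated occurrence
' sentences.txt
# prints:
# Line 1:This is sentence X
# Line 4:This is sentence X
```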