How to remove unique strings from a text file?

Using awk:

$ awk 'seen[$0]++; seen[$0] == 2' file
Happy sad
Happy sad
Happy sad
Happy sad
Happy sad
Mad happy
Mad happy

This uses the text of each line as the key into the associative array seen. The first pattern, seen[$0]++, prints a line that has already been seen: the pre-increment value is non-zero on the second and all later occurrences. The second pattern, seen[$0] == 2, prints the line once more when this is exactly its second occurrence; without it, you would miss one occurrence of each duplicated line (the first, for which seen[$0]++ evaluated to zero).
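
Spelled out over several lines with comments, the same program reads:

awk '
    seen[$0]++     # non-zero (true) from the second occurrence on, so the line is printed
    seen[$0] == 2  # true only on the second occurrence: print once more to stand in for the first
' file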

This is related to awk '!seen[$0]++', which is sometimes used to remove duplicates without sorting (see e.g. How does awk '!a[$0]++' work?).
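
For comparison, that idiom inverts the logic, printing each line only on its first occurrence:

awk '!seen[$0]++' file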


To only get one copy of the duplicated lines:

awk 'seen[$0]++ == 1' file
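
With the sample input implied by the output above, this would print each duplicated line once, at the point of its second occurrence:

Happy sad
Mad happy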

or,

sort file | uniq -d

If the duplicates may not be contiguous and you need to preserve the input order, you can do it with awk in two passes: one to count the number of occurrences of each line, and a second to print the lines that the first pass saw more than once:

awk 'second_pass {if (c[$0] > 1) print; next}
     {c[$0]++}' file.txt second_pass=1 file.txt
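
An equivalent formulation uses the common NR == FNR idiom (the two counters are equal only while the first file argument is being read) instead of assigning second_pass between the arguments:

awk 'NR == FNR {c[$0]++; next} c[$0] > 1' file.txt file.txt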

From man uniq (the -D option is a GNU extension):

-D print all duplicate lines

Note that uniq only compares adjacent lines, so unless the duplicates are already grouped you need to sort first:

sort file.txt | uniq -D
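
GNU uniq also accepts the long form --all-repeated[=METHOD]; for example, to separate each group of duplicates with a blank line:

sort file.txt | uniq --all-repeated=separate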