Count the occurrences of all words in a text file and print the output sorted

I would use tr instead of awk:

echo "Lorem ipsum dolor sit sit amet et cetera." | tr '[:space:]' '[\n*]' | grep -v "^\s*$" | sort | uniq -c | sort -bnr
  • tr replaces every whitespace character (not just spaces) with a newline; [\n*] pads the second set with newlines
  • grep -v "^\s*$" trims out empty lines
  • sort to prepare as input for uniq
  • uniq -c to count occurrences
  • sort -bnr sorts numerically in reverse order (highest count first), ignoring leading blanks
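
One caveat: this pipeline counts capitalized and lowercase forms separately and keeps trailing punctuation attached ("cetera." instead of "cetera"). A minimal sketch of the same idea that folds case and also splits on punctuation (file.txt is a placeholder; assumes GNU tr):

tr '[:upper:]' '[:lower:]' < file.txt | # fold case so "Sit" and "sit" count together
tr -s '[:space:][:punct:]' '[\n*]' |    # break on whitespace and punctuation, squeezing repeats
grep -v '^$' |                          # drop a possible leading empty line
sort | uniq -c | sort -bnr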

Wow. It turned out to be a great command for counting swear words:

find . -name "*.py" -exec cat {} \; | tr '[:space:]' '[\n*]' | grep -v "^\s*$" | sort | uniq -c | sort -bnr | grep fuck
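
If you want a per-file tally instead of one global count, a hedged alternative is to let grep do the counting itself (assumes GNU grep; note that -c counts matching lines, not individual occurrences):

grep -rc fuck --include='*.py' . | grep -v ':0$' | sort -t: -k2 -nr

grep -rc prints path:count for every matching file; the second grep drops files with zero hits, and sort -t: -k2 -nr orders the rest by count.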


The general recipe:

  1. Split the input into words, one per line.
  2. Sort the resulting list of words (lines).
  3. Squash repeated occurrences, keeping a count.
  4. Sort by occurrence count.

To split the input into words, replace every character that you deem a word separator with a newline.

<input_file \
tr -sc '[:alpha:]' '[\n*]' | # Add digits, -, ', ... if you consider
                             # them word constituents
sort |
uniq -c |
sort -nr
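
As the comment in the block suggests, you can widen the set of word constituents. For instance, a sketch that also keeps digits, hyphens, and apostrophes, so that don't and well-known survive as single words (input_file is still a placeholder):

<input_file \
tr -sc "[:alnum:]'-" '[\n*]' |
sort |
uniq -c |
sort -nr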

This doesn't use grep or awk, but it seems to do what you want:

for w in $(cat maxwell.txt); do echo "$w"; done | sort | uniq -c
  2 a
  1 A
  1 an
  1 command
  1 considered
  1 domain-specific
  1 for
  1 interpreter,
  2 is
  1 language.
  1 line
  1 of
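
The a/A split and the trailing comma on interpreter, happen because the loop echoes each word verbatim. A small sketch that folds case, strips punctuation, and adds the final sort by count (assumes GNU tools; note that deleting punctuation also turns don't into dont):

for w in $(tr '[:upper:]' '[:lower:]' < maxwell.txt | tr -d '[:punct:]'); do echo "$w"; done | sort | uniq -c | sort -nr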
