How to count lines in a file on HDFS from the command line?

Total number of files: `hadoop fs -ls /path/to/hdfs/* | wc -l`
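One caveat, sketched below with the same placeholder path: when the argument is a directory, `hadoop fs -ls` prints a `Found N items` summary line, so a bare `wc -l` overcounts by one. `hadoop fs -count` is another way to get the file count directly.

```
# Drop the "Found N items" summary line that hadoop fs -ls prints for a directory,
# otherwise wc -l overcounts by one.
hadoop fs -ls /path/to/hdfs | grep -v '^Found' | wc -l

# Alternative: hadoop fs -count prints DIR_COUNT, FILE_COUNT and CONTENT_SIZE for the path.
hadoop fs -count /path/to/hdfs
```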

Total number of lines: `hadoop fs -cat /path/to/hdfs/* | wc -l`

Total number of lines for a given file: `hadoop fs -cat /path/to/hdfs/filename | wc -l`
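Note that `hadoop fs -cat` streams raw bytes, so for compressed files the counts above would be wrong; `hadoop fs -text` decompresses with any codec Hadoop recognizes (gzip and bzip2 out of the box, Snappy if the codec is installed) before piping to `wc -l`. A minimal sketch, with a placeholder filename:

```
# -text decompresses known codecs (and renders SequenceFiles) before output,
# so the count reflects actual lines rather than compressed bytes.
hadoop fs -text /path/to/hdfs/filename.gz | wc -l
```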


1. Number of lines in a mapper output file (a loop over all part files is sketched after this list):

`~]$ hadoop fs -cat /user/cloudera/output/part-m-00000 | wc -l`

2. Number of lines in a text or any other file on HDFS:

`~]$ hadoop fs -cat /user/cloudera/output/abc.txt | wc -l`

3. First (top) 5 lines of a text or any other file on HDFS:

`~]$ hadoop fs -cat /user/cloudera/output/abc.txt | head -5`

4. Last (bottom) 10 lines of a text or any other file on HDFS:

`~]$ hadoop fs -cat /user/cloudera/output/abc.txt | tail -10`
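If the job produced several part files, the per-file counts can be scripted. A minimal sketch, assuming the same /user/cloudera/output directory as above and the usual `part-*` naming:

```
# Print the line count of every part file in the output directory.
# The file path is the last field of hadoop fs -ls output.
for f in $(hadoop fs -ls /user/cloudera/output/part-* | awk '{print $NF}' | grep part-); do
  printf '%s: ' "$f"
  hadoop fs -cat "$f" | wc -l
done
```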

You cannot do it with a plain hadoop fs command. Either write MapReduce code with the logic explained in this post, or the following Pig script will help:

A = LOAD 'file' USING PigStorage() AS (...);
B = GROUP A ALL;
cnt = FOREACH B GENERATE COUNT(A);
DUMP cnt;

Make sure your Snappy file has the correct extension so that Pig can detect and read it.
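To run it, save the script to a file (the name below is just an example) and submit it with the pig client; MapReduce mode is the default:

`~]$ pig count_lines.pig`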
