How to list only the file names in HDFS

It seems hadoop fs -ls does not support any option to output just the file names, or even just the last column.

If you want to get the last column reliably, first squeeze each run of whitespace into a single space, so that the last column can then be addressed by a fixed field number:

hadoop fs -ls | sed '1d;s/  */ /g' | cut -d\  -f8

This gives you just the last column, but each entry is still a full path. If you want only the file names, you can use basename, as @rojomoke suggests:

hadoop fs -ls | sed '1d;s/  */ /g' | cut -d\  -f8 | xargs -n 1 basename

I also filtered out the first line, which says "Found x items" (that is what the 1d in the sed script does).
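To see what each stage does without access to a cluster, the pipeline can be run against a canned listing. This is only a sketch: sample_listing and names_by_cut are hypothetical helper names, and the listing lines are invented sample data shaped like typical hadoop fs -ls output.

```shell
#!/bin/sh
# Hypothetical stand-in for `hadoop fs -ls /tmp`; the entries are invented.
sample_listing() {
cat <<'EOF'
Found 3 items
-rw-r--r--   3 alice supergroup       1024 2021-02-04 10:15 /tmp/a.txt
drwxr-xr-x   - alice supergroup          0 2021-02-04 10:16 /tmp/logs
-rw-r--r--   3 bob   supergroup   12345678 2021-02-04 10:17 /tmp/data.csv
EOF
}

# 1d drops the "Found N items" header; s/  */ /g squeezes runs of spaces
# so that cut sees exactly one delimiter between columns; field 8 is the path.
names_by_cut() {
    sed '1d;s/  */ /g' | cut -d' ' -f8 | xargs -n 1 basename
}

sample_listing | names_by_cut
```

On a real cluster, the same function would read from hadoop fs -ls instead of sample_listing.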

Note: as @felix-frank points out in the comments, the above command does not correctly handle file names that contain spaces: runs of consecutive spaces are collapsed, and cut keeps only the text up to the first space. Hence a more correct solution, proposed by Felix:

hadoop fs -ls /tmp | sed 1d | perl -wlne'print +(split " ",$_,8)[7]'
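The same idea (strip the first seven columns and leave the rest of the line untouched) can also be sketched with sed alone. This is an alternative to the Perl one-liner, not Felix's command; names_keep_spaces and sample_listing are hypothetical names, the listing is invented sample data, and sed -E is assumed to be available (it is in both GNU and BSD sed).

```shell
#!/bin/sh
# Invented sample data: the first file name contains two consecutive spaces.
sample_listing() {
cat <<'EOF'
Found 2 items
-rw-r--r--   3 alice supergroup       1024 2021-02-04 10:15 /tmp/my  report.txt
-rw-r--r--   3 alice supergroup       2048 2021-02-04 10:16 /tmp/plain.txt
EOF
}

# Drop the header line, remove the first seven space-delimited columns while
# keeping the remainder verbatim, then strip everything up to the last slash.
names_keep_spaces() {
    sed -E '1d; s/^([^ ]+ +){7}//; s|.*/||'
}

sample_listing | names_keep_spaces
```

Because the path column is never split or squeezed, the two spaces inside "my  report.txt" survive. File names containing newlines would still break this line-based approach.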


The following command will return filenames only:

hdfs dfs -stat "%n" my/path/*

Added Feb 04 '21:

Actually, for the last few years I have been using

hdfs dfs -ls -d my/path/* | awk '{print $8}'

and

hdfs dfs -ls my/path | grep -e "^-" | awk '{print $8}'
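The grep step works because each listing line starts with a permission string: regular files begin with "-", directories with "d", and the header with "Found". A minimal sketch on invented sample data follows; sample_listing and files_only are hypothetical names, and no cluster is needed to try it.

```shell
#!/bin/sh
# Hypothetical stand-in for `hdfs dfs -ls /tmp`; the entries are invented.
sample_listing() {
cat <<'EOF'
Found 3 items
-rw-r--r--   3 alice supergroup       1024 2021-02-04 10:15 /tmp/a.txt
drwxr-xr-x   - alice supergroup          0 2021-02-04 10:16 /tmp/logs
-rw-r--r--   3 bob   supergroup   12345678 2021-02-04 10:17 /tmp/data.csv
EOF
}

# Keep only lines starting with "-" (regular files), which also discards the
# "Found N items" header, then print the eighth whitespace-separated column.
files_only() {
    grep -e '^-' | awk '{print $8}'
}

sample_listing | files_only
```

Note that the eighth column holds the path as listed, so the directory part is still present, and awk splits on whitespace, so a name containing spaces would be cut short.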

Tags: Shell, Hadoop