what's the difference between "hadoop fs" shell commands and "hdfs dfs" shell commands?

Following are the three commands which appears same but have minute differences

  1. hadoop fs {args}
  2. hadoop dfs {args}
  3. hdfs dfs {args}

  hadoop fs <args>

FS relates to a generic file system which can point to any file systems like local, HDFS etc. So this can be used when you are dealing with different file systems such as Local FS, (S)FTP, S3, and others


  hadoop dfs <args>

dfs is very specific to HDFS. would work for operation relates to HDFS. This has been deprecated and we should use hdfs dfs instead.


  hdfs dfs <args>

same as 2nd i.e would work for all the operations related to HDFS and is the recommended command instead of hadoop dfs

below is the list categorized as hdfs commands.

  namenode|secondarynamenode|datanode|dfs|dfsadmin|fsck|balancer|fetchdt|oiv|dfsgroups

So even if you use hadoop dfs , it will look locate hdfs and delegate that command to hdfs dfs


enter image description here

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html

The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS) as well as other file systems that Hadoop supports, such as Local FS, WebHDFS, S3 FS, and others.

bin/hadoop fs <args>

All FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the Local FS the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost).

Most of the commands in FS shell behave like corresponding Unix commands. Differences are described with each of the commands. Error information is sent to stderr and the output is sent to stdout.

If HDFS is being used,

hdfs dfs

is a synonym.


fs refers to any file system, it could be local or HDFS but dfs refers to only HDFS file system. So if you need to perform access/transfer data between different filesystem, fs is the way to go.

Tags:

Hadoop

Hdfs