Most Useful Apache Hadoop HDFS Commands

Hadoop Distributed File System (HDFS) is a highly fault-tolerant distributed file system that runs on commodity hardware. The File System Shell includes commands that interact directly with HDFS, as well as with other file systems that Hadoop supports, such as the S3 file system and HFTP (a read-only, HTTP-based file system). In this blog post I will introduce some of the most useful Hadoop HDFS commands.

Creating a directory in HDFS

We use the mkdir command to create a directory in HDFS at a given path (or paths).

Syntax:

hdfs dfs -mkdir <HDFS_PATH>

$hdfs dfs -mkdir /home/hduser/dir1 /home/hduser/dir2
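If the parent directories do not exist yet, the -p flag creates the whole path in one step (analogous to the Unix mkdir -p); the path below is illustrative:

```shell
# Create the full directory tree in one command; -p also suppresses
# the error if the directory already exists.
hdfs dfs -mkdir -p /home/hduser/dir1/subdir1
```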

List the contents of files or directories in HDFS

We use the ls command to list the contents of a file or a directory in HDFS.

Syntax: hdfs dfs -ls <HDFS_PATH>

Example:

$hdfs dfs -ls /home/hduser

For HDFS files, it returns statistics in the following format:

permissions number_of_replicas userid groupid filesize modification_date modification_time filename

For HDFS Directories, it returns the statistics in the format below.

permissions userid groupid modification_date modification_time dirname
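Putting the two formats together, a listing of a directory containing one subdirectory and one file might look like this (names and sizes are illustrative; directories show - in the replica column):

```shell
hdfs dfs -ls /home/hduser
# Found 2 items
# drwxr-xr-x   - hduser hadoop          0 2018-11-04 22:05 /home/hduser/dir1
# -rw-r--r--   3 hduser hadoop    2893177 2018-11-04 22:09 /home/hduser/input.txt
```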

List only the file names in HDFS

To list only the file names in HDFS, use the -C flag:

hdfs dfs -ls -C /<hdfs_path>

Use the -R flag to recursively list all directories and subdirectories under a given HDFS path:

$ hdfs dfs -ls -R /data/movies_data
-rw-r--r--   1 maria_dev hdfs    2893177 2018-11-04 22:09 /data/movies_data/movies_data.csv

Copy a file to HDFS from a local path

put Command

The command below copies a single source file, or multiple source files, from the local file system to the Hadoop distributed file system.

Syntax: hadoop fs -put <local file system source> ... <HDFS_dest_Path>

$hadoop fs -put /home/hduser/HadoopJob/input/74-0.txt /user/hduser/input

To use a relative path for the local source, start it with a period (dot), which denotes the current directory.

$hadoop fs -put ./HadoopJob/input/accesslogs.log /user/hduser/input

copyFromLocal Command

It copies a file from the local file system to the HDFS destination. It is similar to put, except that the source must be a local file.

$hadoop fs -copyFromLocal <localsrc> URI   //Syntax
$hadoop fs -copyFromLocal /home/hduser/abc.txt  /home/hduser/abc.txt

Copy file from HDFS to local Path

get

Syntax: $hadoop fs -get <hdfs_source> <local_destination_path>

$hadoop fs -get /home/hduser/dir3/file1.txt /home/

copyToLocal

Syntax: $hadoop fs -copyToLocal <hdfs_source> <local_destination_path>

[maria_dev@sandbox-hdp tutorials]$ hdfs dfs -copyToLocal /data/movies_data/movies_data.csv /home/maria_dev/tutorials 
[maria_dev@sandbox-hdp tutorials]$ ls
movies_data.csv 

Move a file from source to destination

We can use the mv command to move the file from source to destination.

$hadoop fs -mv <src> <dest>   //Syntax

$hadoop fs -mv /home/hduser/dir1/abc.txt /home/hduser/dir2

Removing files and Directories in HDFS

We use the rm command to remove the files and directories in the Hadoop framework using the command line.

Syntax: $hadoop fs -rm <argument>

Files

Removes the files specified as the argument.

$hadoop fs -rm /home/hduser/dir1/abc.txt

Directories

$hadoop fs -rm -R /home/hduser/dir1/

Removing files and directories recursively

The -R flag deletes a directory together with all of its contents. (To delete a directory only when it is empty, use -rmdir instead.) Syntax: $hadoop fs -rm -R <HDFS_PATH>

$hadoop fs -rm -R /home/hduser/

Display the last few lines of a file

We can display the last kilobyte of a file using the tail command, similar to the Unix tail:

$hadoop fs -tail /home/hduser/dir1/abc.txt

See or Read the contents of a file

We can use the cat command to read or display the contents of a file on the console.

$hadoop fs -cat /home/hduser/dir1/abc.txt

Display the aggregate length or disk usage of a file or HDFS path

Syntax: hadoop fs -du <HDFS_PATH>

hadoop fs -du /home/hduser/dir1/abc.txt

Display the HDFS usage in human-readable format

Syntax: hdfs dfs -du -h <HDFS_PATH>

[maria_dev@sandbox-hdp ~]$ hdfs dfs -du -h  /data/retail_application
590      /data/retail_application/categories_dim
51.5 M   /data/retail_application/customer_addresses_dim
4.4 M    /data/retail_application/customers_dim
17.4 K   /data/retail_application/date_dim
7.4 M    /data/retail_application/email_addresses_dim
131.4 M  /data/retail_application/order_lineitems
69.4 M   /data/retail_application/orders
99       /data/retail_application/payment_methods
22.3 M   /data/retail_application/products_dim

Count the number of directories, files, and bytes under a file path

Syntax: hadoop fs -count <HDFS_PATH>

Alternatively, you can count the entries in a directory by piping ls output to wc:

hdfs dfs -ls <hdfs_directory_path> | wc -l
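The count output has four columns: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME. Run against the /data/movies_data directory shown earlier, it might look like this (values illustrative):

```shell
hadoop fs -count /data/movies_data
#            1            1            2893177 /data/movies_data
```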

Empty the Trash

~$hadoop fs -expunge   //Empty the trash

Merge HDFS files into a single file in a local directory

When you work with HDFS, or use tools such as Hive or Spark, your application can create many small part files in a single directory. The getmerge command takes a source directory and a destination file as input and concatenates all the files under the source into the single destination file on the local file system.

~$hadoop fs -getmerge <HDFS source path> <local file system destination path>
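For example, if a Spark or Hive job has written many part files under /data/output (a hypothetical path), getmerge can combine them into a single local file:

```shell
# Concatenate every file under the HDFS directory into one local
# file; -nl inserts a newline between the merged files.
hadoop fs -getmerge -nl /data/output /home/hduser/merged_output.txt
```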

Takes a source file and outputs the file in text format

 ~$hadoop fs -text <source path>

The allowed formats are zip and TextRecordInputStream.

Creates a zero-length file (similar to the Unix touch command)

~$hadoop fs -touchz <path>

Check if a file, path, or directory exists

~$hadoop fs -test -[ezd] <path>
  hadoop fs -test -e <path>
  hadoop fs -test -z <path>
  hadoop fs -test -d <path>

 -e: checks whether the path exists; returns 0 if true
 -z: checks whether the file is zero length; returns 0 if true
 -d: checks whether the path is a directory; returns 0 if true
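Because -test reports its result only through the exit status, it is usually combined with an if statement or $?; the path below is illustrative:

```shell
# Branch on whether an HDFS path exists, using the exit status of -test.
if hadoop fs -test -e /home/hduser/dir1/abc.txt; then
    echo "file exists"
else
    echo "file does not exist"
fi
```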

Returns the stat information on a path

$hadoop fs -stat <HDFS path>
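stat also accepts an optional format string; for example, %n (name) and %y (modification time) print just those two fields:

```shell
# Print the modification time and name of an HDFS file.
hadoop fs -stat "%y %n" /home/hduser/dir1/abc.txt
```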

Display the file system capacity and free space in bytes

~$hadoop fs -df <HDFS path>

Add the -h flag for human-readable sizes.

Disable the NameNode Safe mode

The command below disables (leaves) the NameNode safe mode; it should be executed only by a Hadoop admin or the Hadoop operations team.

sudo su hdfs -l -c 'hdfs dfsadmin -safemode leave'
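The same dfsadmin interface can also report the current safe mode status or re-enter safe mode:

```shell
# Check whether the NameNode is currently in safe mode.
hdfs dfsadmin -safemode get
# Put the NameNode back into safe mode (admin only).
hdfs dfsadmin -safemode enter
```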

Count Number of Lines in the HDFS File

hdfs dfs -cat </path_to_hdfs_directory/*> | wc -l

References

File System Guide