Hadoop getmerge Command

In this part of the tutorial, we discuss the Hadoop file system shell command getmerge. It merges any number of files in HDFS and writes the result to a single file in the local file system. So, let’s get started with the Hadoop getmerge command.

Hadoop getmerge Command

Usage:

  hdfs dfs -getmerge [-nl] [-skip-empty-file] <src> <localdest>

Takes a source directory in HDFS and a local destination file as input. Concatenates the files in src and writes them to the local destination file. Optionally, we can use -nl to add a newline character at the end of each file. We can also use the -skip-empty-file option to prevent unnecessary newline characters for empty files.
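To make the effect of the two options concrete, here is a minimal sketch that imitates the concatenation rules described above on ordinary local files. It is only an illustration: merge_local is a hypothetical helper written in plain shell, not part of Hadoop, and it assumes the files are processed in sorted name order.

```shell
# Local imitation of the getmerge concatenation rules (illustration only;
# merge_local is a hypothetical helper, not a Hadoop command).
# Usage: merge_local [-nl] [-skip-empty-file] <srcdir> <dest>
merge_local() {
  nl=0
  skip_empty=0
  # Everything before the last two arguments is treated as an option.
  while [ $# -gt 2 ]; do
    case "$1" in
      -nl) nl=1 ;;
      -skip-empty-file) skip_empty=1 ;;
      *) echo "unknown option: $1" >&2; return 2 ;;
    esac
    shift
  done
  src=$1
  dest=$2
  : > "$dest"                        # create or truncate the destination
  for f in "$src"/*; do              # the glob expands in sorted name order
    [ -f "$f" ] || continue          # ignore subdirectories
    if [ -s "$f" ]; then             # non-empty file: append its bytes
      cat "$f" >> "$dest"
      if [ "$nl" -eq 1 ]; then
        printf '\n' >> "$dest"       # -nl: newline after each file
      fi
    elif [ "$nl" -eq 1 ] && [ "$skip_empty" -eq 0 ]; then
      printf '\n' >> "$dest"         # empty file still contributes a newline
    fi
  done
  return 0
}
```

With three input files where the middle one is empty, -nl alone leaves a blank line in the merged file where the empty file was, while -nl together with -skip-empty-file does not.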



Example of getmerge command

  hdfs dfs -getmerge -nl /user/Cloudera/TestFiles Desktop/MergedFile.txt


Why Do We Use Hadoop getmerge Command?

The getmerge command in Hadoop merges files stored in HDFS into a single file in the local file system.

The command is useful for downloading the output of a MapReduce job, which is usually spread across several part-* files, into a single local file. We can later use this local file for other operations, such as loading it into an Excel sheet for presentation.
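As a rough sketch of that workflow, the helper below wraps the getmerge call so that a job's output directory ends up in one local file. The function name fetch_job_output and its messages are hypothetical and chosen only for this example. It assumes the hdfs client is installed and on the PATH.

```shell
# Hypothetical helper: merge a job's HDFS output directory into one local file.
# Usage: fetch_job_output <hdfs_output_dir> <local_file>
fetch_job_output() {
  src_dir=$1
  dest=$2
  # Fail early if the hdfs client is not available.
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs client not found on PATH" >&2
    return 127
  fi
  # Concatenate every file in the output directory into the local file.
  hdfs dfs -getmerge "$src_dir" "$dest" || return 1
  # Report how many lines the merged file contains.
  lines=$(wc -l < "$dest" | tr -d ' ')
  echo "merged $src_dir into $dest ($lines lines)"
}
```

Using the paths from the example earlier, the call would look like fetch_job_output /user/Cloudera/TestFiles Desktop/MergedFile.txt.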

Conclusion

We conclude that getmerge is a very helpful HDFS shell command. In practice, we can use it to merge the output of a MapReduce program into a single local file.