In our earlier tutorial, we learned about InputFormat. In this tutorial, we discuss OutputFormat in Hadoop MapReduce: what a RecordWriter is, how OutputFormat uses it, and the types of OutputFormat that are available.
Introduction to MapReduce OutputFormat
OutputFormat validates the output specification of a MapReduce job. It also supplies the RecordWriter implementation that is used to write the job's output to output files.
Before we begin with OutputFormat, let us first discuss what RecordWriter is and what it does in MapReduce.
RecordWriter in Hadoop MapReduce
As we know, the Reducer takes the Mapper's intermediate output as input. It then applies the reduce function to that input and generates output that is again zero or more key-value pairs.
RecordWriter writes these output key-value pairs from the Reducer phase to the output files.
MapReduce OutputFormat
RecordWriter gets output data from the Reducer and writes it to output files. OutputFormat decides how RecordWriter writes these output key-value pairs to the files. OutputFormat plays the same role on the output side of a job that InputFormat plays on the input side. OutputFormat instances are used to write to files on the local disk or in HDFS. During job execution, the output specification is used in two ways:
• The MapReduce job verifies that the output directory does not already exist.
• OutputFormat provides the RecordWriter implementation that writes the output files of the job. The output files are then stored in a FileSystem.
The job driver sets the output directory with the FileOutputFormat.setOutputPath() method.
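The first check above can be pictured with a small stand-alone sketch. This is not Hadoop's code: in Hadoop the check is performed by the OutputFormat's checkOutputSpecs() method against the job's FileSystem. The class name OutputSpecCheck and the use of java.nio.file on the local disk are assumptions made only for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical, simplified model of the output-specification check.
// It only illustrates the rule "the output directory must not exist yet".
class OutputSpecCheck {

    // Rejects a missing or already existing output directory.
    static void checkOutputSpecs(Path outputDir) {
        if (outputDir == null) {
            throw new IllegalStateException("Output directory not set.");
        }
        if (Files.exists(outputDir)) {
            // Failing early protects the results of a previous job
            // from being silently overwritten.
            throw new IllegalStateException(
                "Output directory " + outputDir + " already exists");
        }
    }
}
```

Calling OutputSpecCheck.checkOutputSpecs() with a path that already exists throws an exception before any work is done, which mirrors why a job has to be pointed at a fresh output directory on every run.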
Types of OutputFormat in MapReduce
There are several types of OutputFormat which are as follows:
TextOutputFormat
The default OutputFormat is TextOutputFormat. It writes each (key, value) pair as a single line of a text file. TextOutputFormat keys and values can be of any type, because TextOutputFormat turns them into strings by calling toString() on them. It separates the key and the value with a tab character. We can change this separator with the mapreduce.output.textoutputformat.separator property.
KeyValueTextInputFormat can be used to read these output text files back, since it splits each line into a key and a value at a separator, which is a tab by default.
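The line layout described above can be sketched in plain Java. This is a simplified model and not the Hadoop class itself: the name LineRecordWriterSketch is hypothetical, the output goes to an in-memory StringBuilder instead of a file, and only Java null is treated as "missing" (the real writer also has to deal with Hadoop's own types).

```java
// Hypothetical, simplified model of a text record writer:
// one line per record, key and value joined by a separator.
class LineRecordWriterSketch<K, V> {
    private final StringBuilder out;
    private final String separator;

    LineRecordWriterSketch(StringBuilder out, String separator) {
        this.out = out;
        this.separator = separator;
    }

    void write(K key, V value) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return; // nothing to write for an empty record
        }
        if (!nullKey) {
            out.append(key.toString());   // any type works via toString()
        }
        if (!nullKey && !nullValue) {
            out.append(separator);        // separator only between two parts
        }
        if (!nullValue) {
            out.append(value.toString());
        }
        out.append('\n');                 // one record per line
    }
}
```

With a tab as the separator, writing the pair ("apple", 3) appends the line apple, tab, 3 to the buffer. Passing a different separator string to the constructor plays the role that the separator property plays in a real job.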
SequenceFileOutputFormat
This OutputFormat writes sequence files as its output. SequenceFileOutputFormat is a good intermediate format for use between chained MapReduce jobs. It serializes arbitrary data types to the file, and the corresponding SequenceFileInputFormat deserializes the file into the same types. The data is therefore presented to the next mapper in the same form in which the previous reducer emitted it. Compression is controlled through static methods on SequenceFileOutputFormat.
SequenceFileAsBinaryOutputFormat
It is a variant of SequenceFileOutputFormat. It writes keys and values to sequence files in raw binary format.
MapFileOutputFormat
It is another form of FileOutputFormat. It writes its output as MapFiles. Keys must be added to a MapFile in order, so we need to ensure that the reducer emits keys in sorted order.
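The ordering requirement can be illustrated with a small stand-alone sketch. It is not the MapFile writer from Hadoop: the class SortedKeyWriterSketch is hypothetical, it keeps entries in memory, and it assumes that a key equal to the previous one is acceptable while a smaller key is rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a writer that requires sorted keys.
class SortedKeyWriterSketch<K extends Comparable<K>, V> {
    private final List<String> entries = new ArrayList<>();
    private K lastKey;

    void append(K key, V value) {
        // A key smaller than the previous one breaks the sorted order
        // that later lookups by key depend on.
        if (lastKey != null && key.compareTo(lastKey) < 0) {
            throw new IllegalArgumentException(
                "key out of order: " + key + " after " + lastKey);
        }
        entries.add(key + "\t" + value);
        lastKey = key;
    }

    int size() {
        return entries.size();
    }
}
```

Appending "a", then "b", then "a" again fails on the third call. In a real job the reducer normally receives its keys already sorted, so the problem tends to appear only when the reducer emits keys that differ from the ones it received.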
MultipleOutputs
MultipleOutputs is a helper class rather than an OutputFormat of its own. It permits writing data to files whose names are derived from the output keys and values.
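The idea of deriving file names from the records can be sketched as follows. This is a toy model and not the Hadoop MultipleOutputs class: KeyedOutputsSketch is a hypothetical name, the "files" are in-memory buffers, and the name pattern base-r-NNNNN (base name, "r" for reduce, five-digit partition number) is used here as an assumption about how such output files are typically named.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of writing records to several named outputs.
class KeyedOutputsSketch {
    private final Map<String, StringBuilder> files = new TreeMap<>();
    private final int partition;

    KeyedOutputsSketch(int partition) {
        this.partition = partition;
    }

    // Builds a file name such as "fruit-r-00003" from a base name.
    static String fileName(String baseName, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", baseName, partition);
    }

    // Routes the record to the buffer that belongs to the given base name.
    void write(String key, String value, String baseName) {
        files.computeIfAbsent(fileName(baseName, partition), n -> new StringBuilder())
             .append(key).append('\t').append(value).append('\n');
    }

    Map<String, StringBuilder> files() {
        return files;
    }
}
```

A reducer modelled this way could choose the base name from the key, for example by category, so that each category ends up in its own file.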
LazyOutputFormat
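In MapReduce job execution, FileOutputFormat and its subclasses create output files even if they are empty. LazyOutputFormat is a wrapper OutputFormat that avoids this: it creates an output file only when the first record is emitted for a given partition.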
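The wrapper idea can be sketched in a few lines of plain Java. This is only a model of the technique (lazy creation of the underlying writer) and not Hadoop's implementation: the interface SimpleWriter and the class LazyWriterSketch are hypothetical names introduced for the example.

```java
import java.util.function.Supplier;

// Hypothetical minimal writer interface used only in this sketch.
interface SimpleWriter {
    void write(String key, String value);
}

// Wraps another writer and creates it only when the first record arrives.
class LazyWriterSketch implements SimpleWriter {
    private final Supplier<SimpleWriter> factory;
    private SimpleWriter delegate; // stays null until the first write

    LazyWriterSketch(Supplier<SimpleWriter> factory) {
        this.factory = factory;
    }

    @Override
    public void write(String key, String value) {
        if (delegate == null) {
            // In a real job this is the point where the output file
            // would be created.
            delegate = factory.get();
        }
        delegate.write(key, value);
    }

    boolean isOpened() {
        return delegate != null;
    }
}
```

If write() is never called, the factory is never invoked, so a partition without records produces no file in this model.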
DBOutputFormat
It is the OutputFormat for writing to relational databases. It sends the reduce output to a SQL table. It accepts key-value pairs in which the key has a type that extends DBWritable. Output to HBase is handled by HBase's own TableOutputFormat rather than by DBOutputFormat.
Conclusion
In conclusion, Hadoop provides various OutputFormats, and we choose among them according to how the job's output will be used. I hope you find this tutorial helpful.