OutputFormat in MapReduce

In our earlier tutorial, we learned about InputFormat. In this tutorial, we discuss OutputFormat in Hadoop MapReduce: what it is, what RecordWriter does in a MapReduce job, and which types of OutputFormat are available.

Introduction to MapReduce OutputFormat

OutputFormat validates the output specification of a MapReduce job. It also provides the RecordWriter implementation that writes the job's output to output files.

Before we begin with OutputFormat, let us first discuss what RecordWriter is and what it does in MapReduce.

RecordWriter in Hadoop MapReduce

As we are aware, the Reducer takes the Mapper's intermediate output as input. It then applies the reduce function to that input to generate output, which is again zero or more key-value pairs.

RecordWriter writes these output key-value pairs from the Reducer phase to the output files.
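The hand-off from the reduce function to RecordWriter can be pictured with a small plain-Java sketch. It does not use the Hadoop API: SketchRecordWriter is a hypothetical stand-in for Hadoop's RecordWriter, and an in-memory list stands in for the output file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's RecordWriter; not the real class.
interface SketchRecordWriter<K, V> {
    void write(K key, V value);
}

class RecordWriterSketch {
    // Reduce step: sums the values for one key and emits a single output pair.
    static void reduce(String key, List<Integer> values,
                       SketchRecordWriter<String, Integer> writer) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        writer.write(key, sum);
    }

    // Runs the reduce step for two keys and returns the lines the writer produced.
    static List<String> run() {
        List<String> outputFile = new ArrayList<>(); // stands in for an output file
        SketchRecordWriter<String, Integer> writer =
                (k, v) -> outputFile.add(k + "\t" + v);
        reduce("hadoop", List.of(1, 1, 1), writer);
        reduce("mapreduce", List.of(2, 3), writer);
        return outputFile;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The reduce function only knows that it hands pairs to a writer. How and where the pairs end up is the writer's concern, which is the separation that OutputFormat builds on.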

MapReduce OutputFormat

RecordWriter gets output data from the Reducer and writes it to output files. OutputFormat determines how RecordWriter writes these output key-value pairs to the output files. OutputFormat is the counterpart of InputFormat: InputFormat describes how a job reads its input, and OutputFormat describes how it writes its output. OutputFormat instances are used to write to files on the local disk or in HDFS. Based on the output specification, a MapReduce job does the following:

• The Hadoop MapReduce job verifies that the output directory does not already exist.

• The OutputFormat of the MapReduce job provides the RecordWriter implementation used to write the job's output files. The output files are then stored in a FileSystem.

The job driver calls the FileOutputFormat.setOutputPath() method to set the output directory.
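The output-directory check described above can be sketched in plain Java against the local file system. This is only a model of the idea: the class name, the method names, and the exception types are assumptions made for illustration, and the real check in Hadoop runs against the job's configured FileSystem.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class OutputSpecSketch {
    // Models the output check: refuse to run if the directory is unset or already exists.
    static void checkOutputSpecs(Path outputDir) {
        if (outputDir == null) {
            throw new IllegalArgumentException("Output directory not set");
        }
        if (Files.exists(outputDir)) {
            throw new IllegalStateException(
                    "Output directory " + outputDir + " already exists");
        }
    }

    // Convenience wrapper that reports the outcome as a boolean.
    static boolean isAcceptable(Path outputDir) {
        try {
            checkOutputSpecs(outputDir);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Path existing = Paths.get(System.getProperty("java.io.tmpdir"));
        Path fresh = existing.resolve("outputformat-sketch-" + System.nanoTime());
        System.out.println("existing directory accepted: " + isAcceptable(existing));
        System.out.println("fresh directory accepted: " + isAcceptable(fresh));
    }
}
```

Failing early like this keeps a job from silently overwriting the results of an earlier run.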

Types of OutputFormat in MapReduce

The most common types of OutputFormat are as follows:

TextOutputFormat

The default OutputFormat is TextOutputFormat. It writes each (key, value) pair as a single line of a text file. Its keys and values can be of any type, because TextOutputFormat converts them to strings by calling toString() on them. It separates the key and the value with a tab character. We can change the separator with the mapreduce.output.textoutputformat.separator property.

KeyValueTextInputFormat can be used to read these output text files back, since it splits each line into a key and a value at the separator.
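The line layout can be illustrated with a short plain-Java sketch. It is not Hadoop code: a Map stands in for the job configuration, the class and method names are hypothetical, and the property key is written in lowercase, as Hadoop property names are (mapreduce.output.textoutputformat.separator).

```java
import java.util.HashMap;
import java.util.Map;

class TextLineSketch {
    // The Map passed to formatLine is a stand-in for a job configuration.
    static final String SEPARATOR_KEY = "mapreduce.output.textoutputformat.separator";

    // Formats one record: toString() on key and value, separator in between.
    static String formatLine(Object key, Object value, Map<String, String> conf) {
        String separator = conf.getOrDefault(SEPARATOR_KEY, "\t");
        return key.toString() + separator + value.toString();
    }

    // Splits a line at the first separator, roughly what a key-value text reader does.
    static String[] parseLine(String line, String separator) {
        int idx = line.indexOf(separator);
        if (idx < 0) {
            return new String[] { line, "" }; // no separator: whole line is the key
        }
        return new String[] {
            line.substring(0, idx),
            line.substring(idx + separator.length())
        };
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(formatLine("hadoop", 3, conf)); // tab-separated by default
        conf.put(SEPARATOR_KEY, ",");
        System.out.println(formatLine("hadoop", 3, conf)); // comma-separated
    }
}
```

Because everything passes through toString(), the type information is lost in the text file. A reader gets strings back and has to parse them itself.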

SequenceFileOutputFormat

This OutputFormat writes sequence files as its output. It is a good intermediate format between chained MapReduce jobs, because it serializes arbitrary data types to the file and the corresponding SequenceFileInputFormat deserializes the file into the same types. The data therefore reaches the next mapper in the same form in which the previous reducer emitted it. Static methods on SequenceFileOutputFormat control compression.
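The round trip between two chained jobs can be modeled with a small plain-Java sketch. It does not produce the real SequenceFile layout and uses no Hadoop classes. It only shows the idea that typed pairs written as binary come back with the same types and in the same order. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class BinaryPairsSketch {
    // "Reducer side" of job 1: serialize typed pairs to bytes.
    static byte[] write(Map<String, Integer> pairs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(pairs.size());
            for (Map.Entry<String, Integer> entry : pairs.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // "Mapper side" of job 2: deserialize into the same types, in the same order.
    static Map<String, Integer> read(byte[] data) {
        Map<String, Integer> pairs = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                int value = in.readInt();
                pairs.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, Integer> emitted = new LinkedHashMap<>();
        emitted.put("hadoop", 3);
        emitted.put("mapreduce", 5);
        System.out.println(read(write(emitted)).equals(emitted));
    }
}
```

Unlike the text format, nothing has to be parsed from strings on the way back in, which is what makes a binary format convenient between jobs.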

SequenceFileAsBinaryOutputFormat

It is a variant of SequenceFileOutputFormat. It writes keys and values to a sequence file in raw binary format.

MapFileOutputFormat

It is another form of FileOutputFormat. It writes its output as map files. Keys must be added to a MapFile in order, so we need to ensure that the reducer emits keys in sorted order.
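The ordering requirement can be sketched as a writer that rejects any key that sorts before the previous one. This is a simplified plain-Java model, not Hadoop's MapFile writer. The class name, the use of String keys, and the exception type are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SortedKeyWriterSketch {
    private final List<String> entries = new ArrayList<>();
    private String lastKey = null;

    // Appends a pair, rejecting any key that sorts before the previous one.
    void append(String key, String value) {
        if (lastKey != null && key.compareTo(lastKey) < 0) {
            throw new IllegalArgumentException(
                    "key out of order: " + key + " after " + lastKey);
        }
        entries.add(key + "\t" + value);
        lastKey = key;
    }

    int size() {
        return entries.size();
    }

    // Tries a sequence of keys; returns false if any key is out of order.
    static boolean accepts(String... keys) {
        SortedKeyWriterSketch writer = new SortedKeyWriterSketch();
        try {
            for (String key : keys) {
                writer.append(key, "value");
            }
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("apple", "banana", "cherry"));
        System.out.println(accepts("banana", "apple"));
    }
}
```

Sorted keys are what allow a map file to be searched by key later, so the writer enforces the order at write time rather than sorting afterwards.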

MultipleOutputs

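MultipleOutputs permits writing data to files whose names are derived from the output keys and values. Strictly speaking, it is a helper class used alongside an OutputFormat rather than an OutputFormat itself.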
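Deriving the file name from the key can be sketched in plain Java as follows. In-memory lists stand in for files. The class name is hypothetical, and the "-r-00000" suffix is an assumption modeled on the way Hadoop names reducer part files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class KeyedOutputsSketch {
    // File name -> lines written to that file (in-memory stand-in for real files).
    private final Map<String, List<String>> files = new TreeMap<>();

    // Routes one record to a file whose name is derived from the record's key.
    void write(String key, String value) {
        String fileName = key + "-r-00000"; // hypothetical naming scheme
        files.computeIfAbsent(fileName, name -> new ArrayList<>())
             .add(key + "\t" + value);
    }

    Map<String, List<String>> files() {
        return files;
    }

    public static void main(String[] args) {
        KeyedOutputsSketch outputs = new KeyedOutputsSketch();
        outputs.write("us", "order-1");
        outputs.write("uk", "order-2");
        outputs.write("us", "order-3");
        outputs.files().forEach((name, lines) ->
                System.out.println(name + " holds " + lines.size() + " record(s)"));
    }
}
```

With this kind of routing, one reducer can split its output by country, date, or any other value that can be read from the record.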

LazyOutputFormat

In MapReduce job execution, FileOutputFormat creates output files even if they are empty. LazyOutputFormat is a wrapper OutputFormat that avoids this: it creates the output file only when the first record is emitted for a given partition.
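The wrapper idea can be sketched in plain Java as a writer that creates its underlying file only on the first write. The class name and the Supplier-based factory are assumptions for illustration, and a list stands in for the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class LazyWriterSketch {
    private final Supplier<List<String>> fileFactory; // "creates the output file" when called
    private List<String> file;                        // stays null until the first record

    LazyWriterSketch(Supplier<List<String>> fileFactory) {
        this.fileFactory = fileFactory;
    }

    // Creates the underlying file on first use, then appends the record.
    void write(String key, String value) {
        if (file == null) {
            file = fileFactory.get();
        }
        file.add(key + "\t" + value);
    }

    boolean fileCreated() {
        return file != null;
    }

    public static void main(String[] args) {
        LazyWriterSketch writer = new LazyWriterSketch(ArrayList::new);
        System.out.println("created before first write: " + writer.fileCreated());
        writer.write("hadoop", "3");
        System.out.println("created after first write: " + writer.fileCreated());
    }
}
```

A reducer that never emits a record never triggers the factory, so no empty file appears for its partition. In a Hadoop driver, the wrapping is typically set up with LazyOutputFormat.setOutputFormatClass(), passing the job and the OutputFormat class to wrap.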

DBOutputFormat

It is the OutputFormat for writing to relational databases over JDBC (HBase has its own TableOutputFormat). It sends the reduce output to a SQL table. It accepts key-value pairs in which the key has a type implementing DBWritable, and the RecordWriter it returns writes only the key to the database.
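Writing records to a SQL table usually comes down to a parameterized INSERT statement whose placeholders are filled in from each record. The sketch below builds such a statement in plain Java. It is an illustration rather than Hadoop's own code, and the class name, table name, and column names are hypothetical.

```java
class InsertQuerySketch {
    // Builds "INSERT INTO table (f1, f2) VALUES (?, ?)" for the given columns.
    static String constructQuery(String table, String... fieldNames) {
        if (fieldNames.length == 0) {
            throw new IllegalArgumentException("Field names may not be empty");
        }
        StringBuilder query = new StringBuilder("INSERT INTO ")
                .append(table)
                .append(" (")
                .append(String.join(", ", fieldNames))
                .append(") VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append(i == 0 ? "?" : ", ?"); // one placeholder per column
        }
        return query.append(")").toString();
    }

    public static void main(String[] args) {
        // Hypothetical table and columns for a word-count job.
        System.out.println(constructQuery("word_counts", "word", "total"));
    }
}
```

In a real job, the key class would fill the placeholders itself through the write(PreparedStatement) method that DBWritable declares, which is why the key type matters here.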

Conclusion

Therefore, various OutputFormats are used according to the need. I hope you find this tutorial helpful.