This tutorial introduces the MapReduce key-value pair. We first discuss what a key-value pair is in Hadoop and how key-value pairs are created in MapReduce, then walk through key-value pair generation with an example.

Key Value Pairs in MapReduce

What is Key Value Pair in Hadoop MapReduce?

The key-value pair in MapReduce is the record entity that Hadoop MapReduce accepts for execution.

We use Hadoop mainly for data analysis. It handles structured, semi-structured, and unstructured data. When the schema is static, we can work directly on columns instead of key-value pairs. When the schema is not static, we work on keys and values.

Keys and values are not intrinsic properties of the data; the user analyzing the data chooses them.

MapReduce is the core data-processing component of Hadoop. It divides a job into two phases, the Map phase and the Reduce phase. Each phase takes key-value pairs as input and produces key-value pairs as output.

MapReduce Key value pair generation in Hadoop

During MapReduce job execution, the framework converts the input data into key-value pairs before sending it to the mapper, because the mapper only understands data as key-value pairs.

Key-value pair in MapReduce is created as follows:

InputSplit – The logical representation of the data, generated by InputFormat. In a MapReduce program, it describes a unit of work that is processed by a single map task.

RecordReader – It reads from the InputSplit and transforms the data into key-value pairs that the Mapper can read. By default, a job uses TextInputFormat, whose RecordReader (LineRecordReader) turns each line of input into one key-value pair.

During job execution, the map function processes one key-value pair at a time and emits zero or more key-value pairs. The reduce function processes the values grouped under the same key and emits another set of key-value pairs as the output. The Map output types must match the Reduce input types, as given below:

• Map: (K1, V1) -> list (K2, V2)

• Reduce: (K2, list (V2)) -> list (K3, V3)
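The type flow above can be sketched in plain Python. This is only a local simulation of the idea, not Hadoop code: the names map_fn, reduce_fn, and run_job are hypothetical, and word counting is chosen purely for illustration. The grouping step in the middle stands in for what the framework does between the two phases.

```python
from collections import defaultdict

def map_fn(offset, line):
    """Map: (K1, V1) -> list(K2, V2). K1 is a byte offset, V1 is a line of text."""
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: (K2, list(V2)) -> list(K3, V3). Sums the counts for one word."""
    return [(word, sum(counts))]

def run_job(records):
    """Simulate one job locally: map, group by key, then reduce."""
    # Map phase: each input pair may emit any number of (K2, V2) pairs.
    mapped = []
    for key, value in records:
        mapped.extend(map_fn(key, value))

    # Grouping step: collect every V2 that shares the same K2.
    grouped = defaultdict(list)
    for k2, v2 in mapped:
        grouped[k2].append(v2)

    # Reduce phase: keys are visited in sorted order.
    output = []
    for k2 in sorted(grouped):
        output.extend(reduce_fn(k2, grouped[k2]))
    return output

result = run_job([(0, "Chandler is Joey Mark is John")])
print(result)
# [('Chandler', 1), ('Joey', 1), ('John', 1), ('Mark', 1), ('is', 2)]
```

Note that the (K2, V2) types emitted by map_fn, a string and an integer, are exactly the types reduce_fn receives, which is the matching requirement stated above.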

On what basis is a key-value pair generated in MapReduce?

Key-value pair generation depends on the data set and on the required output. The framework specifies key-value pairs in four places: Map input, Map output, Reduce input, and Reduce output.

Map Input

By default, Map input takes the byte offset of the line as the key (a LongWritable) and the content of the line as the value (a Text). We can change both by using a custom input format.
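To make the effect of the input format concrete, here is a small Python sketch contrasting two ways of turning the same line into a key-value pair. It is a simulation, not Hadoop code. The function names and the sample line are hypothetical, and the separator-based variant only imitates the general idea behind a format such as Hadoop's KeyValueTextInputFormat, which splits each line at a separator (a tab by default).

```python
def offset_line_pair(offset, line):
    """Default style: key = byte offset of the line, value = the whole line."""
    return (offset, line)

def separator_pair(line, separator="\t"):
    """Alternative style: key = text before the first separator, value = the rest.

    If the separator is absent, the whole line becomes the key and the
    value is empty.
    """
    key, _, value = line.partition(separator)
    return (key, value)

line = "user42\tlogin success"  # hypothetical input line

print(offset_line_pair(0, line))  # (0, 'user42\tlogin success')
print(separator_pair(line))       # ('user42', 'login success')
```

The data is identical in both cases; only the choice of input format decides what the mapper sees as the key.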

Map Output

The Map is responsible for filtering the data. It also sets up the grouping of the data by key.

• Key – The field, text, or object on which the data is grouped and aggregated at the reducer.

• Value – The field, text, or object that each individual reduce method call handles.

Reduce Input

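The Map output is the input to Reduce, so the key and value types are the same as those of the Map output. The only difference is that the values for each key arrive grouped together.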

Reduce Output

It completely depends on the required output.

MapReduce Key-value Pair Example

Suppose a file stored in HDFS contains the text: Chandler is Joey Mark is John. The InputFormat defines how this file is split and read. By default, TextInputFormat is used, and its RecordReader converts the file into key-value pairs.

• Key – The byte offset of the beginning of the line within the file.

• Value – The content of the line, excluding line terminators.

Here, Key is 0 and Value is Chandler is Joey Mark is John.
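The same result can be reproduced with a short Python sketch that mimics line-oriented reading. It is a simplified stand-in for what a line-based RecordReader does, not the Hadoop implementation, and read_records is a hypothetical name. The two-line variant is an added assumption to show how the key advances; it is not part of the example above.

```python
def read_records(data: bytes):
    """Yield (byte_offset, line_text) pairs, dropping line terminators."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        # The key is the position of the first byte of this line in the file.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        # Advance by the full raw length, terminator included.
        offset += len(raw)

# The file from the example: a single line, so the only key is 0.
single = list(read_records(b"Chandler is Joey Mark is John\n"))
print(single)  # [(0, 'Chandler is Joey Mark is John')]

# Assumed variant: the same words on two lines.
two = list(read_records(b"Chandler is Joey\nMark is John\n"))
print(two)  # [(0, 'Chandler is Joey'), (17, 'Mark is John')]
```

In the two-line variant the second key is 17 because the first line occupies 16 bytes plus one newline byte.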

Conclusion

In conclusion, a key-value pair is simply the record entity that MapReduce receives for execution. InputSplit and RecordReader together create the key-value pairs. With the default TextInputFormat, the key is the byte offset of a line and the value is the content of that line.