Combiner in Hadoop MapReduce: In this tutorial, we first look at what the MapReduce Combiner is and the key role it plays in MapReduce. We then walk through an example of a MapReduce job with and without a combiner. Finally, we cover some advantages and disadvantages of the Combiner in MapReduce.
What is Combiner in Hadoop MapReduce?
The Combiner, also known as a “Mini-Reducer”, summarizes the Mapper output records that share the same key before they are passed to the Reducer.
When we run a MapReduce job on a huge dataset, the Mapper produces large chunks of intermediate data. The framework then passes this intermediate data to the Reducer for further processing, which can cause heavy network congestion. The Hadoop framework offers a function known as the Combiner that plays a key role in reducing this congestion.
The main job of the Combiner, a “Mini-Reducer”, is to process the output data from the Mapper before it is passed to the Reducer. It runs after the Mapper and before the Reducer, and its usage is optional.
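To make the idea concrete, here is a minimal sketch in plain Python (not the Hadoop API) of what a combiner does to one mapper's output in a word-count job. The function names `map_words` and `combine` and the input line are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def map_words(line):
    """Mapper (hypothetical): emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Combiner (hypothetical): sum the values of identical keys on the mapper side."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

# Made-up input line handled by a single mapper.
mapper_output = map_words("car bus car car bus")
combined = combine(mapper_output)

print(len(mapper_output), "pairs before combining")
print(len(combined), "pairs after combining:", combined)
```

In this sketch the mapper emits five pairs, but only two leave the mapper side once pairs with the same key are summed.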
How does Combiner work in Hadoop MapReduce?
Now let us discuss how things change when we use a combiner in MapReduce.
In the diagram above, there is no combiner. The input is divided between two mappers, and the mappers generate 9 key/value pairs.
So we now have 9 key/value pairs of intermediate data. The mappers forward these pairs directly to the reducer, which consumes network bandwidth. The larger the data, the longer the transfer to the reducer takes.
Now suppose we place a combiner between each mapper and the reducer, as in the picture above. The combiners aggregate the 9 key/value pairs by key before forwarding them to the reducer, producing 4 key/value pairs as output.
The reducer now receives only 4 key/value pairs, produced by the 2 combiners, instead of 9. Less data crosses the network and the reducer has fewer values to process for the same final output, which improves overall performance.
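The 9-versus-4 comparison can be reproduced with a small simulation in plain Python. This is only a sketch: the two input splits are invented so that the mappers emit 9 pairs in total, and `map_words` and `sum_by_key` are hypothetical helpers, not Hadoop classes.

```python
from collections import defaultdict

def map_words(split):
    """Mapper (hypothetical): emit (word, 1) for each word in the split."""
    return [(word, 1) for word in split.split()]

def sum_by_key(pairs):
    """Sum values per key; used here as both the combiner and the reducer."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Two made-up input splits, one per mapper (5 + 4 = 9 pairs in total).
splits = ["car bus car car bus", "bus train train bus"]
mapper_outputs = [map_words(s) for s in splits]

# Without a combiner: every mapper pair is sent to the reducer.
shuffled_plain = [pair for output in mapper_outputs for pair in output]

# With a combiner: each mapper's pairs are summed locally before being sent.
shuffled_combined = [
    pair for output in mapper_outputs for pair in sum_by_key(output).items()
]

result_plain = sum_by_key(shuffled_plain)
result_combined = sum_by_key(shuffled_combined)

print("pairs sent without combiner:", len(shuffled_plain))
print("pairs sent with combiner:", len(shuffled_combined))
print("same final result:", result_plain == result_combined)
```

The final counts are identical in both cases; only the number of pairs that travel from the mappers to the reducer changes.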
Advantages of Combiner in MapReduce
The main benefits of the Hadoop Combiner in MapReduce are:
• Using a combiner decreases the time taken for data transfer between mapper and reducer.
• Combiner increases the overall performance of the reducer.
• It reduces the amount of data that the reducer has to process.
Disadvantages of Combiner in MapReduce
The Hadoop Combiner also has some disadvantages:
• When Hadoop stores the intermediate key-value pairs on the local filesystem and runs the combiner later, this results in expensive disk I/O.
• MapReduce jobs can’t rely on the combiner being executed, because the framework gives no guarantee that it will run; it may be invoked zero, one, or several times.
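Because the combiner may or may not run, a job should produce the same result either way. The sketch below, in plain Python with made-up values for a single key, illustrates why a sum is safe to combine while a naive average is not. The helper `mean` and the two value lists are assumptions for illustration only.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Hypothetical values emitted for the same key by two different mappers.
mapper_a = [10, 20, 30]
mapper_b = [40]

# Summation: the result is the same whether or not the combiner runs.
sum_without = sum(mapper_a + mapper_b)
sum_with = sum([sum(mapper_a), sum(mapper_b)])

# Naive averaging: the result changes if a combiner pre-averages each mapper.
mean_without = mean(mapper_a + mapper_b)
mean_with = mean([mean(mapper_a), mean(mapper_b)])

print("sum:", sum_without, "vs", sum_with)
print("mean:", mean_without, "vs", mean_with)
```

One common way around this, under the same assumptions, is to have the mapper and combiner emit (sum, count) pairs and let only the reducer perform the final division.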
To conclude, the Hadoop Combiner plays a crucial role in reducing network congestion. It improves overall performance by summarizing the Mapper output before it reaches the Reducer. I hope you now have a clear understanding of the Hadoop Combiner.