Counters in MapReduce: In this article, we discuss Hadoop Counters in detail. We look at what MapReduce Counters are and what roles they play. We also cover the types of Counters in Hadoop MapReduce: MapReduce Task Counters, File System Counters, FileInputFormat Counters, FileOutputFormat Counters, Job Counters, and Dynamic Counters.
Hadoop MapReduce
Before we start with Hadoop Counters, let us first discuss what Hadoop MapReduce is.
MapReduce is the data processing layer of Hadoop. It processes large volumes of structured and unstructured data stored in HDFS, and it does so in parallel by splitting a submitted job into a set of independent tasks (sub-jobs). In Hadoop, MapReduce works by dividing the processing into two phases: Map and Reduce.
• Map Phase- This is the first phase of data processing. It is where we put the complex logic, business rules, and costly code.
• Reduce Phase- This is the second phase of processing. It is where we put light-weight processing such as aggregation or summation.
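To make the two phases concrete, here is a minimal single-process sketch of a word count in plain Java. It does not use the Hadoop API: the class WordCountSketch and its map, reduce, and run methods are hypothetical stand-ins for a mapper, a reducer, and the grouping step that the framework performs between the two phases.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the Map and Reduce phases.
// This is NOT Hadoop code; it only mimics the data flow in one process.
class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word.toLowerCase(), 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all values that share the same key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group the pairs by key, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }
}
```

In a real job, each call to map could run in a different task on a different node, which is why the framework needs a separate mechanism such as counters to collect job-wide statistics.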
What are Hadoop Counters?
Counters in Hadoop are a useful channel for gathering statistics about a MapReduce job, whether for quality control or for application-level statistics. Counters are also useful for problem diagnosis.
A Counter represents a global counter in Apache Hadoop, defined either by the MapReduce framework or by the application. Every counter in MapReduce is named by an “Enum” and holds a long value.
Hadoop Counters let us check whether:
• The job read and wrote the expected number of bytes.
• The job launched and successfully ran the expected number of tasks.
• The amount of CPU and memory consumed is appropriate for our job and cluster nodes.
Types of Counters in MapReduce
There are two types of MapReduce counters:
• Built-in Counters
• User-Defined Counters/Custom counters
Built-in Counters in Hadoop MapReduce
Apache Hadoop provides some built-in counters for every job. These counters report several metrics. There are counters for the number of bytes and records. This permits us to confirm that the expected amount of input is consumed and the expected amount of output is produced.
Hadoop Counters are also divided into groups. There are several groups of built-in counters, and each group contains either task counters or job counters.
Different groups of the built-in counters in Hadoop are as follows:
MapReduce Task Counter
Task counters gather information about tasks over the course of their execution, such as the number of records read and written.
For example, MAP_INPUT_RECORDS is a task counter. It counts the input records read by each map task, and the values are aggregated over all map tasks in the job.
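The way per-task values roll up into a job-level total can be sketched in plain Java. This is a simulation under stated assumptions, not the Hadoop API: the enum TaskCounter below is a local stand-in that borrows the names of built-in counters, and runMapTask and aggregate are hypothetical helpers.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of task counters; NOT Hadoop code.
class TaskCounterSketch {

    // Local stand-in for a built-in task counter group.
    enum TaskCounter { MAP_INPUT_RECORDS, MAP_OUTPUT_RECORDS }

    // Simulate one map task: count the records it reads and the records it emits.
    static Map<TaskCounter, Long> runMapTask(List<String> split) {
        Map<TaskCounter, Long> counters = new EnumMap<>(TaskCounter.class);
        for (String record : split) {
            counters.merge(TaskCounter.MAP_INPUT_RECORDS, 1L, Long::sum);
            if (!record.isEmpty()) { // this toy mapper drops blank records
                counters.merge(TaskCounter.MAP_OUTPUT_RECORDS, 1L, Long::sum);
            }
        }
        return counters;
    }

    // Job-level view: sum each counter over all tasks.
    static Map<TaskCounter, Long> aggregate(List<Map<TaskCounter, Long>> perTask) {
        Map<TaskCounter, Long> total = new EnumMap<>(TaskCounter.class);
        for (Map<TaskCounter, Long> task : perTask) {
            task.forEach((name, value) -> total.merge(name, value, Long::sum));
        }
        return total;
    }
}
```

Comparing the aggregated input and output record counts is a quick way to spot a mapper that drops more records than expected.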
File System Counters
These counters collect information such as the number of bytes read and written by the file system. The names and descriptions of the file system counters are as follows:
• FileSystem bytes read- The number of bytes read by the filesystem.
• FileSystem bytes written- The number of bytes written to the filesystem.
FileInputFormat Counters
These counters collect information on the number of bytes read by map tasks via FileInputFormat.
FileOutputFormat Counters
These counters collect information on the number of bytes written by map tasks (for map-only jobs) or reduce tasks via FileOutputFormat.
Job Counters in MapReduce
Job counters measure job-level statistics, not values that change while a task is running. For example, TOTAL_LAUNCHED_MAPS counts the number of map tasks that were launched over the course of a job. Job counters are maintained by the application master, so they don’t need to be sent across the network, unlike all other counters, including user-defined ones.
User-Defined Counters or Custom Counters in Hadoop MapReduce
In addition to built-in counters, Hadoop MapReduce allows user code to define its own set of counters, which are then incremented as desired in the mapper or reducer. In Java, these counters are defined with an ‘enum’.
A job may define an arbitrary number of ‘enums’, each with an arbitrary number of fields. The name of the enum is the group name, and the enum’s fields are the counter names.
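The enum-to-group mapping can be illustrated with a small plain-Java sketch. It is a simulation, not the Hadoop API: EnumCounterSketch, the RecordQuality enum, and the process method are hypothetical names chosen for illustration. In a real mapper or reducer, the corresponding step would be a call along the lines of context.getCounter(RecordQuality.MALFORMED).increment(1).

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of user-defined counters; NOT Hadoop code.
class EnumCounterSketch {

    // The enum acts as the group; its fields are the counter names.
    enum RecordQuality { VALID, MALFORMED }

    // group name -> (counter name -> long value)
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    // Derive the group name from the enum type that declares the counter.
    static String groupOf(Enum<?> counter) {
        return counter.getDeclaringClass().getName();
    }

    void increment(Enum<?> counter, long amount) {
        groups.computeIfAbsent(groupOf(counter), g -> new TreeMap<>())
              .merge(counter.name(), amount, Long::sum);
    }

    long get(Enum<?> counter) {
        Map<String, Long> group =
                groups.getOrDefault(groupOf(counter), Collections.emptyMap());
        return group.getOrDefault(counter.name(), 0L);
    }

    // Toy "mapper": a record is VALID if it parses as an integer.
    static EnumCounterSketch process(Iterable<String> records) {
        EnumCounterSketch counters = new EnumCounterSketch();
        for (String record : records) {
            try {
                Integer.parseInt(record.trim());
                counters.increment(RecordQuality.VALID, 1);
            } catch (NumberFormatException e) {
                counters.increment(RecordQuality.MALFORMED, 1);
            }
        }
        return counters;
    }
}
```

Because the counter names are enum fields, a typo in a counter name is caught by the compiler rather than silently creating a new counter.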
Dynamic Counters in Hadoop
A Java enum’s fields are defined at compile time, so we cannot create new counters at run time using enums. Instead, we use dynamic counters, which are not defined by a Java enum: a dynamic counter is identified at run time by a group name and a counter name, both given as strings.
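A case where counter names are only known once the data is read can be sketched as follows. This is plain Java, not the Hadoop API: DynamicCounterSketch, countByStatus, the "StatusCodes" group, and the log line layout are all assumptions made for illustration. In Hadoop itself, the comparable call takes two strings, along the lines of context.getCounter(groupName, counterName).increment(1).

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of dynamic counters; NOT Hadoop code.
class DynamicCounterSketch {

    // group name -> (counter name -> long value)
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    // A dynamic counter is addressed by two strings instead of an enum field.
    void increment(String group, String name, long amount) {
        groups.computeIfAbsent(group, g -> new TreeMap<>())
              .merge(name, amount, Long::sum);
    }

    long get(String group, String name) {
        Map<String, Long> counters =
                groups.getOrDefault(group, Collections.emptyMap());
        return counters.getOrDefault(name, 0L);
    }

    // Toy task: the last field of each line is a status code that is
    // only discovered at run time, so it cannot be an enum field.
    static DynamicCounterSketch countByStatus(Iterable<String> logLines) {
        DynamicCounterSketch counters = new DynamicCounterSketch();
        for (String line : logLines) {
            String[] fields = line.trim().split("\\s+");
            String status = fields[fields.length - 1];
            if (!status.isEmpty()) {
                counters.increment("StatusCodes", status, 1);
            }
        }
        return counters;
    }
}
```

The trade-off is that string names are not checked at compile time, so enum counters remain the safer choice whenever the set of counters is known in advance.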
Conclusion
In conclusion, Counters let us verify whether a job has read and written the expected number of bytes. Counters also track the progress of a MapReduce job by counting the operations that occur within it. Hadoop provides both built-in counters and user-defined counters for this purpose.