Sometimes we need to combine two large datasets. Writing such a join by hand requires a lot of code, but MapReduce provides this functionality out of the box through its join operation. It works as follows: the two datasets are compared by size, and the smaller one is distributed to every DataNode. The Mapper or Reducer then uses the smaller dataset to perform lookups for matching records. Finally, the matching records from the smaller and larger datasets are merged to form the joined output records.
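To make that mechanism concrete, here is a minimal sketch of the lookup idea in Java. Everything here is hypothetical (the class name, the small.txt file, and the comma-separated format are assumptions for illustration); it simply shows a mapper loading a small dataset distributed to the node and joining the larger dataset against it.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: the driver would ship the smaller dataset to every
// node with job.addCacheFile(new URI("/small.txt#small.txt")), making it
// available in the task's working directory under the name "small.txt".
public class LookupJoinMapper extends Mapper<Object, Text, Text, Text> {
  private final Map<String, String> lookup = new HashMap<>();

  @Override
  protected void setup(Context ctx) throws IOException {
    // Load the smaller dataset (assumed "key,value" lines) into memory.
    try (BufferedReader r = new BufferedReader(new FileReader("small.txt"))) {
      String line;
      while ((line = r.readLine()) != null) {
        String[] parts = line.split(",", 2);
        lookup.put(parts[0], parts[1]);
      }
    }
  }

  @Override
  protected void map(Object key, Text value, Context ctx)
      throws IOException, InterruptedException {
    // Each record of the larger dataset is looked up by its join key;
    // matches are merged into a joined output record.
    String[] parts = value.toString().split(",", 2);
    String match = lookup.get(parts[0]);
    if (match != null) {
      ctx.write(new Text(parts[0]), new Text(parts[1] + "\t" + match));
    }
  }
}
```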
There are two types of joins:
- Map-side Join
- Reduce-side Join
Map-side Join
In the Map-side Join, the join is performed by the mapper, before the map function even consumes the data. This type of join has a prerequisite: each input to the map must be divided into the same number of partitions, with each partition sorted by the join key, so that all records with an equal key fall into the corresponding partition of each input.
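Hadoop ships a CompositeInputFormat that performs exactly this merge over pre-sorted, identically partitioned inputs. Below is a minimal driver sketch; the class name MapSideJoinDriver is hypothetical (the Hadoop classes are real), and it assumes both inputs already satisfy the sorting and partitioning prerequisite.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.join.CompositeInputFormat;
import org.apache.hadoop.mapreduce.lib.join.TupleWritable;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapSideJoinDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "inner" keeps only keys present in both inputs; the (identity) mapper
    // then receives a Text key plus a TupleWritable with one value per input.
    conf.set(CompositeInputFormat.JOIN_EXPR,
        CompositeInputFormat.compose("inner", KeyValueTextInputFormat.class,
            new Path(args[0]), new Path(args[1])));
    Job job = Job.getInstance(conf, "map-side join");
    job.setJarByClass(MapSideJoinDriver.class);
    job.setInputFormatClass(CompositeInputFormat.class);
    job.setNumReduceTasks(0);  // the join happens in the input format itself
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(TupleWritable.class);
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```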
Reduce-side Join
In the Reduce-side Join, the join is performed by the reducer, and the datasets are not expected to be partitioned or sorted in any particular way. The map-side processing only emits the join key together with the corresponding tuple from each input record. Because all tuples with the same key are grouped at the same reducer, the reducer can join them to form the output records.
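Here is a minimal sketch of that flow, assuming both inputs are comma-separated with the department ID first (the class names and the NAME/STRENGTH tags are hypothetical; the MapReduceJoin.jar used later may be organized differently).

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ReduceSideJoin {

  // Tags each DeptName record (assumed "deptId,deptName") with its source.
  public static class NameMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split(",", 2);
      ctx.write(new Text(parts[0]), new Text("NAME\t" + parts[1]));
    }
  }

  // Tags each DeptStrength record (assumed "deptId,strength") with its source.
  public static class StrengthMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split(",", 2);
      ctx.write(new Text(parts[0]), new Text("STRENGTH\t" + parts[1]));
    }
  }

  // All tuples for one deptId arrive together; merge the two tagged sides.
  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      String name = "", strength = "";
      for (Text v : values) {
        String[] tagged = v.toString().split("\t", 2);
        if ("NAME".equals(tagged[0])) name = tagged[1];
        else strength = tagged[1];
      }
      ctx.write(key, new Text(name + "\t" + strength));
    }
  }
}
```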
Let's start Hadoop first.
First of all, start the Hadoop cluster using the commands given below.
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
Type jps in the terminal to check that all the daemons are running; on a typical single-node setup you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager listed (plus Jps itself).
We will be working with two small input files, DeptName.txt and DeptStrength.txt.
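For illustration only (these values are hypothetical; the real files ship with the repo below), DeptName.txt could map a department ID to a department name, e.g. `1,Finance`, while DeptStrength.txt could map the same IDs to a headcount, e.g. `1,270`. The department ID is the join key shared by both files.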
Download the GitHub repo from the link given below; we will be using its files.
https://github.com/mrcreamio/Hadoop-tutorials
Move the downloaded folder to the appropriate directory using the command given below.
sudo cp -r /home/ahmed/Desktop/MapReduceJoin /home/supper_user/
Change into that directory.
cd MapReduceJoin/
Now let's copy our input files to HDFS.
hdfs dfs -copyFromLocal DeptStrength.txt DeptName.txt /
Let's check that the files were copied; you should see /DeptName.txt and /DeptStrength.txt in the listing.
hdfs dfs -ls /
Run the program using the command given below.
$HADOOP_HOME/bin/hadoop jar MapReduceJoin.jar /DeptStrength.txt /DeptName.txt /output_mapreducejoin
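Under the hood, a driver class inside the jar wires each input file to its mapper and sets the output path. The sketch below shows roughly how that wiring could look for the reduce-side classes sketched earlier (again hypothetical; the actual driver in the repo may differ).

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JoinDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    job.setJarByClass(JoinDriver.class);
    job.setReducerClass(ReduceSideJoin.JoinReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    // args: /DeptStrength.txt /DeptName.txt /output_mapreducejoin
    MultipleInputs.addInputPath(job, new Path(args[0]),
        TextInputFormat.class, ReduceSideJoin.StrengthMapper.class);
    MultipleInputs.addInputPath(job, new Path(args[1]),
        TextInputFormat.class, ReduceSideJoin.NameMapper.class);
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```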
Let's see the output file using the command given below.
hdfs dfs -cat /output_mapreducejoin/part-00000
Here is the output.