Hadoop MapReduce Join & Counter

Hadoop MapReduce Join & Counter with Example

Sometimes we need to combine two large datasets, and MapReduce provides a join operation for this purpose. Writing the join by hand requires a lot of code, so MapReduce simplifies the task. In a MapReduce join, the two datasets are compared by size, and the smaller dataset is distributed to every DataNode. The Mapper or Reducer then uses the smaller dataset to perform lookup operations and find matching records in the larger dataset. Lastly, the matching records from the smaller and larger datasets are merged to create the joined output records.
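
The lookup-and-merge idea described above can be sketched in plain Python, without Hadoop. This is only an illustration under stated assumptions: the function name `broadcast_join` and the sample records are hypothetical and are not taken from the tutorial's input files or jar.

```python
def broadcast_join(small, large):
    """Join two lists of (key, value) pairs using the smaller list as a lookup table."""
    # The dict stands in for the copy of the smaller dataset sent to every node.
    lookup = dict(small)
    joined = []
    for key, value in large:
        # Keep only records of the larger dataset that have a matching key.
        if key in lookup:
            joined.append((key, lookup[key], value))
    return joined


# Hypothetical sample records, invented for this sketch.
dept_names = [("D1", "Sales"), ("D2", "Support")]
dept_strength = [("D1", 40), ("D2", 25), ("D3", 10)]

print(broadcast_join(dept_names, dept_strength))
```

The record with key `D3` is dropped because the smaller dataset has no entry for it, which corresponds to an inner join.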

There are two types of joins.

  • Map-side Join
  • Reduce-side Join

Map-side Join

In the Map-side Join, the operation is performed by the mapper. Here, the join is performed before the data is consumed by the actual map function. This type of join has a prerequisite: each input to the map must be partitioned, and every input must be sorted. The inputs must be divided into the same number of partitions and sorted by the same join key.
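
To see why sorted inputs matter, here is a minimal plain-Python sketch of a merge over two inputs that are already sorted by the join key. It is not Hadoop code; the name `merge_join` is hypothetical, and the sketch assumes each key appears at most once per input.

```python
def merge_join(left, right):
    """Join two key-sorted lists of (key, value) pairs in a single pass.

    Assumes both lists are sorted by key and each key occurs at most once per list.
    """
    i = j = 0
    joined = []
    while i < len(left) and j < len(right):
        left_key, left_value = left[i]
        right_key, right_value = right[j]
        if left_key == right_key:
            # Matching keys: emit one joined record and advance both sides.
            joined.append((left_key, left_value, right_value))
            i += 1
            j += 1
        elif left_key < right_key:
            i += 1  # The left key has no partner; skip it.
        else:
            j += 1  # The right key has no partner; skip it.
    return joined
```

Because both inputs are sorted, each record is read once and no lookup table is needed. If the inputs were unsorted, this single pass would miss matches.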

Reduce-side Join

In the Reduce-side Join, the operation is performed by the reducer. Here, the datasets do not need to be in a structured form (that is, partitioned and sorted). The map phase emits the join key together with the corresponding tuples from both datasets. As a result, all tuples with the same join key are sent to the same reducer, where they are joined to form the output records.
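
The map, group, and reduce steps described above can be simulated in plain Python. This is a sketch under stated assumptions, not the code inside the tutorial's jar: the function names and the source tags "A" and "B" are hypothetical.

```python
from collections import defaultdict


def map_phase(records, tag):
    """Emit (join_key, (tag, value)) so the reducer can tell the two sources apart."""
    for key, value in records:
        yield key, (tag, value)


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, tagged_value in pairs:
        groups[key].append(tagged_value)
    return groups


def reduce_phase(key, tagged_values):
    """Pair every value tagged "A" with every value tagged "B" for one key."""
    a_values = [value for tag, value in tagged_values if tag == "A"]
    b_values = [value for tag, value in tagged_values if tag == "B"]
    return [(key, a, b) for a in a_values for b in b_values]


def reduce_side_join(a_records, b_records):
    """Run the three steps over two unsorted lists of (key, value) pairs."""
    pairs = list(map_phase(a_records, "A")) + list(map_phase(b_records, "B"))
    joined = []
    for key, tagged_values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_phase(key, tagged_values))
    return joined
```

Neither input has to be sorted or partitioned in advance. The cost is that every record passes through the grouping step, which in a real cluster means moving data across the network.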

Let’s get started. First, start the Hadoop cluster using the commands given below.

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

Type jps in the terminal to check that all the nodes are running.

We have the following data in two input files, DeptStrength.txt and DeptName.txt.

[Images: contents of the two input files]

Download the GitHub repo from the link given below. We will be using its files.

https://github.com/mrcreamio/Hadoop-tutorials

Copy the downloaded MapReduceJoin folder to the home directory of the Hadoop user using the command given below.

sudo cp -r /home/ahmed/Desktop/MapReduceJoin /home/supper_user/

Change into the copied directory.

cd MapReduceJoin/

Now let’s copy our input files to the HDFS.

hdfs dfs -copyFromLocal DeptStrength.txt DeptName.txt /

Let’s check that the files were copied.

hdfs dfs -ls /

Run the program using the command given below.

$HADOOP_HOME/bin/hadoop jar MapReduceJoin.jar /DeptStrength.txt /DeptName.txt /output_mapreducejoin

Let’s view the output file using the command given below.

hdfs dfs -cat /output_mapreducejoin/part-00000