Hadoop MapReduce


To use Hadoop MapReduce, first of all start the Hadoop cluster using the commands given below.

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

Type jps in the terminal to check that all the nodes are running.
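On a typical single-node setup, jps should list processes such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager; the exact set depends on your configuration.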


Do you remember how the word counter we looked at in the last article works?


Let’s implement it using Hadoop MapReduce.
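As a quick recap: for an input line like "hello world hello", the mapper emits the pairs (hello, 1), (world, 1), and (hello, 1), and the reducer sums the values for each word to produce (hello, 2) and (world, 1).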

You need to create three files.

  • Reduce.java
  • Map.java
  • WordCount.java

Reduce.java

package com.impetus.code.examples.hadoop.mapred.wordcount;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Reducer: sums the counts emitted for each word.
public class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
            Reporter reporter) throws IOException
    {
        // Add up the 1s (or partial sums from the combiner) for this word.
        int sum = 0;
        while (values.hasNext())
        {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}

Map.java

package com.impetus.code.examples.hadoop.mapred.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Mapper: emits (word, 1) for every whitespace-separated token in each input line.
public class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
    private final static IntWritable one = new IntWritable(1);

    // Reused across calls to avoid allocating a new Text object per token.
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException
    {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens())
        {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}

WordCount.java

package com.impetus.code.examples.hadoop.mapred.wordcount;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

// Driver: configures and submits the word-count job using the classic mapred API.
public class WordCount
{
    public static void main(String[] args) throws Exception
    {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        // Types of the key/value pairs the job emits.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        // Summing is associative and commutative, so the reducer
        // can also run as a combiner on each mapper's local output.
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Input and output paths are taken from the command line.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
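A note on the imports: these classes use the classic org.apache.hadoop.mapred API. It still works on current Hadoop releases, although new code is usually written against the newer org.apache.hadoop.mapreduce API. Also note that Reduce can safely double as the combiner here because summing counts is associative and commutative.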

Now you need to compile the Java files. There are two ways to do this.

mvn clean install
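This assumes the example lives in a Maven project whose pom.xml declares the Hadoop dependency (for the classic API used here, the hadoop-core artifact).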

Or compile directly with javac, putting the Hadoop libraries on the classpath.

javac -classpath $(hadoop classpath) -d . Map.java Reduce.java WordCount.java

If you compiled with javac, you also need to package the classes into a jar by running the following command.

jar cfm wordcounter.jar Manifest.txt com/impetus/code/examples/hadoop/mapred/wordcount/*.class
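The Manifest.txt used here is assumed to contain at least a Main-Class entry, so that hadoop jar can launch the driver without an explicit class name:

Main-Class: com.impetus.code.examples.hadoop.mapred.wordcount.WordCount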

Now let’s create an input folder in HDFS.

hdfs dfs -mkdir -p ~/wordcount/input
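Note that ~ is expanded by your local shell, so the directory is created under an absolute HDFS path such as /home/<your-user>/wordcount/input; the -p flag creates any missing parent directories.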

Now we are going to create two input files. 

sudo vi input_one

Put a few lines of sample text inside it; any words will do.

And another file.

sudo vi input_two

Use the command below to move a file into the HDFS file system.

hdfs dfs -copyFromLocal input_one ~/wordcount/input/

Do the same for the second input file.
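hdfs dfs -copyFromLocal input_two ~/wordcount/input/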

Now check that both files have been copied; you should see input_one and input_two in the listing.

hdfs dfs -ls ~/wordcount/input/

Now run the MapReduce job using the command given below.

$HADOOP_HOME/bin/hadoop jar wordcounter.jar ~/wordcount/input ~/wordcount/output
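Note that the job will fail if the output directory already exists, so remove any old output first with hdfs dfs -rm -r ~/wordcount/output before re-running the job.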

Run the command given below to see the output. Each line contains a word and its count, separated by a tab.

hdfs dfs -cat ~/wordcount/output/part-00000
