Hadoop MapReduce

Hadoop & MapReduce Examples

First of all, start the Hadoop cluster, for example with $HADOOP_HOME/sbin/start-dfs.sh followed by $HADOOP_HOME/sbin/start-yarn.sh (Hadoop 2.x and later).

Then run jps in the terminal to check that all the daemons (NameNode, DataNode and so on) are running.

In the previous article, we looked at how a word counter works.
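As a refresher, the word-counter logic can be sketched in plain Java with no Hadoop involved: a map step emits a (word, 1) pair for every token, a shuffle step groups the pairs by word, and a reduce step sums each group. This is only an illustrative sketch; the class name WordCountSketch and the sample lines are made up, and the TreeMap only roughly mimics the way Hadoop sorts keys before handing them to the reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of the word-count data flow (no Hadoop)
public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token
    static List<Map.Entry<String, Integer>> mapPhase(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle step: group all emitted values by key; TreeMap keeps keys sorted
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the list of counts collected for each word
    static TreeMap<String, Integer> reducePhase(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    // Runs map over every line, then shuffle and reduce over all pairs
    static TreeMap<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(mapPhase(line));
        }
        return reducePhase(shuffle(pairs));
    }

    public static void main(String[] args) {
        // Made-up sample input
        List<String> lines = List.of("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        System.out.println(countWords(lines));
        // prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```

In the Hadoop version, the same work is split across the Map and Reduce classes, and the framework performs the shuffle and sort between them.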

Now let's implement it using Hadoop MapReduce.

You need to create three files; their contents are listed below in this order.

  • Reduce.java
  • Map.java
  • WordCount.java


package com.impetus.code.examples.hadoop.mapred.wordcount;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    // Called once per distinct word, with an iterator over all the counts emitted for it
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
            Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // Emit (word, total count)
        output.collect(key, new IntWritable(sum));
    }
}


package com.impetus.code.examples.hadoop.mapred.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
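
public class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused Writable objects avoid allocating a new object for every token
    private final static IntWritable one = new IntWritable(1);

    private Text word = new Text();

    // Called once per input line: key is the byte offset of the line, value is its text
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // Emit (word, 1) for every whitespace-separated token
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}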
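The mapper relies on java.util.StringTokenizer with its default delimiters (space, tab, newline, carriage return and form feed), so case and punctuation are kept: "World!" and "world" are counted as different words. The plain-Java snippet below shows this. The class TokenizerDemo and its normalizedTokens helper are hypothetical and not part of the original example; the helper lower-cases the line and splits on any character that is not a letter or digit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo of how the mapper's tokenization behaves
public class TokenizerDemo {

    // Same splitting as in Map.map(): whitespace only, case and punctuation kept
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    // Hypothetical normalization: lower-case, then split on non-letter/non-digit runs
    static List<String> normalizedTokens(String line) {
        List<String> result = new ArrayList<>();
        for (String t : line.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // a leading delimiter produces one empty string
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Hello world, hello\tWorld!";
        System.out.println(tokens(line));           // [Hello, world,, hello, World!]
        System.out.println(normalizedTokens(line)); // [hello, world, hello, world]
    }
}
```

If you want normalized counts, you could use such a helper inside map() in place of the StringTokenizer loop; the rest of the job stays the same.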


package com.impetus.code.examples.hadoop.mapred.wordcount;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

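public class WordCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);

        // Types of the (word, count) pairs the job writes out
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        // Read plain text line by line; write one "word<TAB>count" line per word
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] = input directory, args[1] = output directory (must not exist yet)
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Submit the job and wait for it to finish
        JobClient.runJob(conf);
    }
}
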
Now compile the Java files. There are two ways to do this.

If the code is part of a Maven project whose pom.xml declares the Hadoop client libraries as dependencies, run:

mvn clean install
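Maven writes the JAR to the target directory; use that file in place of wordcounter.jar below, and add the main class name com.impetus.code.examples.hadoop.mapred.wordcount.WordCount after the JAR name unless the JAR's manifest declares it.

Or compile the sources directly with javac, putting the Hadoop libraries on the classpath:

javac -cp "$(hadoop classpath)" -d . Map.java Reduce.java WordCount.java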
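If you compiled with javac, also package the classes into a JAR:

jar cfm wordcounter.jar Manifest.txt com/impetus/code/examples/hadoop/mapred/wordcount/*.class

For the hadoop jar command below to find the main class on its own, Manifest.txt must declare it, for example with the single line Main-Class: com.impetus.code.examples.hadoop.mapred.wordcount.WordCount (the file must end with a newline).

Now let's create an input folder in HDFS.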

hdfs dfs -mkdir -p ~/wordcount/input

Now create two input files on the local machine. Open the first one:

vi input_one

Put some sample text inside it, for example a few lines of words to count.

Then create a second file the same way:

vi input_two

Use the command below to copy a file to the HDFS file system:

hdfs dfs -copyFromLocal input_one ~/wordcount/input/

Run the same command for input_two.

Now check that both files have been copied.

hdfs dfs -ls ~/wordcount/input/

Now run the MapReduce job with the command below. The output directory must not exist yet, because Hadoop refuses to overwrite it.

$HADOOP_HOME/bin/hadoop jar wordcounter.jar ~/wordcount/input ~/wordcount/output

To see the output, run:

hdfs dfs -cat ~/wordcount/output/part-00000
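Each line of part-00000 holds one word and its count separated by a tab, which is how TextOutputFormat writes key/value pairs by default. With more than one reducer the result is split across part-00000, part-00001 and so on. If you want to post-process the result in Java, a minimal parser could look like this; the class name OutputParser is hypothetical, and the sketch assumes the default tab separator.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical parser for word-count output lines of the form "word<TAB>count"
public class OutputParser {

    static Map<String, Integer> parse(Stream<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps the file's (sorted) order
        lines.filter(line -> !line.isEmpty()).forEachOrdered(line -> {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("no tab separator in line: " + line);
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
        });
        return counts;
    }

    public static void main(String[] args) {
        // Reads the job output from standard input
        Map<String, Integer> counts = parse(new BufferedReader(new InputStreamReader(System.in)).lines());
        System.out.println(counts.size() + " distinct words");
        counts.forEach((word, n) -> System.out.println(word + " -> " + n));
    }
}
```

After compiling it with javac OutputParser.java, you could pipe the job output into it: hdfs dfs -cat ~/wordcount/output/part-00000 | java OutputParser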
