In this tutorial, we are going to install Apache Hadoop on an Ubuntu system and then configure the installation.
Open the terminal in Ubuntu by pressing Ctrl+Alt+T.
Step 1) Add a new user.
To keep things clean we will create a dedicated group named “hadoop” and a new user inside it. First, create the group:
sudo addgroup hadoop
You will be asked for your sudo password. Enter the password and press Enter.
Then execute the following command to create the user.
sudo adduser --ingroup hadoop supper_user
Then enter the requested information and a password for the new user.
Note: Remember the password which you entered.
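To confirm that the group and user were created, you can check them with the two commands below; the first should list supper_user's groups (including hadoop), and the second should print the hadoop group entry.
id supper_user
getent group hadoop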
Step 2) Now Configure SSH
If SSH is not installed on your system, enter the command given below in the terminal.
sudo apt-get install openssh-server
Now add supper_user to the sudo group.
sudo adduser supper_user sudo
First, switch to the supper_user account that you created above using the command given below.
su supper_user
Enter the password which you set above while creating the user.
Now create the SSH authentication key pair by executing the command below.
ssh-keygen -t rsa -P ""
Remember to execute this command as the supper_user that we created.
Now enable SSH access by appending the public key to the authorized_keys file:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
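SSH is strict about file permissions. If the login test below still asks for a password, tightening the permissions on the .ssh directory and the authorized_keys file usually fixes it:
chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys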
Let’s now test our SSH using the following command.
ssh localhost
If you can log in to localhost without being asked for a password, congratulations, you are on your way.
Step 3) Download Hadoop from the official Apache releases page (https://hadoop.apache.org/releases.html)
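If you prefer the command line, you can fetch the 3.3.0 tarball straight into your Downloads folder with wget. The archive URL below follows Apache's usual layout for older releases, so verify it against the releases page if the download fails.
wget -P ~/Downloads https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz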
I downloaded Hadoop into the Downloads folder. Now move to the Downloads folder, extract the archive, and give supper_user ownership of it using the commands given below.
cd /home/ahmed/Downloads/
sudo tar xzf hadoop-3.3.0.tar.gz
sudo chown -R supper_user:hadoop hadoop-3.3.0
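To verify that the extraction and ownership change worked, list the directory; the owner and group columns should read supper_user and hadoop.
ls -ld hadoop-3.3.0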
Step 4) Configuration of Hadoop
We are going to modify the following files.
- bashrc
- hadoop-env.sh
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
- yarn-site.xml
File 1:
vi ~/.bashrc
export HADOOP_HOME=/home/ahmed/Downloads/hadoop-3.3.0
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Once you have added the above-given variables to the file, save it and exit the editor.
Now apply the changes using the command given below.
source ~/.bashrc
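To confirm the variables are active, you can echo one of them. Note that the hadoop command itself will only run successfully after JAVA_HOME is configured in File 2 below, so come back to the version check afterwards.
echo $HADOOP_HOME
hadoop version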
File 2:
Edit the hadoop-env.sh file using the command given below.
sudo vi /home/ahmed/Downloads/hadoop-3.3.0/etc/hadoop/hadoop-env.sh
Uncomment the JAVA_HOME export statement and set it to the path of your Java installation.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
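Hadoop needs a JDK. If none is installed yet, you can install OpenJDK 8 on most Ubuntu releases and then list /usr/lib/jvm to find the exact directory name to use for JAVA_HOME (java-8-openjdk-amd64 on 64-bit systems).
sudo apt-get install openjdk-8-jdk
ls /usr/lib/jvm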
File 3:
Now let's edit the core-site.xml file.
sudo vi $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following properties between the <configuration> tags in core-site.xml.
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
Now create the directory that hadoop.tmp.dir points to and give our user ownership of it with the commands below.
sudo mkdir -p /app/hadoop/tmp
sudo chown -R supper_user:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
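You can double-check the result with ls; given the chmod 750 above, the mode column should read drwxr-x--- and the owner should be supper_user.
ls -ld /app/hadoop/tmp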
File 4:
Now let's edit the hdfs-site.xml file.
sudo vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following properties inside the configuration tag.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/supper_user/hdfs</value>
</property>
</configuration>
Save and exit.
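Hadoop will normally create the DataNode directory itself on first start, but creating it up front as supper_user avoids permission surprises:
mkdir -p /home/supper_user/hdfs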
File 5:
Now edit the mapred-site.xml file.
sudo vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following data in the mapred-site.xml file.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Run MapReduce jobs on the YARN framework.
</description>
</property>
</configuration>
File 6:
Now we are going to edit yarn-site.xml file.
sudo vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following properties between the <configuration> tags.
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
Save and exit.
Now let's format the HDFS NameNode using the command below.
$HADOOP_HOME/bin/hdfs namenode -format
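Formatting only initializes the NameNode's storage; the HDFS daemons still need to be started before YARN. The bundled start-dfs.sh script does this:
$HADOOP_HOME/sbin/start-dfs.sh
Then start YARN with the next command.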
$HADOOP_HOME/sbin/start-yarn.sh
Type “jps” to check that the Hadoop processes are running.
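On a healthy single-node setup, the jps listing should include NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (plus Jps itself). You can also open the NameNode web UI at http://localhost:9870 and the YARN ResourceManager UI at http://localhost:8088 in a browser to confirm that everything is up.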