Hadoop Pig Tutorial

Hadoop Pig Tutorial: What is, Architecture, Example

Introduction to Apache Pig

Like MapReduce, Apache Pig is used to analyze large data sets. It provides an abstraction over MapReduce that reduces the complexity of writing a MapReduce program, which would otherwise require knowledge of Java or Python. Apache Pig makes it quick to write data manipulation operations for Hadoop.

Pig Architecture

Pig consists of two components:

  1. A runtime environment, running on the JVM, that executes Pig Latin programs.
  2. Pig Latin, the programming language in which those programs are written.

A Pig Latin program consists of a sequence of operations, or transformations, applied to the input data to produce output. These operations describe a data flow, which the Pig execution environment translates into an executable representation.

Under the hood, these transformations are carried out as a series of MapReduce jobs that the programmer does not see. This abstraction allows the programmer to focus on the data instead of lengthy code.

Pig Latin is a relatively rigid language that uses familiar keywords from data processing, e.g., Join, Group, and Filter.
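To make the idea of a data flow concrete, here is a minimal sketch of a Pig Latin script, written to disk from the shell. The file names (sales.csv, sales_totals.pig), the field names, the relation names, and the sample data are all made up for illustration; only the Pig Latin keywords themselves come from the language.

```shell
# Create a tiny, made-up input file with two fields: region,amount.
cat > sales.csv <<'EOF'
north,120
south,80
north,300
east,150
EOF

# Write a hypothetical Pig Latin script; each statement is one step in the data flow.
cat > sales_totals.pig <<'EOF'
-- LOAD reads the input and names the fields.
records = LOAD 'sales.csv' USING PigStorage(',') AS (region:chararray, amount:int);
-- FILTER keeps only the rows that match a condition.
large = FILTER records BY amount > 100;
-- GROUP collects the rows that share a key.
by_region = GROUP large BY region;
-- FOREACH ... GENERATE computes one output row per group.
totals = FOREACH by_region GENERATE group AS region, SUM(large.amount) AS total;
-- DUMP prints the result to the console.
DUMP totals;
EOF
```

Each statement names an intermediate relation that the next statement consumes, which is what lets Pig plan the whole flow before running anything.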

Pig has two execution modes:

  1. Local mode: Pig runs in a single JVM and uses the local file system. This mode is suitable only for testing Pig on small datasets.
  2. MapReduce mode: Queries written in Pig Latin are translated into MapReduce jobs and run on a Hadoop cluster. This mode is used to run Pig on large datasets.
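The two modes are selected with the -x option of the pig command. The sketch below wraps that in a small helper; run_pig is a hypothetical function written for this example, not part of Pig, and it only prints the command when Pig is not yet installed.

```shell
# run_pig is a hypothetical helper, not part of Pig itself.
# Usage: run_pig local|mapreduce script.pig
run_pig() {
  mode="$1"
  script="$2"
  # Accept only the two execution modes described above.
  case "$mode" in
    local|mapreduce) ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  if command -v pig >/dev/null 2>&1; then
    # Pig is installed: run the script in the chosen mode.
    pig -x "$mode" "$script"
  else
    # Pig is not installed yet: just show what would be run.
    echo "pig -x $mode $script"
  fi
}
```

For example, run_pig local myscript.pig tests a script against local files, and run_pig mapreduce myscript.pig submits the same script to the cluster (myscript.pig is a placeholder name).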

How to Download and Install Pig

Download Pig from the link given below.

https://downloads.apache.org/pig/pig-0.16.0/
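The download can also be done from the command line. This is a sketch under two assumptions: that the archive in that directory is named pig-0.16.0.tar.gz (the name used in the extraction step), and that the release is still hosted at this address. fetch_pig is a hypothetical helper name.

```shell
# Version and archive name as used elsewhere in this tutorial.
PIG_VERSION="0.16.0"
PIG_TARBALL="pig-${PIG_VERSION}.tar.gz"
# Assumption: the archive sits directly inside the release directory linked above.
PIG_URL="https://downloads.apache.org/pig/pig-${PIG_VERSION}/${PIG_TARBALL}"

# fetch_pig is a hypothetical helper; it downloads the archive into the current directory.
fetch_pig() {
  wget "$PIG_URL" -O "$PIG_TARBALL"
}
```

Calling fetch_pig performs the actual download; defining it separately keeps the URL easy to inspect or change first.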

Now move the downloaded file to the supper_user folder.

Now extract the contents of the archive in the folder using the command given below.

sudo tar -xvf pig-0.16.0.tar.gz pig-0.16.0/

Open the ~/.bashrc file in a text editor, for example with the command below.

nano ~/.bashrc

And do the following modifications.
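As a minimal sketch of the kind of entries this step adds: the later steps rely on a PIG_HOME variable (cd $PIG_HOME) and on the pig command being found on the PATH. The install directory below is an assumption; replace it with the folder where you actually extracted Pig.

```shell
# Assumption: Pig was extracted to this directory; change it to match your system.
export PIG_HOME="$HOME/pig-0.16.0"
# Put the pig launcher script on the PATH so that `pig -help` works from any directory.
export PATH="$PATH:$PIG_HOME/bin"
```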

Now run the following command.

. ~/.bashrc

Now we need to compile Pig. Run the following commands.

cd $PIG_HOME

Install Ant.

sudo apt-get install ant

Recompile Pig.

sudo ant clean jar-all -Dhadoopversion=23

Check that Pig is installed using the following command.

pig -help
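If the command is not found, the usual cause is that the pig launcher is not on the PATH. A small sketch of a check follows; check_pig is a hypothetical helper written for this example, not something Pig provides.

```shell
# check_pig is a hypothetical helper that reports whether the pig command is on the PATH.
check_pig() {
  if command -v pig >/dev/null 2>&1; then
    echo "pig found at $(command -v pig)"
  else
    echo "pig not found on PATH; re-check PIG_HOME and PATH in ~/.bashrc" >&2
    return 1
  fi
}
```

Running check_pig before pig -help separates a PATH problem from a problem with the Pig build itself.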
