Introduction to Apache Pig
Like MapReduce, Apache Pig is used to analyze large data sets. It provides an abstraction over MapReduce that reduces the complexity of writing MapReduce programs, which otherwise require knowledge of Java or Python. Apache Pig makes it quick to perform data manipulation operations in Hadoop.
Pig Architecture
Pig consists of two components:
- A JVM-based runtime environment for running Pig Latin programs.
- Pig Latin, which is a programming language.
A Pig Latin program consists of a sequence of operations or transformations applied to the input data to produce output. These operations describe a data flow, which the Pig execution environment translates into an executable representation.
The result of these transformations is a series of MapReduce jobs that stay hidden from the programmer. This abstraction allows the programmer to focus on the data instead of on lengthy code.
Pig Latin is a fairly constrained language that uses familiar keywords from data processing, e.g., JOIN, GROUP, and FILTER.
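To make the data-flow idea concrete, here is a minimal sketch that writes a small Pig Latin script to a file from the shell. The file name filter_group.pig, the input file users.txt, its two fields, and the output directory adult_counts are all hypothetical and chosen only for illustration.

```shell
# Write a small Pig Latin script to disk.
# All file, relation, and field names below are hypothetical examples.
cat > filter_group.pig <<'EOF'
-- Load tab-separated records with two fields
users  = LOAD 'users.txt' AS (name:chararray, age:int);
-- Keep only the rows that satisfy the condition
adults = FILTER users BY age >= 18;
-- Collect the remaining rows by age
by_age = GROUP adults BY age;
-- Produce one (age, count) row per group
counts = FOREACH by_age GENERATE group AS age, COUNT(adults) AS n;
-- Write the result to an output directory
STORE counts INTO 'adult_counts';
EOF
```

Each statement names an intermediate relation and feeds it to the next step, so the script reads as a description of the data flow rather than as explicit map and reduce functions.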
Pig has two execution modes:
- Local mode: Pig runs in a single JVM and uses the local file system. This mode is suitable only for testing on small datasets.
- MapReduce mode: Queries written in Pig Latin are translated into MapReduce jobs and run on a Hadoop cluster. This mode is used for running Pig on large datasets.
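The mode is selected with the -x option of the pig command. The sketch below assumes a hypothetical script name (script.pig) and two illustrative variables, PIG_MODE and PIG_SCRIPT, which are not part of Pig itself. It only runs Pig if the command and the script are actually present.

```shell
# PIG_MODE and PIG_SCRIPT are illustrative variables, not Pig settings.
PIG_MODE="${PIG_MODE:-local}"            # "local" or "mapreduce"
PIG_SCRIPT="${PIG_SCRIPT:-script.pig}"   # hypothetical script name

case "$PIG_MODE" in
  local)     echo "Local mode: single JVM, local file system" ;;
  mapreduce) echo "MapReduce mode: jobs run on the Hadoop cluster" ;;
  *)         echo "Unknown mode: $PIG_MODE" >&2; PIG_MODE="" ;;
esac

# Run the script only when the mode is valid, pig is on PATH, and the file exists.
if [ -n "$PIG_MODE" ] && command -v pig >/dev/null 2>&1 && [ -f "$PIG_SCRIPT" ]; then
  pig -x "$PIG_MODE" "$PIG_SCRIPT"
else
  echo "Skipping run: check the mode, the pig command, and the script path"
fi
```

Running pig -x local without a script name starts the interactive Grunt shell in local mode instead.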
How to Download and Install Pig
Download Pig from the link given below.
https://downloads.apache.org/pig/pig-0.16.0/
Now move the downloaded file to the supper_user
Now extract the contents of the archive using the command given below.
sudo tar -xvf pig-0.16.0.tar.gz pig-0.16.0/
Open the bashrc file in a text editor. Its path is given below.
~/.bashrc
And do the following modifications.
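A minimal sketch of the kind of lines usually added to ~/.bashrc at this step. It assumes Pig was extracted to a hypothetical location, $HOME/pig-0.16.0, so adjust the path to wherever the pig-0.16.0 directory actually lives on your machine. PIG_HOME is the variable the later cd $PIG_HOME command relies on.

```shell
# Hypothetical install location; replace with your actual pig-0.16.0 directory.
export PIG_HOME="$HOME/pig-0.16.0"
# Put the pig launcher script (in $PIG_HOME/bin) on the command search path.
export PATH="$PATH:$PIG_HOME/bin"
```

With these two lines in place, the pig command can be found from any directory once the file has been reloaded.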
Now run the following command to load the changes into the current shell.
. ~/.bashrc
Now we need to compile Pig. Run the following commands.
cd $PIG_HOME
Install Ant.
sudo apt-get install ant
Recompile Pig.
sudo ant clean jar-all -Dhadoopversion=23
Check whether Pig is installed using the following command.
pig -help
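If the command is not found, the usual cause is that PATH does not yet include Pig's bin directory. The sketch below is one way to check this from the shell. PIG_STATUS is an illustrative variable name, not something Pig defines.

```shell
# PIG_STATUS is an illustrative variable used only to report the result.
if command -v pig >/dev/null 2>&1; then
  PIG_STATUS="found at $(command -v pig)"
  # Print the installed Pig version; ignore a non-zero exit so the check continues.
  pig -version || true
else
  PIG_STATUS="not found"
  echo "pig is not on PATH; re-check PIG_HOME and PATH in ~/.bashrc" >&2
fi
echo "pig: $PIG_STATUS"
```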