Apache Hive

Hive is one of the best-known data warehouse systems for structured data. It is built on top of Apache Hadoop and was originally developed at Facebook. Hive can read, write, and manage large datasets that reside in distributed storage. It runs SQL-like queries written in HQL (Hive Query Language), which are internally converted to MapReduce jobs.
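
To make the idea of a query being converted to a MapReduce job more concrete, here is a minimal sketch in plain Python. It imitates what a query like SELECT page, COUNT(*) FROM page_views GROUP BY page would do in map, shuffle, and reduce steps. The table name, the rows, and the function names are made up for illustration, and this is not how Hive itself is implemented.

```python
from collections import defaultdict

# Hypothetical rows of a table page_views(user_id, page).
ROWS = [
    ("u1", "home"),
    ("u2", "home"),
    ("u1", "cart"),
    ("u3", "home"),
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _user_id, page in rows:
        yield page, 1


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group's values; here the aggregate is COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_page(rows):
    """Run the three phases in sequence on in-memory data."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(ROWS))
```

In a real cluster the map and reduce steps run in parallel on many machines, which is where the scalability comes from.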

Apache Hive has the following features:

Hive is scalable and fast for batch analysis of large datasets.
It offers SQL-like queries that are implicitly transformed into MapReduce or Spark jobs.
It can analyse large datasets stored in HDFS.
It supports different storage types such as plain text, RCFile, and HBase.
It uses indexing to accelerate queries.
It can operate on compressed data stored in the Hadoop ecosystem.
It supports user-defined functions (UDFs), so users can add their own functionality.
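
Native Hive UDFs are usually written in Java. As a lighter illustration of plugging in user-defined logic, the sketch below uses a different technique: a streaming script of the kind that Hive's TRANSFORM clause can pipe rows through. It assumes rows arrive as tab-separated text with \N standing for NULL. The column layout (user_id, email) and the masking rule are hypothetical.

```python
import io
import sys

NULL_MARKER = r"\N"  # assumed text representation of NULL in streamed rows


def mask_email(email):
    """Hypothetical user-defined logic: hide most of the local part of an e-mail."""
    if email == NULL_MARKER or "@" not in email:
        return email
    local, domain = email.split("@", 1)
    return local[:1] + "***@" + domain


def run(stream_in, stream_out):
    """Read tab-separated rows (user_id, email) and write the transformed rows."""
    for line in stream_in:
        user_id, email = line.rstrip("\n").split("\t")
        stream_out.write(f"{user_id}\t{mask_email(email)}\n")


if __name__ == "__main__":
    # Demo on in-memory data; a real streaming script would instead call
    # run(sys.stdin, sys.stdout).
    sample = io.StringIO("1\talice@example.com\n2\t\\N\n")
    run(sample, sys.stdout)
```

Keeping the logic in a function that takes streams makes it easy to try out locally before wiring it into a query.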

Limitations:

Hive is not capable of handling real-time data.
It is not designed for online transaction processing (OLTP).
Hive queries have high latency.

Apache Hive architecture:

Apache Hive client:

Hive supports applications written in several languages, including Java, Python, and C++. It supports many different types of clients, such as:

Thrift server- It is a cross-language service provider platform that serves requests from all programming languages that support Thrift.
JDBC Driver- It is used to establish a connection between Hive and Java applications. The JDBC driver is provided by the class org.apache.hadoop.hive.jdbc.HiveDriver (HiveServer2 uses org.apache.hive.jdbc.HiveDriver instead).

ODBC Driver- It allows applications that support the ODBC protocol to connect to Hive.
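
A JDBC client needs two pieces of information: the driver class and a connection URL. The sketch below only builds those two strings and does not open a connection. The host name is a placeholder. The default port 10000, the jdbc:hive2:// scheme for HiveServer2, and the jdbc:hive:// scheme for the original Hive server are stated here as assumptions about a typical setup, so check them against your own installation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HiveJdbcTarget:
    """Hypothetical helper describing where a JDBC client should connect."""

    host: str
    port: int = 10000           # assumed default HiveServer2 port
    database: str = "default"
    legacy: bool = False        # True for the original (pre-HiveServer2) server

    @property
    def driver_class(self):
        """Driver class name a Java client would load."""
        if self.legacy:
            return "org.apache.hadoop.hive.jdbc.HiveDriver"
        return "org.apache.hive.jdbc.HiveDriver"

    @property
    def url(self):
        """JDBC connection URL for this target."""
        scheme = "hive" if self.legacy else "hive2"
        return f"jdbc:{scheme}://{self.host}:{self.port}/{self.database}"


if __name__ == "__main__":
    target = HiveJdbcTarget(host="hive.example.com")
    print(target.driver_class)
    print(target.url)
```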

Hive Services:

Hive provides the following services:

Hive CLI- The Hive CLI (Command Line Interface) is a shell in which Hive queries and commands can be executed.
Hive Web User Interface- The Hive Web UI is an alternative to the Hive CLI. It offers a web-based graphical user interface for executing Hive queries and commands.
Hive Metastore- It is a central repository that stores the structure information of the various tables and partitions in the warehouse. It also holds the column and column-type metadata, the serialisers and deserialisers used to read and write data, and the corresponding HDFS files where the data is stored.
Hive Server- Also referred to as the Apache Thrift server, it accepts requests from different clients and passes them to the Hive driver.
Hive Driver- It receives queries from different sources such as the web UI, the CLI, and the JDBC driver, and passes them to the compiler.
Hive Compiler- It parses the query and performs semantic analysis on the different query blocks and expressions. It converts HiveQL statements into MapReduce jobs.
Hive Execution Engine- The optimiser builds the logical plan in the form of a DAG of map-reduce tasks and HDFS tasks. The execution engine then runs these tasks in the order of their dependencies.
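
The idea of a plan as a DAG of tasks can be sketched with Python's standard graphlib module (Python 3.9 or later). The task names below are invented for illustration and do not come from a real Hive plan. The point is only that a task can run once everything it depends on has finished.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each task maps to the set of tasks it depends on.
PLAN = {
    "map_scan_orders": set(),
    "map_scan_customers": set(),
    "reduce_join": {"map_scan_orders", "map_scan_customers"},
    "reduce_group_by": {"reduce_join"},
    "hdfs_move_result": {"reduce_group_by"},
}


def execution_order(plan):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(plan).static_order())


if __name__ == "__main__":
    for step, task in enumerate(execution_order(PLAN), start=1):
        print(step, task)
```

The two scan tasks do not depend on each other, so either may come first. A real engine could run them in parallel.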

Hive data types

Hive data types are grouped into categories such as numeric types, date/time types, miscellaneous types, and complex types.

Integer types

TINYINT- A 1-byte signed integer ranging from -128 to 127.
SMALLINT- A 2-byte signed integer ranging from -32,768 to 32,767.
INT- A 4-byte signed integer ranging from -2,147,483,648 to 2,147,483,647.
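
These ranges follow directly from the byte sizes. A signed integer stored in n bytes has 8n bits and covers the values from -2^(8n-1) to 2^(8n-1) - 1. The short sketch below recomputes the ranges. The function and dictionary names are only illustrative.

```python
def signed_range(num_bytes):
    """Return (minimum, maximum) of a two's-complement signed integer."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Byte sizes of the integer types listed above.
HIVE_INTEGER_SIZES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4}

if __name__ == "__main__":
    for name, size in HIVE_INTEGER_SIZES.items():
        low, high = signed_range(size)
        print(f"{name}: {low} to {high}")
```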

Decimal data types

FLOAT- A 4-byte single-precision floating-point number.
DOUBLE- An 8-byte double-precision floating-point number.
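
The practical difference between the two types is how much precision survives storage. The sketch below uses Python's struct module to round-trip a value through 4-byte and 8-byte encodings. It assumes FLOAT and DOUBLE behave like standard IEEE 754 single and double precision, and the helper names are made up for this example.

```python
import struct


def as_float32(value):
    """Round-trip a Python float through a 4-byte single-precision encoding."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def as_float64(value):
    """Round-trip a Python float through an 8-byte double-precision encoding."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


if __name__ == "__main__":
    print(struct.calcsize("<f"), "bytes:", repr(as_float32(0.1)))
    print(struct.calcsize("<d"), "bytes:", repr(as_float64(0.1)))
```

The 4-byte value comes back slightly different from 0.1, while the 8-byte value is unchanged, which is why DOUBLE is the safer choice when precision matters.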

Date/Time types

TIMESTAMP- It supports the traditional UNIX timestamp with optional nanosecond precision. An integer value is interpreted as a UNIX timestamp in seconds, and a floating-point value is interpreted as a UNIX timestamp in seconds with decimal precision.

Hive also has other data types, such as column types and null types.
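
The way a number is read as a UNIX timestamp in seconds can be tried out in plain Python. This is only an illustration of the interpretation, not Hive code. It displays the result in UTC as an assumption, and Python's datetime keeps microseconds rather than nanoseconds.

```python
from datetime import datetime, timezone


def from_unix_seconds(value):
    """Interpret an int or float as seconds since 1970-01-01 00:00:00 UTC."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


if __name__ == "__main__":
    # Integer input: whole seconds.
    print(from_unix_seconds(1700000000).isoformat())
    # Floating-point input: the fractional part carries sub-second precision.
    print(from_unix_seconds(1700000000.5).isoformat())
```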

Questions

1. What is Apache Hive?

2. Explain the architecture of Apache Hive.

Steven Roger

Steven Roger is a technology blogger for the H2K Infosys blog, where he brings complex tech concepts to life with clear, engaging insights. With a passion for IT education and over a decade of industry experience, Steven specializes in demystifying the latest in software development, business analysis, and quality assurance training. His articles provide readers with practical knowledge and tips on upskilling for successful careers in tech.
