Apache Hive

Apache Hive is a widely used data warehouse system for structured data. It is built on top of Apache Hadoop and was originally developed at Facebook. Hive provides functionality for reading, writing and managing large datasets that reside in distributed storage. It runs SQL-like queries written in HQL (Hive Query Language), which are internally converted to MapReduce jobs.

Apache Hive has the following features:

  • Hive is fast and scalable.
  • It offers SQL-like queries (HQL) that are implicitly transformed into MapReduce or Spark jobs.
  • It can analyse large datasets stored in HDFS.
  • It supports different storage types such as plain text, RCFile and HBase.
  • It uses indexing to accelerate queries.
  • It can operate on compressed data stored in the Hadoop ecosystem.
  • It supports user-defined functions (UDFs), which let users add their own functionality.
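
To make the idea of a SQL-like query being turned into a MapReduce job more concrete, here is a toy sketch in plain Python. It mimics how a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept could be split into a map phase, a shuffle and a reduce phase. The table, its rows and the function names are all hypothetical, and this is only an illustration of the general MapReduce pattern, not Hive's actual execution code.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept).
ROWS = [
    ("asha", "sales"),
    ("ben", "engineering"),
    ("chen", "sales"),
    ("dana", "engineering"),
    ("eli", "support"),
]


def map_phase(rows):
    """Emit one (dept, 1) pair per row, as a mapper would."""
    for _name, dept in rows:
        yield dept, 1


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values of each key, which gives COUNT(*) per dept."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_dept(rows):
    """Run the three phases in sequence on the given rows."""
    return reduce_phase(shuffle(map_phase(rows)))


result = count_by_dept(ROWS)
print(result)
```

In a real cluster the map and reduce phases run in parallel on many machines over data stored in HDFS, which is why this approach scales to large datasets.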

Limitations:

  • Hive is not capable of handling real-time data.
  • It is not designed for online transaction processing (OLTP).
  • Hive queries have high latency.

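Apache Hive architecture:

Hive's architecture is made up of clients, which submit queries, and services, which process them. Both are described below.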

Apache Hive client:

Hive supports applications written in several languages, including Java, Python and C++. It supports different types of clients, such as:

  • Thrift server- Thrift is a cross-language service provider platform. The Hive Thrift server serves requests from all programming languages that support Thrift.
  • JDBC driver- It is used to establish a connection between Hive and Java applications. The JDBC driver is provided by the class org.apache.hadoop.hive.jdbc.HiveDriver (HiveServer2 uses org.apache.hive.jdbc.HiveDriver instead).
  • ODBC driver- It allows applications that support the ODBC protocol to connect to Hive.

Hive Services:

Hive provides the following services:

  • Hive CLI- The Hive CLI (command line interface) is a shell in which Hive queries and commands can be executed.
  • Hive web user interface- The Hive web UI is an alternative to the Hive CLI. It offers a web-based graphical user interface for executing Hive queries and commands.
  • Hive Metastore- It is a central repository that stores the structure information of the tables and partitions in the warehouse. This metadata includes each column and its type, the serialisers and deserialisers used to read and write data, and the corresponding HDFS files where the data is stored.
  • Hive server- Also referred to as the Apache Thrift server, it accepts requests from different clients and passes them to the Hive driver.
  • Hive driver- It receives queries from different sources such as the web UI, the CLI and the JDBC driver, and passes them to the compiler.
  • Hive compiler- It parses the query and performs semantic analysis on the different query blocks and expressions. It converts HiveQL statements into MapReduce jobs.
  • Hive execution engine- The optimiser generates the logical plan in the form of a DAG of map-reduce tasks and HDFS tasks. The execution engine then runs these tasks in the order of their dependencies.
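
The idea of a plan expressed as a DAG of tasks can be illustrated with a short Python sketch. It uses the standard library's graphlib module (Python 3.9 or later) to produce an order in which every task runs only after the tasks it depends on. The task names and the shape of the plan are made up for illustration and do not come from Hive itself.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each task maps to the set of tasks it depends on.
PLAN = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
    "write_to_hdfs": {"aggregate"},
}


def execution_order(plan):
    """Return one valid order in which each task follows its dependencies."""
    return list(TopologicalSorter(plan).static_order())


order = execution_order(PLAN)
print(order)
```

The two scan tasks have no dependency on each other, so either may come first, and an engine is free to run them in parallel. A plan that contains a cycle is not a valid DAG, and TopologicalSorter raises a CycleError for it.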

Hive data types

Hive data types are grouped into categories such as numeric types, date/time types, miscellaneous types and complex types.

Integer types:

  • TINYINT- A 1-byte signed integer, ranging from -128 to 127.
  • SMALLINT- A 2-byte signed integer, ranging from -32,768 to 32,767.
  • INT- A 4-byte signed integer, ranging from -2,147,483,648 to 2,147,483,647.

Decimal data types:

  • FLOAT- A 4-byte single-precision floating-point number.
  • DOUBLE- An 8-byte double-precision floating-point number.

Date/Time types:

  • TIMESTAMP- It supports the traditional UNIX timestamp with optional nanosecond precision. An integer value is interpreted as a UNIX timestamp in seconds, and a floating-point value is interpreted as a UNIX timestamp in seconds with decimal precision.

Hive has other data types beyond these, and a missing value is represented by NULL.
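
The integer ranges and the timestamp interpretation described above can be checked with a small Python sketch. The helper names are hypothetical, and the use of UTC is an assumption made for this example rather than a statement about how Hive handles time zones. Python's datetime keeps microsecond precision only, so it cannot show the nanosecond precision that Hive supports.

```python
from datetime import datetime, timezone

# Byte widths of the integer types listed in the text.
INTEGER_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4}


def signed_range(num_bytes):
    """Return (min, max) of a two's-complement signed integer of this width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def from_unix_seconds(value):
    """Interpret an int or float as seconds since the UNIX epoch (UTC assumed)."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


for name, width in INTEGER_BYTES.items():
    print(name, signed_range(width))

# An integer is whole seconds; a float carries a fractional part.
print(from_unix_seconds(0))
print(from_unix_seconds(1.5))
```

Each extra byte adds eight bits, so the range grows by a factor of 256 from one width to the next.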

Questions

1. What is Apache Hive?

2. Explain the architecture of Apache Hive.
