Below is the high-level architecture of Hive.
A Hive distribution contains the following major components:
- CLI — Command Line Interface (the Hive shell). It is the most common way of interacting with Hive and is the default service.
- HWI — Hive Web Interface. It is an alternative to the shell for interacting with Hive through a web browser.
- JDBC/ODBC/Thrift Server — These provide programmatic access to Hive. Applications using the Thrift, JDBC, and ODBC connectors need a running Hive server to communicate with Hive. Set the HIVE_PORT environment variable to the port the server should listen on (defaults to 10000).
- Driver — The driver compiles the input commands and queries, optimizes the computation required, and executes the required steps with MapReduce jobs.
- Metastore — The metastore is the central repository of Hive metadata. It is divided into two pieces: a service and the backing store for the data. By default, the metastore service runs in the same process as the Hive service. It can also be run as a standalone (remote) process; in that case, set the METASTORE_PORT environment variable to specify the port the server will listen on.
- JobTracker — Hive communicates with the JobTracker to initiate MapReduce jobs. Hive does not have to run on the same master node as the JobTracker.
- NameNode — The data files to be processed are in HDFS, which is managed by the NameNode.
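As a small illustration of the HIVE_PORT setting described in the server bullet above, the following sketch shows how a client-side tool might work out which port to use: take the value of HIVE_PORT if it is set, otherwise fall back to 10000. The class HivePortResolver and its methods are hypothetical names invented for this example; they are not part of Hive.

```java
// Hypothetical helper for illustration only; Hive does not ship this class.
final class HivePortResolver {

    // Default port mentioned in the text above.
    static final int DEFAULT_HIVE_PORT = 10000;

    // Turns the raw value of HIVE_PORT into a port number.
    // A missing or blank value falls back to the default.
    static int resolvePort(String rawValue) {
        if (rawValue == null || rawValue.trim().isEmpty()) {
            return DEFAULT_HIVE_PORT;
        }
        int port = Integer.parseInt(rawValue.trim());
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        return port;
    }

    // Reads HIVE_PORT from the process environment.
    static int fromEnvironment() {
        return resolvePort(System.getenv("HIVE_PORT"));
    }
}
```

Keeping the parsing in resolvePort, separate from the environment lookup, makes the fallback rule easy to check without changing the environment of the running process.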
Below are the three main clients that can interact with Hive.
Thrift Client: The Hive Thrift Client makes it possible to run Hive commands from a wide range of programming languages. Thrift bindings for Hive are available for Java, Python, and Ruby.
JDBC Driver: Hive provides a JDBC driver, defined in the class org.apache.hadoop.hive.jdbc.HiveDriver. When configured with a JDBC URI of the form jdbc:hive://host:port/dbname, a Java application will connect to a Hive server running in a separate process at the given host and port.
Alternatively, we can connect to Hive through JDBC in embedded mode using the URI jdbc:hive://. In this mode, Hive runs in the same JVM as the application invoking it, so there is no need to launch it as a standalone server; it does not use the Thrift service or the Hive Thrift Client.
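The two JDBC URI forms described above can be sketched in Java as follows. This is a minimal sketch, not a definitive client: the class name HiveJdbcSketch and its helper methods are hypothetical, the empty user name and password are assumptions, and printTables only works when the Hive JDBC driver jar is on the classpath (and, for the remote form, when a Hive server is listening at the given host and port).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical example class; only the driver class name and the
// URI forms come from the text above.
final class HiveJdbcSketch {

    static final String DRIVER_CLASS = "org.apache.hadoop.hive.jdbc.HiveDriver";

    // Remote mode: jdbc:hive://host:port/dbname
    static String remoteUrl(String host, int port, String dbName) {
        return "jdbc:hive://" + host + ":" + port + "/" + dbName;
    }

    // Embedded mode: Hive runs inside the calling JVM.
    static String embeddedUrl() {
        return "jdbc:hive://";
    }

    // Loads the driver, opens a connection, and prints the table names.
    // Assumption: empty user name and password are accepted.
    static void printTables(String url) throws ClassNotFoundException, SQLException {
        Class.forName(DRIVER_CLASS);
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

For example, HiveJdbcSketch.printTables(HiveJdbcSketch.remoteUrl("localhost", 10000, "default")) would target a server on the local machine, while passing HiveJdbcSketch.embeddedUrl() would run Hive in the same JVM as the caller.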
ODBC Driver: The Hive ODBC Driver allows applications that support the ODBC protocol to connect to Hive. Like the JDBC driver, the ODBC driver uses Thrift to communicate with the Hive server.