Hive


Hive: a SQL-based data warehouse application built on top of Hadoop (select, join, group by, …). It is a platform for writing SQL-like scripts that run as MapReduce operations. PARTITIONING: Partitioning a table changes how Hive structures the data storage. It is used to distribute load horizontally, e.g. PARTITIONED BY (country STRING, state STRING); A subset of a […]

Hadoop and Hive Interview Cheat Sheet


This post covers basic string functions in Hive, with syntax and examples. Creating a table in Hive. String functions and normal queries. ASCII: The ASCII function converts the first character of the string into its numeric ASCII value. CONCAT: The CONCAT function concatenates all the strings/columns passed to it. […]

String Functions in Hive
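
The behaviour described for ASCII and CONCAT can be modelled outside Hive. The following is a minimal Python sketch, not HiveQL: `hive_ascii` and `hive_concat` are hypothetical stand-in names, the input is assumed to be plain ASCII text, and the handling of NULL (returning None) and of the empty string (returning 0) is an assumption about Hive's behaviour rather than something stated in the post.

```python
def hive_ascii(s):
    """Stand-in for Hive's ASCII(): numeric value of the first character."""
    if s is None:
        return None          # assumption: NULL in, NULL out
    return ord(s[0]) if s else 0  # assumption: empty string yields 0


def hive_concat(*parts):
    """Stand-in for Hive's CONCAT(): joins all arguments in order."""
    if any(p is None for p in parts):
        return None          # assumption: any NULL argument makes the result NULL
    return "".join(str(p) for p in parts)


print(hive_ascii("hadoop"))               # 104, the code for 'h'
print(hive_concat("hadoop", "-", "hive"))  # hadoop-hive
```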


Creating a table in Hive. Aggregate functions and normal queries. SUM: Returns the sum of the elements in the group, or the sum of the distinct values of the column in the group. COUNT: count(*) returns the total number of retrieved rows, including rows containing NULL values; […]

Hive Aggregate Functions
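
The difference between counting rows and counting values is easy to get wrong, so here is a small Python sketch of the semantics, not HiveQL. The function names and the `salary` column are hypothetical; None stands in for NULL. It assumes the usual SQL rule that count(column) and SUM skip NULL values, while count(*) counts every row.

```python
def count_star(rows):
    """count(*): every retrieved row, NULLs included."""
    return len(rows)


def count_col(rows, col):
    """count(col): only rows where the column is not NULL."""
    return sum(1 for r in rows if r[col] is not None)


def sum_col(rows, col, distinct=False):
    """SUM(col) or SUM(DISTINCT col): NULLs are skipped."""
    vals = [r[col] for r in rows if r[col] is not None]
    if distinct:
        vals = set(vals)
    return sum(vals) if vals else None


rows = [{"salary": 100}, {"salary": None}, {"salary": 100}, {"salary": 250}]
print(count_star(rows))                        # 4
print(count_col(rows, "salary"))               # 3
print(sum_col(rows, "salary"))                 # 450
print(sum_col(rows, "salary", distinct=True))  # 350
```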



Hive Date Functions. from_unixtime: This function converts a number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC) to a STRING that represents the timestamp of that moment in the current system time zone, in the format "1970-01-01 00:00:00". The following example returns the current date including the time. […]

Hive Date Functions
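
To make the from_unixtime conversion concrete, the sketch below mirrors it in Python rather than HiveQL. The helper `from_unixtime` is a hypothetical Python stand-in, and the time zone is passed explicitly (UTC by default) instead of being read from the system, which is an assumption made so that the output is reproducible.

```python
from datetime import datetime, timedelta, timezone


def from_unixtime(seconds, tz=timezone.utc):
    """Format seconds since the Unix epoch as 'YYYY-MM-DD HH:MM:SS' in tz."""
    return datetime.fromtimestamp(seconds, tz).strftime("%Y-%m-%d %H:%M:%S")


print(from_unixtime(0))      # 1970-01-01 00:00:00
print(from_unixtime(86400))  # 1970-01-02 00:00:00

# The same instant rendered in a zone 5h30m ahead of UTC.
ist = timezone(timedelta(hours=5, minutes=30))
print(from_unixtime(0, ist))  # 1970-01-01 05:30:00
```

The last call shows why the result depends on the time zone: the same epoch value yields a different string.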


Hive Functions Examples: SET, SHOW, USE, CREATE DATABASE, CREATE MANAGED TABLE, CREATE EXTERNAL TABLE, CREATING TABLE FROM EXISTING TABLE, CREATING EXTERNAL TABLES FROM MANAGED TABLES, LOAD, COPY DATA FROM ONE TABLE TO ANOTHER, DROP, QUIT, SELECT, DESCRIBE, DESCRIBE SPECIFIC FIELD, DESCRIBE EXTENDED, ALTER, CLONE SCHEMA (DATA IS NOT COPIED), CLONE […]

Hive Functions Examples


In our previous post we discussed Hadoop job optimization, or Hadoop performance tuning for MapReduce jobs. In this post we will briefly discuss a few points on how to optimize Hive queries (Hive performance tuning). If we do not fine-tune Hive properly, then even for select queries […]

Hive Performance Tuning



For data-intensive workloads, I/O operations and network data transfer take considerable time to complete. By enabling compression in Hive we can improve the performance of Hive queries as well as save storage space on the HDFS cluster. Find Available Compression Codecs in Hive: To enable compression in Hive, […]

Enable Compression in Hive


Hive Built In Functions. Functions in Hive are categorized as below. Mathematical Functions: These functions are mainly used to perform mathematical calculations. Date Functions: These functions are used to perform operations on date data types, such as adding a number of days to a date. String Functions: These functions are used […]

Hive Built In Functions


In this post, we will discuss Hive authorization models and Hive security. Before discussing Hive authorization models, let's note the difference between authentication and authorization. Authentication: verifying the identity of the user, i.e. whether the logged-in user is a real user or not. Authorization: verifying whether a user […]

Hive Authorization Models and Hive Security



In this post, we will discuss one of the common Hive clients, the JDBC client, for both HiveServer1 (Thrift Server) and HiveServer2. Use of HiveServer2 is recommended, as HiveServer1 has several concurrency issues and lacks some features available in HiveServer2. JDBC Data Types: The following table lists the data types implemented for HiveServer/HiveServer2 […]

Hive JDBC Client Example


In this post we will give an introduction to HiveServer2 and Beeline. As of hive-0.11.0, Apache Hive started decoupling HiveServer2 from Hive. The aim was to overcome the limitations of the existing Hive Thrift Server. Below are the limitations of Hive Thrift Server 1: No sessions/concurrency (essentially need 1 server per client); Security; Client interface […]

HiveServer2 Beeline Introduction





In our previous post we discussed partitioning in Hive; now we will focus on bucketing in Hive, which is another way of giving a more fine-grained structure to Hive tables. Bucketing in Hive: Usually, partitioning in Hive offers a way of segregating Hive table data into multiple files/directories. […]

Bucketing In Hive
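
Bucketing assigns each row to one of a fixed number of buckets based on a hash of the bucketing column. The following Python sketch (not HiveQL) illustrates the idea under a loud assumption: the key is an integer whose hash is the value itself. Hive's real hash function depends on the column type and the Hive version, so `bucket_for` and the sample rows are purely illustrative.

```python
from collections import defaultdict


def bucket_for(key: int, num_buckets: int) -> int:
    """Pick a bucket: non-negative hash of the key, modulo the bucket count.

    Assumption: the hash of an integer key is the integer itself.
    """
    return (key & 0x7FFFFFFF) % num_buckets


# Hypothetical rows: (user_id, name), bucketed on user_id into 4 buckets.
rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e"), (8, "f")]

buckets = defaultdict(list)
for user_id, name in rows:
    buckets[bucket_for(user_id, 4)].append(user_id)

for b in sorted(buckets):
    print(b, buckets[b])
```

Because the bucket count is fixed, rows with the same key always land in the same bucket, which is what makes bucketed joins and sampling possible.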


In this post, we will discuss one of the most important concepts in Hive: partitioning in Hive tables. Partitioning in Hive: Table partitioning means dividing table data into parts based on the values of particular columns, like date or country, and segregating the input records into different files/directories […]

Partitioning in Hive
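
Partitioning maps each distinct combination of partition-column values to its own directory, named `column=value`. The sketch below shows that mapping in Python rather than HiveQL. The table directory `/warehouse/sales`, the column names, and the helper `partition_path` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def partition_path(table_dir, partition_cols, row):
    """Build the directory a row belongs to, one 'col=value' level per partition column."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([table_dir] + parts)


# Hypothetical input records for a table partitioned by (country, state).
records = [
    {"country": "US", "state": "CA", "amount": 10},
    {"country": "US", "state": "NY", "amount": 20},
    {"country": "US", "state": "CA", "amount": 30},
]

layout = defaultdict(list)
for rec in records:
    layout[partition_path("/warehouse/sales", ["country", "state"], rec)].append(rec["amount"])

for path in sorted(layout):
    print(path, layout[path])
```

A query that filters on `country` and `state` then only needs to read the matching directory instead of scanning the whole table.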