Monthly Archives: May 2015


Hive Functions Examples. Contents: SET, SHOW, USE, CREATE DATABASE, CREATE MANAGED TABLE, CREATE EXTERNAL TABLE, CREATING TABLE FROM EXISTING TABLE, CREATING EXTERNAL TABLES FROM MANAGED TABLES, LOAD, COPY DATA FROM ONE TABLE TO ANOTHER, DROP, QUIT, SELECT, DESCRIBE, DESCRIBE SPECIFIC FIELD, DESCRIBE EXTENDED, ALTER, CLONE SCHEMA (DATA IS NOT COPIED), CLONE […]

Hive Functions Examples


Below is a good collection of examples for the most frequently used functions in Pig. Pig Functions Examples. Contents: LOAD, DESCRIBE/EXPLAIN/ILLUSTRATE, FOREACH, GROUP, STORE, LIMIT, ORDER, DISTINCT, JOIN, JOIN USING MULTIPLE KEYS, OUTER JOINS, SELF JOIN, COUNT NUMBER OF ROWS IN SELF JOIN’S OUTPUT, SAMPLE, PARALLEL, UDF:REGISTER, UDF:DEFINE, CALLING […]

Pig Functions Examples


Below is the Pig Functions Cheat Sheet, prepared by collecting different types of functions. Pig Execution Modes. Grunt mode: the interactive mode of Pig, very useful for testing, syntax checking, and ad-hoc data exploration. Script mode: runs a set of instructions from a file, similar to a SQL script file. Embedded […]

Pig Functions Cheat Sheet



Testing Flume. Scope: Testing will cover functional testing of the data transfer from source machines (external systems) to HDFS/HBase. Testing of individual Flume components, such as the different source types, channel types, and sink types, will be included, along with testing of custom Flume agents and Flume agents embedded in other automated jobs/tools. Limitations & […]

Flume Sqoop Pig HBase Unit Testing


In our previous posts we covered the HBase Overview and HBase Installation; now it is time to practice some HBase shell commands to get familiar with HBase. We will test a few HBase shell commands in this post. HBase Shell Usage: Quote all names in HBase Shell such as […]

HBase Shell Commands in Practice


1. What does commodity hardware in the Hadoop world mean? (D) a) Very cheap hardware b) Industry-standard hardware c) Discarded hardware d) Low-specification, industry-grade hardware 2. Which of the following are NOT big data problem(s)? (D) a) Parsing 5 MB XML file every 5 minutes […]

100 Interview Questions on Hadoop



In our previous post we discussed Hadoop job optimization, or Hadoop performance tuning for MapReduce jobs. In this post we will briefly discuss a few points on how to optimize Hive queries (Hive performance tuning). If we do not fine tune Hive properly, then even for select queries […]

Hive Performance Tuning


For data-intensive workloads, I/O operations and network data transfer take considerable time to complete. By enabling compression in Hive we can improve the performance of Hive queries as well as save storage space on the HDFS cluster. Find Available Compression Codecs in Hive: To enable compression in Hive, […]

Enable Compression in Hive