Yearly Archives: 2015


HCatalog and Pig Integration: In short, HCatalog opens up the Hive metadata to other MapReduce tools. Every MapReduce tool has its own notion of HDFS data (for example, Pig sees the HDFS data as a set of files, while Hive sees it as tables). With a table-based abstraction, HCatalog-supported MapReduce tools […]

HCatalog and Pig Integration


In this post, we will describe the procedure for installing Scala on an Ubuntu machine. Prerequisite: Below are the basic requirements for installing Scala on Ubuntu and getting started. Java 1.5 or later installed, and the JAVA_HOME environment variable set to the Java installation directory. Scala Installation Procedure: Download the latest stable […]

Scala Installation in Ubuntu


These days, the majority of Hadoop applications and tools are being built in Scala rather than in Java. So it is good for Hadoop developers and Java programmers to learn Scala as well. This post is just an introduction to Scala. What’s Scala? Scala is a hybrid functional and object-oriented programming language which […]

Introduction to Scala



In this post we will discuss Kafka installation and test broker setup on an Ubuntu machine in stand-alone ZooKeeper mode. Apache Kafka: an open-source message broker from the Apache Software Foundation. It was initially developed at LinkedIn and later contributed to the open-source community. It is written in Scala. Kafka provides a unified, […]

Kafka Installation and Test Broker Setup


HBase Functions Cheat Sheet SHELL [cloudera@quickstart ~]$ hbase shell LIST hbase(main):003:0> list SCAN Scanner specifications may include one or more of: TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, or COLUMNS, CACHE. If no columns are specified, all columns will be scanned. To scan all members of a column family, leave […]

HBase Functions Cheat Sheet


Good SQL Queries Collection /* CREATE TABLE FOR DEPARTMENT */ CREATE TABLE DEPARTMENT ( "DEPT_ID" NUMBER, "DEPT_NAME" VARCHAR2(30), PRIMARY KEY ("DEPT_ID") ) /* CREATE TABLE FOR EMP */ CREATE TABLE EMPLOYEE ( "EMP_ID" NUMBER, "MGR_ID" NUMBER, "DEPT_ID" NUMBER, "EMP_NAME" VARCHAR2(30), "SAL" NUMBER, "DOJ" DATE, PRIMARY KEY ("EMP_ID") ENABLE, FOREIGN KEY […]

Good SQL Queries Collection



Real Time Hadoop Interview Questions from Various Interviews. Hive – Where do you use an internal (managed) table? In what scenarios? In your resume, what do you mean by “monitoring & managing MapReduce jobs”? Explain. Interviewer’s Project: How to convert an RDBMS’s nested SQL queries to the Hadoop framework using Pig. Sqoop: […]

Real Time Hadoop Interview Questions From Different Readers


Hadoop Testing Tools. MRUnit – Java framework that helps developers unit test Hadoop MapReduce jobs. Mockito – Java mocking framework that can be used, like MRUnit, for unit testing Hadoop MapReduce jobs. PigUnit – Java framework that helps developers unit test Pig scripts. HiveRunner – An open-source unit test framework for Hadoop Hive queries based […]

Hadoop Testing Tools


This post covers basic string functions in Hive, with syntax and examples. Creating Table in Hive. String Functions and Normal Queries. ASCII: The ASCII function converts the first character of the string into its numeric ASCII value. CONCAT: The CONCAT function concatenates all the strings/columns. […]

String Functions in Hive
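
The ASCII and CONCAT descriptions in the excerpt above can be illustrated with a small Python sketch. This is not Hive code: the helper names `hive_ascii` and `hive_concat` are hypothetical, and the NULL and empty-string handling shown here is an assumption about Hive's behavior rather than something stated in the post.

```python
def hive_ascii(s):
    """Mimic ASCII(str): numeric value of the first character."""
    if s is None:
        return None   # assumption: NULL input gives NULL output
    if s == "":
        return 0      # assumption: an empty string yields 0
    return ord(s[0])


def hive_concat(*parts):
    """Mimic CONCAT(a, b, ...): join all arguments in order."""
    if any(p is None for p in parts):
        return None   # assumption: any NULL argument makes the result NULL
    return "".join(str(p) for p in parts)


print(hive_ascii("Hadoop"))             # 72, the code for "H"
print(hive_concat("hive", "-", "pig"))  # hive-pig
```

In Hive itself the equivalent calls would be `ASCII('Hadoop')` and `CONCAT('hive', '-', 'pig')` inside a SELECT statement.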



Deciding How to Pass Arguments to Oozie Jobs So far, you have learned about several ways to pass parameters to an Oozie job. To help you decide which approach to use, you should first understand how Oozie uses parameters: Oozie uses parameters explicitly defined inside an action’s <arg> tag. If […]

Passing arguments to Oozie workflows


Creating Table in Hive. Aggregate Functions and Normal Queries. SUM: Returns the sum of the elements in the group, or the sum of the distinct values of the column in the group. COUNT: count(*) returns the total number of retrieved rows, including rows containing NULL values; […]

Hive Aggregate Functions
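
As a rough illustration of the SUM and count(*) semantics described in the excerpt above, here is a minimal Python sketch. The `rows` data and the `sal` column are made up for the example, `None` stands in for NULL, and skipping NULLs in SUM is the usual SQL convention rather than something the excerpt spells out.

```python
# Hypothetical sample rows; None plays the role of a NULL value.
rows = [{"sal": 100}, {"sal": 200}, {"sal": None}, {"sal": 200}]


def count_star(rows):
    """Mimic count(*): every retrieved row counts, NULLs included."""
    return len(rows)


def sum_col(rows, col, distinct=False):
    """Mimic SUM(col) and SUM(DISTINCT col), skipping NULL values."""
    values = [r[col] for r in rows if r[col] is not None]
    if not values:
        return None  # assumption: no non-NULL input gives NULL
    if distinct:
        values = set(values)
    return sum(values)


print(count_star(rows))                    # 4
print(sum_col(rows, "sal"))                # 500
print(sum_col(rows, "sal", distinct=True)) # 300
```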


Hive Date Functions. from_unixtime: This function converts the number of seconds from the Unix epoch (1970-01-01 00:00:00 UTC) to a STRING that represents the TIMESTAMP of that moment in the current system time zone, in the format “1970-01-01 00:00:00”. The following example returns the current date including the time. […]

Hive Date Functions
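
The conversion that the excerpt above describes for from_unixtime can be sketched in Python as follows. The function name is reused only for readability; this is not Hive code. The sketch defaults to UTC so the output is reproducible, whereas the excerpt says Hive uses the current system time zone.

```python
from datetime import datetime, timezone


def from_unixtime(seconds, tz=timezone.utc):
    """Format seconds since the Unix epoch as 'YYYY-MM-DD HH:MM:SS'.

    Assumption: tz defaults to UTC here for reproducible output;
    pass another tzinfo to emulate a different system time zone.
    """
    return datetime.fromtimestamp(seconds, tz=tz).strftime("%Y-%m-%d %H:%M:%S")


print(from_unixtime(0))      # 1970-01-01 00:00:00
print(from_unixtime(86400))  # 1970-01-02 00:00:00
```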




Hive Functions Examples SET SHOW USE CREATE DATABASE CREATE MANAGED TABLE CREATE EXTERNAL TABLE CREATING TABLE FROM EXISTING TABLE CREATING EXTERNAL TABLES FROM MANAGED TABLES LOAD COPY DATA FROM ONE TABLE TO ANOTHER DROP QUIT SELECT DESCRIBE DESCRIBE SPECIFIC FIELD DESCRIBE EXTENDED ALTER CLONE SCHEMA (DATA IS NOT COPIED) CLONE […]

Hive Functions Examples


Below is a good collection of examples for the most frequently used functions in Pig. Pig Functions Examples. Contents LOAD DESCRIBE/EXPLAIN/ILLUSTRATE FOREACH GROUP STORE LIMIT ORDER DISTINCT JOIN JOIN USING MULTIPLE KEYS OUTER JOINS SELF JOIN COUNT NUMBER OF ROWS IN SELF JOIN’S OUTPUT SAMPLE PARALLEL UDF:REGISTER UDF:DEFINE CALLING […]

Pig Functions Examples