Monthly Archives: October 2015

OOZIE NOTES: Workflow scheduler to manage Hadoop and related jobs. First developed in Bangalore by Yahoo. DAG (Directed Acyclic Graph): acyclic means the graph cannot have any loops, and the action nodes of the graph provide control dependency. Control dependency means a second action cannot run until the first action has completed […]

Oozie Notes
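The control-dependency idea in the excerpt above can be illustrated with a small sketch. This is not Oozie code: it uses Python's standard `graphlib` module, and the action names (`import_data`, `clean_data`, `build_report`) are hypothetical, chosen only to show that each action runs after the one it depends on and that a loop is rejected.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical actions; each maps to the set of actions it depends on.
workflow = {
    "import_data": set(),
    "clean_data": {"import_data"},
    "build_report": {"clean_data"},
}

def run_order(dag):
    """Return an order in which every action follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())

def is_acyclic(dag):
    """Return True if the graph has no loops, False otherwise."""
    try:
        run_order(dag)
        return True
    except CycleError:
        return False

order = run_order(workflow)
print(order)
print(is_acyclic({"a": {"b"}, "b": {"a"}}))  # a loop, so not a valid DAG
```

A real Oozie workflow expresses the same ordering in its workflow definition rather than in Python; the sketch only shows why a loop would make the ordering impossible.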

This post collects some notes on ZooKeeper commands and scripts. It is mainly useful for Hadoop admins, and all commands are self-explanatory. ZooKeeper is a distributed, centralized coordination service. ZooKeeper addresses issues with distributed applications: maintaining configuration information (sharing config info across all nodes), naming service (allows one node […]

Zookeeper Commands

In Java development, a typical workflow involves restarting the server with every class change, and no one complains about it. That is a fact about Java development. We have worked like that since our first day with Java. But is Java class reloading that difficult to achieve? And could that problem […]

Advanced Java Class Tutorial: A Guide to Class Reloading

Below are a few Hadoop real-time use cases with solutions. Use case 1. Problem: Data description: This dataset gives information about the markets and the products available in different regions based on the seasons. The fields in that file are listed in the full post. Problem statement: Select any particular county […]

Hadoop Real Time Usecases with Solutions

Cassandra: Cassandra is a distributed database from Apache that is highly scalable and designed to manage very large amounts of structured data. It provides high availability with no single point of failure. NoSQL: The primary objective of a NoSQL database is to have simplicity of design, horizontal scaling, and finer […]

Cassandra Interview Cheat Sheet

Hive: SQL-based data warehouse application built on top of Hadoop (select, join, group by, …). It is a platform used to develop SQL-type scripts that run as MapReduce operations. PARTITIONING: Partitioning a table changes how Hive structures the data storage. It is used for distributing load horizontally, e.g. PARTITIONED BY (country STRING, state STRING); A subset of a […]

Hadoop and Hive Interview Cheat Sheet
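To make the partitioning note above concrete: Hive stores each partition of a table in its own subdirectory, named `column=value` for each partition column in order. The sketch below only builds such a path as a string; the table name `sales` and the helper `partition_path` are hypothetical, and the warehouse directory shown is the usual default, which may differ on a given cluster.

```python
def partition_path(table_dir, partition):
    """Build the directory for one partition as key=value segments.

    table_dir: the table's base directory (an assumed location here).
    partition: ordered (column, value) pairs matching PARTITIONED BY.
    """
    segments = [f"{col}={val}" for col, val in partition]
    return "/".join([table_dir.rstrip("/")] + segments)

# Matches PARTITIONED BY (country STRING, state STRING) from the excerpt.
path = partition_path(
    "/user/hive/warehouse/sales",  # hypothetical table location
    [("country", "US"), ("state", "CA")],
)
print(path)  # /user/hive/warehouse/sales/country=US/state=CA
```

Because each partition is a separate directory, a query that filters on `country` and `state` only needs to read the matching directories.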