Monthly Archives: May 2014


This post describes how to install HBase on an Ubuntu machine in pseudo-distributed mode, with HDFS as the storage layer. Prerequisites: Java is one of the main prerequisites; JDK 1.6 or later is required to run HBase. Hadoop 1 or Hadoop 2 installed on pseudo distributed or fully […]

HBase Installation in Pseudo Distribution Mode


HBase is Hadoop’s database; below is a high-level overview. HBase Overview: What is HBase? HBase is a scalable, distributed, column-oriented database built on top of Hadoop and HDFS. Apache HBase is an open-source, non-relational database modeled on Google’s Bigtable – A Distributed storage […]

HBase Overview


Use Case Description: This post describes an approach to a use case in which an input file contains records made up of columns and their values, but some columns hold blanks/nulls instead of actual values, i.e. data is missing for some columns. The developer needs to write a […]

Mapreduce Program to calculate Missing Count
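
The idea in the excerpt above can be illustrated outside Hadoop. The following is a minimal Python sketch, not the post's actual MapReduce program. It assumes comma-separated records with a header row, and it treats an empty field or the literal string `null` as missing. The function names and the sample data are hypothetical.

```python
from collections import defaultdict

# Assumption: these tokens mark a missing value; the original post may use others.
MISSING_TOKENS = {"", "null"}


def map_missing(header, line, delimiter=","):
    """Map step: emit (column_name, 1) for every missing field in one record."""
    fields = line.split(delimiter)
    for name, value in zip(header, fields):
        if value.strip().lower() in MISSING_TOKENS:
            yield name, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per column name."""
    totals = defaultdict(int)
    for name, count in pairs:
        totals[name] += count
    return dict(totals)


def missing_counts(lines, delimiter=","):
    """Run the map and reduce steps over an iterable of text lines."""
    rows = iter(lines)
    header = next(rows).split(delimiter)  # first line names the columns
    pairs = (pair for line in rows for pair in map_missing(header, line, delimiter))
    return reduce_counts(pairs)


if __name__ == "__main__":
    # Hypothetical sample input, made up for illustration.
    sample = [
        "id,name,city",
        "1,alice,",
        "2,,null",
        "3,bob,pune",
    ]
    print(missing_counts(sample))
```

In a real Hadoop job the map step would run per input split and the framework would group the emitted pairs by key before the reduce step; here a dictionary stands in for that grouping.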



Hadoop Output Formats: We discussed the input formats supported by Hadoop in the previous post. In this post, we give an overview of the Hadoop output formats and their usage. Hadoop provides an output format corresponding to each input format. All Hadoop output formats must extend the abstract class org.apache.hadoop.mapreduce.OutputFormat. OutputFormat […]

Hadoop Output Formats


Below are a few more Hadoop MapReduce interview questions and answers for both experienced and fresher Hadoop developers. Hadoop Mapreduce Interview Questions and Answers for Experienced: 1. After a restart of the namenode, MapReduce jobs that worked fine before the restart started failing. What could be wrong? The cluster could be in […]

Mapreduce Interview Questions and Answers for Experienced Part – 3


Below are a few more Hadoop MapReduce interview questions and answers for both experienced and fresher Hadoop developers. Hadoop Mapreduce Interview Questions and Answers for Experienced: 1. What is side data distribution in the MapReduce framework? The extra read-only data needed by a MapReduce job to process the main data set […]

Mapreduce Interview Questions and Answers for Experienced Part – 2



Below are a few more Hadoop interview questions and answers for both fresher and experienced Hadoop developers. Hadoop Interview questions and answers 1. What is the default block size in HDFS? As of Hadoop-2.4.0 release, the default block size in HDFS is 128 MB and prior to that it was […]

Hadoop Interview Questions and Answers Part – 4


Running Sample Mapreduce – Word Count Program in Eclipse: This post extends the previous post about configuring Eclipse for Hadoop. Once the configuration is done successfully, we can run the sample MapReduce programs in the Eclipse IDE. In this Eclipse Mapreduce Example post, we will discuss the development of sample […]

Eclipse Mapreduce Example


Eclipse is a powerful IDE for Java development. Since Hadoop and MapReduce programming is done in Java, it is better to work in a full-featured Integrated Development Environment (IDE). In this post, we are going to learn how to install Eclipse on an Ubuntu machine and configure it […]

Eclipse Configuration for Hadoop



Error Scenario: Incompatible clusterIDs. This scenario applies when the data node logs contain error messages reporting incompatible clusterIDs. Root Cause: This error message is received when the cluster ID of the name node and the cluster ID of the data node are different. We can see […]

Incompatible clusterIDs
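
The root cause described in the excerpt above amounts to comparing the `clusterID` values recorded by the name node and the data node. The following is a minimal Python sketch of that check. It assumes each node's storage directory holds a `VERSION` file in `key=value` form with a `clusterID` entry. The function names and the sample IDs are hypothetical, and the real file locations depend on the cluster's configuration.

```python
def read_cluster_id(version_text):
    """Return the clusterID value from VERSION-style key=value text, or None."""
    for raw in version_text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "clusterID":
            return value.strip()
    return None


def cluster_ids_match(namenode_version, datanode_version):
    """True only when both texts carry the same, non-empty clusterID."""
    nn_id = read_cluster_id(namenode_version)
    dn_id = read_cluster_id(datanode_version)
    return bool(nn_id) and nn_id == dn_id


if __name__ == "__main__":
    # Hypothetical VERSION contents, made up for illustration.
    namenode = "#example\nnamespaceID=1\nclusterID=CID-aaaa\n"
    datanode = "#example\nclusterID=CID-bbbb\nstorageType=DATA_NODE\n"
    print(cluster_ids_match(namenode, datanode))
```

The sketch only detects the mismatch; how to resolve it is covered in the linked post.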


Error Scenario: storage directory does not exist or is not accessible / Exception in namenode join. This scenario applies when the name node or data node logs contain error messages of this kind. Root Cause: Error messages of this kind are […]

storage directory does not exist


This post assumes that the reader already has an idea of how to install and configure Hadoop on a single node cluster. If not, it is better to first go through the post Installing Hadoop on single node cluster. In this post we will briefly discuss installing […]

Install Hadoop on Multi Node Cluster