Yearly Archives: 2014


Below are a few important Hadoop HBase interview questions and answers suitable for Hadoop freshers and experienced developers. 1. What is HBase? HBase is a column-oriented, open-source, multidimensional, distributed database. It runs on top of HDFS. 2. Why do we use HBase? HBase provides random read and […]

HBase Interview Questions and Answers Part – 1


In this post we will discuss the configuration required for Hive connectivity with Hunk, the Hadoop flavor of Splunk, the well-known visualization tool. Splunk Overview: Splunk captures, indexes, and correlates real-time data in a searchable repository from which it can generate graphs, reports, dashboards, and visualizations. Splunk released a product […]

Hive Connectivity With Hunk (Splunk)


Phoenix HBase Overview. What is Apache Phoenix? Apache Phoenix is another top-level project from the Apache Software Foundation. It provides an SQL interface to HBase, acting as an SQL layer on top of the HBase architecture. It maps the HBase data model to the relational world. Phoenix is developed in Java […]

Apache Phoenix – An SQL Layer on HBase



In this post, we will discuss Hive integration with the Tez framework, that is, enabling Tez for Hive queries. We will also run sample Hive queries on both the MapReduce and Tez frameworks and evaluate the performance difference between them. Tez Advantages: Tez offers a customizable […]

Hive on Tez – Hive Integration with Tez


Apache Tez Overview. What is Apache Tez? Apache Tez is another execution framework project from the Apache Software Foundation, built on top of Hadoop YARN. It is considered a more flexible and powerful successor to the MapReduce framework. Apache Tez Features: Tez provides a performance gain over MapReduce […]

Apache Tez – Successor of Mapreduce Framework


In this post, we will discuss the setup needed for HBase integration with Hive. We will test this integration by creating some test HBase tables from the Hive shell, populating them from another Hive table, and finally verifying the contents in the HBase table. […]

HBase Integration with Hive



This post is a continuation of the previous post on the small files issue. In the previous post we merged a huge number of small files in an HDFS directory into a SequenceFile; in this post we will merge a huge number of small files on the local file system into an Avro file on HDFS […]

Merging Small Files Into Avro File


In this post, we will discuss one of the well-known use cases of SequenceFiles: merging a large number of small files into a SequenceFile. This requirement arises mainly because Hadoop and MapReduce do not process large numbers of small files efficiently. Need For […]

Merging Small Files into SequenceFile


Error Scenario: java.io.IOException: Cannot create an instance of InputFormat class. We get this error message when we execute simple hadoop fs commands or run Hive queries. Root Cause: This error message is received when there are any spaces […]

Cannot create an instance of InputFormat



This post is a continuation of the previous post on Hadoop sequence files. In this post we will discuss examples of reading and writing a SequenceFile using the Apache Hadoop 2 API. Writing Sequence File Example: As discussed in the previous post, we will use the static method SequenceFile.createWriter(conf, opts) to create a SequenceFile.Writer instance and […]

Reading and Writing SequenceFile Example


In addition to text files, Hadoop also supports binary files. Among these binary file formats, Hadoop Sequence Files are a Hadoop-specific file format that stores serialized key/value pairs. In this post we will discuss the basic details and format of Hadoop sequence files, with examples. Hadoop […]

Hadoop Sequence Files example




Avro supports both the old MapReduce package API (org.apache.hadoop.mapred) and the new MapReduce package API (org.apache.hadoop.mapreduce). Avro data can be used as both the input to and the output from a MapReduce job, as well as the intermediate format. In this post we will provide an example run of the Avro MapReduce 2 API. This […]

Avro MapReduce 2 API Example


In this post, we will discuss another 50 MapReduce interview questions and answers for experienced MapReduce developers. Mapreduce Interview Questions and Answers for experienced: 1. What are the methods in the Mapper class and the order of their invocation? The Mapper contains the run() method, which calls its own […]

Mapreduce Interview Questions and Answers Part – 4


Below are some Hadoop interview questions and answers. 1. As the data is replicated thrice in HDFS, does it mean that any calculation done on one node will also be replicated on the other two? Since there are 3 nodes, when we send the MapReduce programs, calculations will […]

Hadoop Interview Questions and Answers Part – 5