Monthly Archives: August 2014

In this post, we discuss collecting data directly into HBase through a Flume agent. Previous posts in the Flume category covered agent setup for the File Roll, Logger, and HDFS sink types. Here we explore the details of the HBase sink and its setup […]

Flume Data Collection into HBase

In this post, we set up a Flume agent for data collection into HDFS. The agent uses a Sequence Generator source, an HDFS sink, and a Memory channel; we then start the agent and verify that it works. Flume data collection into HDFS: Flume Agent – Sequence Generator […]

Flume Data Collection into HDFS

In this post, we set up a Flume agent using an Avro client, Avro source, JDBC channel, and File Roll sink. First, we create Agent3 in the flume.conf file under the FLUME_HOME/conf directory. Flume Agent – Avro Source, JDBC Channel and File Roll Sink: Add the configuration properties below to flume.conf […]

Flume Avro Client – Collecting a Remote File into Local ...

Our previous posts covered the Hive overview and Hive architecture. Now we discuss the default service in Hive, the Hive command-line interface (CLI), and Hive CLI commands. Ways to interact with Hive: the CLI (command-line interface), Karmasphere (commercial product), Cloudera’s open source Hue (https://git […]

Hive CLI Commands

Error Scenario: org.apache.flume.EventDeliveryException: Failed to open file. This error appears when a Flume agent starts and tries to start a FILE_ROLL sink with a given target sink directory. The sequence of error messages is recorded in the ~/logs/flume.log file. Root Cause: When a flume agent is […]

org.apache.flume.EventDeliveryException: Failed to open file

In this post, we discuss Hive interview questions and answers for experienced candidates and freshers. Hive Interview Questions and Answers for experienced: 1. How do you start the Hive metastore service as a background process? The Hive metastore service can be started as a background process from the command line. By using kill -9 […]

Hive Interview Questions and Answers for experienced – Part 2

In this post, we discuss configuring and setting up a Flume agent that collects the output of a command-line tool into a flat file. The agent’s configuration uses the Exec source type, the File channel, and the File Roll sink type. Let’s name our agent […]

Flume Agent – Collect Data From Command to a Flat ...

Following on from the previous post, we discuss the properties in a Flume agent configuration in detail. For ease of understanding, we use the same flume.conf file created in that post. The Flume agent configuration file, flume.conf, follows the Java properties file format with hierarchical property settings. Here the […]

Flume Agent Configuration

In this post, we set up a simple Flume agent using Netcat as the source and the console as the sink. In this single-node Flume deployment example, we create a Netcat source that listens on a port (localhost:44444) for network connections and a logger sink that logs network traffic to the console. […]

Flume Agent Setup – Netcat Source, Console Sink

Below are some important Hive interview questions and answers for Hadoop developers and administrators. Hive Interview Questions and Answers 1. What is metadata? Data about data. 2. What is Hive? Hive is one of the important tools in the Hadoop ecosystem, and it provides an SQL-like dialect […]

Hive Interview Questions and Answers – Part 1

In this post, we briefly discuss Apache Flume installation and configuration on an Ubuntu machine. The current version of Apache Flume is called Flume NG (Next Generation), and its older version has been renamed Flume OG (Old Generation). This post covers Flume NG only. Prerequisite: JDK 1.6 […]

Apache Flume Installation

This post gives an overview of Apache Flume and illustrates its architecture. What is Flume? Flume is a highly reliable, distributed, and configurable streaming data collection tool. Flume can transport log files from a large number of hosts into HDFS. Need for Flume: These days, most of the new data is contained […]

Flume Architecture

In this post, we discuss the differences between Java and Hive with the help of a word count example. We examine the word count algorithm first using the Java MapReduce API and then using Hive. The Java implementation is included in the Apache Hadoop distribution. For implementing the Word […]

Java vs Hive

In this post, we discuss the differences between Hive and RDBMS (traditional relational databases). Examples of traditional relational databases include MySQL, PostgreSQL, Oracle 11g, and MS SQL Server. Below are the key features of Hive that differ from an RDBMS. Hive resembles a traditional database in supporting an SQL interface […]

Hive vs RDBMS