In this post we will discuss various log file types and log analysis in Hadoop. Log Files: Logs are computer-generated files that capture network and server operations data. They are useful during various stages of software development, mainly for debugging and profiling, and also for managing network operations. Need […]

Log Analysis in Hadoop

In this post we will discuss setting up multiple agents in a Flume flow and passing events from one machine to another via the Avro RPC protocol. Multi Agent Setup in Flume: In Multi Agent or Multi-hop setup […]

Multi Agent Setup in Flume

In this post, we will provide a proof of concept for Flume data collection into HDFS with Avro serialization, using the HDFS sink and the Avro serializer on sequence files with Snappy compression. We will also use formatting escape sequences in the HDFS path where the events are stored. In this post, we will create […]

Flume Data Collection into HDFS with Avro Serialization

This post provides a proof of concept of data collection from an HTTP client into HBase. We will set up a Flume agent with an HTTP Source, JDBC Channel and AsyncHBase Sink. Initially we concentrate on the POC of HTTP client data collection into HBase and at the end of this […]

Data Collection from HTTP Client into HBase

We will discuss collecting data into HBase directly through a Flume agent. In our previous posts under the Flume category, we covered the setup of Flume agents for the file roll, logger and HDFS sink types. In this post, we are going to explore the details of the HBase sink and its setup […]

Flume Data Collection into HBase

In this post, we will discuss the setup of an agent for Flume data collection into HDFS. We will set up an agent with a Sequence Generator Source, HDFS Sink and Memory Channel, start that agent and verify its functionality. Flume data collection into HDFS Flume Agent – Sequence Generator […]

Flume Data Collection into HDFS

In this post, we will discuss the setup of a Flume agent using Avro Client, Avro Source, JDBC Channel, and File Roll Sink. First we will create Agent3 in the flume.conf file under the FLUME_HOME/conf directory. Flume Agent – Avro Source, JDBC Channel and File Roll Sink: Add the below configuration properties in flume.conf […]

Flume Avro Client – Collecting a Remote File into Local ...

In this post, we will discuss Flume agent configuration and setup for collecting the output of a command-line tool into a flat file. We will use the Exec Source, File Channel and File Roll Sink types in the configuration of our agent. Let's name our agent as […]

Flume Agent – Collect Data From Command to a Flat ...

As mentioned in the previous post, here we will discuss in detail the properties in a Flume agent configuration. For ease of understanding, we will consider the same flume.conf file created in our previous post. The Flume agent configuration file flume.conf resembles the Java properties file format, with hierarchical property settings. Here the […]

Flume Agent Configuration

In this post, we will discuss setting up a simple Flume agent using Netcat as the source and the console as the sink. In this example of a single-node Flume deployment, we create a Netcat source which listens on a port (localhost:44444) for network connections and a logger sink to log network traffic to the console. […]

Flume Agent Setup – Netcat Source, Console Sink

In this post, we briefly discuss Apache Flume installation and configuration on an Ubuntu machine. The current version of Apache Flume is called Flume NG (Next Generation), and its old version has been renamed Flume OG (Old Generation). In this post, we will discuss Flume NG only. Prerequisite: JDK 1.6 […]

Apache Flume Installation

This post gives an overview of Apache Flume and illustrates its architecture. What is Flume?: Flume is a highly reliable, distributed and configurable streaming data collection tool. Flume can transport log files from a large number of hosts into HDFS. Need for Flume: These days, most of the new data is contained […]

Flume Architecture