Monthly Archives: November 2014



In this post we will give a basic introduction to the QlikView BI tool and cover QlikView integration with Hadoop Hive. We will use Cloudera Hive and its JDBC drivers/connectors to connect to QlikView, and we will see a sample table retrieved from the Cloudera Hadoop Hive database. QlikView Overview What […]

QlikView Integration with Hadoop


In this post, we will discuss communication between two nodes in a network via SSH and executing/running remote commands over SSH on a remote machine. For easy understanding, these two nodes in the cluster can be treated as server/client machines. To allow secure communications between Server […]

Run Remote Commands over SSH
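As a quick taste of what the post covers, a minimal sketch of running a command on a remote machine over SSH (the user name and host are placeholders for illustration, and key-based login is assumed to be set up first):

```shell
# One-time setup: generate a key pair and install the public key
# on the remote machine so commands can run without a password prompt.
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
ssh-copy-id user@remote-host

# Run a single command on the remote machine; its output
# is returned to the local terminal.
ssh user@remote-host 'df -h /'

# Several commands can be chained inside one quoted string.
ssh user@remote-host 'hostname; uptime'
```

Quoting the remote command is important: without quotes, parts of the command line may be expanded by the local shell instead of the remote one.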



This post provides brief notes on Unix shell scripting. As this topic is very well described in many textbooks, we will not go deep into the details of each point. This post is for quick review/revision/reference of common Unix commands and Unix shell scripting. Unix Shell Scripting […]

Brief Notes on Unix Shell Scripting Concepts
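A small self-contained script touching a few of the constructs such notes typically cover — variables, command substitution, a conditional, and a loop (sample values are illustrative):

```shell
#!/bin/bash

# Variable assignment and interpolation.
name="hadoop"
echo "tool: $name"

# Command substitution: capture a command's output in a variable.
count=$(echo "a b c" | wc -w)
echo "words: $count"

# Conditional on a test expression.
if [ "$count" -eq 3 ]; then
    echo "three words"
fi

# Iterate over a fixed list of items.
for f in one two three; do
    echo "item: $f"
done
```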


In the previous post we gave a basic introduction to log files and the architecture of log analysis in Hadoop. In this post, we will go into much deeper detail on processing logs in Pig. As discussed in the previous post, there will be three types of log […]

Processing Logs in Pig
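As a plain-shell warm-up for the kind of extraction the post performs in Pig (this is not the post's Pig code — just the same group-and-count idea in awk, on a few hypothetical Apache-style access log lines):

```shell
# A tiny sample of Apache combined-format log lines (hypothetical data).
cat > sample_access.log <<'EOF'
10.0.0.1 - - [10/Nov/2014:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326
10.0.0.2 - - [10/Nov/2014:13:55:40 -0700] "GET /missing.html HTTP/1.1" 404 209
10.0.0.1 - - [10/Nov/2014:13:55:41 -0700] "POST /form HTTP/1.1" 500 512
EOF

# Count requests per HTTP status code (field 9) — the same grouping
# one would express in Pig with GROUP ... BY status.
awk '{ counts[$9]++ } END { for (s in counts) print s, counts[s] }' sample_access.log | sort
```

In Pig the equivalent becomes a LOAD, a GROUP BY on the status field, and a COUNT — which is what the post walks through at scale.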


In this post, we will discuss the following built-in load/store functions in Pig, with examples: PigStorage, TextLoader, BinStorage, JsonLoader/JsonStorage, AvroStorage, HBaseStorage, MongoStorage. PigStorage: PigStorage() is the default load/store function in Pig. PigStorage expects data to be formatted using field delimiters, and the default delimiter is ‘\t’. PigStorage() itself […]

Built-in Load Store Functions in Pig
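Since PigStorage() splits records on a tab by default, a quick shell sketch of what such an input file looks like and how the default ‘\t’ delimiter carves each line into fields (the file name and sample rows are hypothetical):

```shell
# Build a tab-delimited file of the shape PigStorage() expects by default.
printf '1\talice\t30\n2\tbob\t25\n' > users.tsv

# Split each record on '\t', mirroring how PigStorage() would hand
# three fields per tuple to a Pig LOAD statement.
awk -F'\t' '{ print "fields:", NF, "| name:", $2 }' users.tsv
```

In Pig, loading this file would look like `A = LOAD 'users.tsv' USING PigStorage();` — with no argument, the tab delimiter is implied.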



In this post, we will discuss the basics of load functions in Pig with some sample examples, and we will also cover custom load functions in Pig by writing UDFs. To work with data in Pig, the first thing we need to do is load data from a […]

Load Functions In Pig


In this post we will discuss various log file types and log analysis in Hadoop. Log Files: Logs are computer-generated files that capture network and server operations data. They are useful during various stages of software development, mainly for debugging and profiling purposes, and also for managing network operations. Need […]

Log Analysis in Hadoop


In this post we will give a basic introduction to Apache Solr, and we will also describe the procedure for installing Apache Solr on an Ubuntu machine. Apache Solr Overview: What is Apache Solr? Apache Solr is another top-level project from the Apache Software Foundation; it is an open source […]

Apache Solr Installation on Ubuntu



In this post we will discuss setting up multiple agents in a Flume flow and passing events from one machine to another via the Avro RPC protocol. Multi Agent Setup in Flume: In a multi-agent or multi-hop setup […]

Multi Agent Setup in Flume
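A minimal sketch of the two-machine flow the post describes, as Flume properties files — the first agent forwards events over an Avro sink, the second receives them on an Avro source (agent names, host, and port are illustrative):

```properties
# Agent on machine 1: reads a local source and forwards events
# to machine 2 over Avro RPC.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = avro-sink

agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/syslog
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory

agent1.sinks.avro-sink.type = avro
agent1.sinks.avro-sink.hostname = machine2.example.com
agent1.sinks.avro-sink.port = 4545
agent1.sinks.avro-sink.channel = ch1

# Agent on machine 2: an Avro source listening on the same port
# receives the events sent by machine 1.
agent2.sources = avro-src
agent2.channels = ch2
agent2.sinks = logger-sink

agent2.sources.avro-src.type = avro
agent2.sources.avro-src.bind = 0.0.0.0
agent2.sources.avro-src.port = 4545
agent2.sources.avro-src.channels = ch2

agent2.channels.ch2.type = memory

agent2.sinks.logger-sink.type = logger
agent2.sinks.logger-sink.channel = ch2
```

The key pairing is that the Avro sink's hostname/port on machine 1 must match the Avro source's bind/port on machine 2.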


In this post we are going to discuss the basics of Tableau software and Tableau integration with Hadoop. Tableau Overview What is Tableau? Tableau is a visualization tool that provides drag & drop features to analyze large amounts of data very easily and quickly. The […]

Tableau Integration with Hadoop


In this post, we will discuss the basics of Azkaban Hadoop and its setup on an Ubuntu machine. What is Azkaban Hadoop? Azkaban Hadoop is an open-source workflow engine for the Hadoop ecosystem. It is a batch job scheduler that allows developers to control job execution inside Java and especially Hadoop projects. […]

Azkaban Hadoop – A Workflow Scheduler For Hadoop



Below are some of the important Hive interview questions and answers for experienced Hadoop developers. Hive Interview Questions and Answers for experienced 1. What is the Hive configuration precedence order? There is a precedence hierarchy for setting properties. In the following list, lower numbers take precedence over higher numbers: The […]

Hive Interview Questions and Answers for experienced Part – 4


In this post, we will create a new database in MySQL server, create a table and insert data into it, and then import the MySQL data into HDFS via the Sqoop tool. We assume that MySQL, Sqoop and Hadoop are installed on the local machine to test this example. We […]

Sqoop Importing MySQL Data into HDFS
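The shape of the import the post performs, sketched as a single Sqoop command line (database, table, credentials and HDFS paths are placeholders, not the post's actual values):

```shell
# Import one MySQL table into HDFS with a single map task.
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username sqoopuser \
  --password sqooppass \
  --table employees \
  --target-dir /user/hadoop/employees \
  -m 1

# Inspect the imported files on HDFS; with one mapper the data
# lands in a single part-m-00000 file under the target directory.
hadoop fs -ls /user/hadoop/employees
hadoop fs -cat /user/hadoop/employees/part-m-00000 | head
```

`-m 1` forces a single mapper, which avoids needing a split-by column for tables without a suitable primary key.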


In this post we will give a basic introduction to Sqoop and cover Sqoop installation on an Ubuntu machine; we will discuss an example run of Sqoop against a MySQL database in the next post. SQOOP Introduction: What is Sqoop? Sqoop is an open source tool that enables users to transfer bulk data […]

SQOOP Installation on Ubuntu