Yearly Archives: 2014


In the previous post we covered a basic introduction to log files and the architecture of log analysis in Hadoop. In this post, we will go into much more detail on processing logs in Pig. As discussed in the previous post, there will be three types of log […]

Processing Logs in Pig


In this post, we will discuss the following built-in load/store functions in Pig with examples: PigStorage, TextLoader, BinStorage, JsonLoader/JsonStorage, AvroStorage, HBaseStorage, and MongoStorage. PigStorage: PigStorage() is the default load/store function in Pig. PigStorage expects data to be formatted using field delimiters, and the default delimiter is ‘\t’. PigStorage() itself […]

Built-in Load Store Functions in Pig
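The excerpt above notes that PigStorage() is Pig's default load/store function and that its default delimiter is ‘\t’. As a minimal sketch of that behaviour, assuming a made-up comma-separated input file users.csv and output path users_out (neither taken from the post), a Java program using Pig's PigServer API in local mode might look like this:

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigStorageExample {
    public static void main(String[] args) throws Exception {
        // Local mode for a quick test; on a cluster this would be ExecType.MAPREDUCE.
        PigServer pig = new PigServer(ExecType.LOCAL);

        // PigStorage's default delimiter is '\t'; we pass ',' explicitly here
        // because the hypothetical input file is comma-separated.
        pig.registerQuery(
            "users = LOAD 'users.csv' USING PigStorage(',') "
          + "AS (name:chararray, age:int);");

        // store() without an explicit store function falls back to the default
        // PigStorage(), i.e. tab-separated output.
        pig.store("users", "users_out");
    }
}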


In this post, we will discuss the basics of load functions in Pig with some sample examples, and we will also discuss custom load functions in Pig written as UDFs. To work with data in Pig, the first thing we need to do is load data from a […]

Load Functions In Pig
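The excerpt above mentions writing custom load functions as UDFs. Below is a rough, hypothetical sketch (not code from the post) of a minimal Pig LoadFunc in Java that turns comma-separated text lines into tuples; the class name and delimiter are illustrative assumptions.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class SimpleCsvLoader extends LoadFunc {
    private RecordReader reader;
    private final TupleFactory tupleFactory = TupleFactory.getInstance();

    @Override
    public void setLocation(String location, Job job) throws IOException {
        // Tell Hadoop where the input lives.
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public InputFormat getInputFormat() throws IOException {
        // Read the input as plain text lines.
        return new TextInputFormat();
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!reader.nextKeyValue()) {
                return null;                      // end of input
            }
            String line = ((Text) reader.getCurrentValue()).toString();
            String[] fields = line.split(",");    // assumed delimiter
            Tuple tuple = tupleFactory.newTuple(fields.length);
            for (int i = 0; i < fields.length; i++) {
                tuple.set(i, fields[i]);
            }
            return tuple;
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}

From a Pig script this loader would then be used along the lines of: data = LOAD 'input' USING SimpleCsvLoader();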



In this post we will discuss various log file types and log analysis in Hadoop. Log Files: Logs are computer-generated files that capture network and server operations data. They are useful during various stages of software development, mainly for debugging and profiling purposes, and also for managing network operations. Need […]

Log Analysis in Hadoop


In this post we will give a basic introduction to Apache Solr and describe the procedure for Apache Solr installation on an Ubuntu machine. Apache Solr Overview: What is Apache Solr? Apache Solr is another top-level project from the Apache Software Foundation; it is an open source […]

Apache Solr Installation on Ubuntu


In this post we will discuss setting up multiple agents in a Flume flow and passing events from one machine to another via the Avro RPC protocol. Multi Agent Setup in Flume: In a multi-agent or multi-hop setup […]

Multi Agent Setup in Flume



In this post we are going to discuss the basics of the Tableau software and Tableau integration with Hadoop. Tableau Overview: What is Tableau? Tableau is a visualization tool that provides drag-and-drop features to analyze large amounts of data very easily and quickly. The […]

Tableau Integration with Hadoop


In this post, we will discuss the basics of Azkaban Hadoop and its setup on an Ubuntu machine. What is Azkaban Hadoop? Azkaban Hadoop is an open-source workflow engine for the Hadoop ecosystem. It is a batch job scheduler that allows developers to control job execution inside Java and especially Hadoop projects. […]

Azkaban Hadoop – A Workflow Scheduler For Hadoop


Below are some of the important Hive interview questions and answers for experienced Hadoop developers. Hive Interview Questions and Answers for experienced 1. What is the Hive configuration precedence order? There is a precedence hierarchy for setting properties. In the following list, lower numbers take precedence over higher numbers: The […]

Hive Interview Questions and Answers for experienced Part – 4



In this post, we will create a new database in the MySQL server, create a table and insert data into it, and then import the MySQL data into HDFS via the Sqoop tool. We assume that MySQL is installed and that Sqoop and Hadoop are installed on the local machine to test this example. We […]

Sqoop Importing MySQL Data into HDFS
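Imports like the one described in the excerpt are normally run from the command line; purely as an illustration, the sketch below passes the equivalent arguments to Sqoop's Java entry point. The JDBC URL, credentials, table name, and target directory are made-up placeholders, not values from the post.

import org.apache.sqoop.Sqoop;

public class SqoopImportExample {
    public static void main(String[] args) {
        // Equivalent to: sqoop import --connect ... --table ... --target-dir ...
        String[] sqoopArgs = {
            "import",
            "--connect", "jdbc:mysql://localhost/testdb",   // hypothetical database
            "--username", "sqoopuser",
            "--password", "sqooppass",
            "--table", "employees",                         // hypothetical table
            "--target-dir", "/user/hadoop/employees",       // HDFS output directory
            "-m", "1"                                       // single mapper for a small table
        };
        int exitCode = Sqoop.runTool(sqoopArgs);
        System.exit(exitCode);
    }
}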


In this post we will give a basic introduction to Sqoop and cover Sqoop installation on an Ubuntu machine; we will discuss an example run of Sqoop against a MySQL database in the next post. Sqoop Introduction: What is Sqoop? Sqoop is an open source tool that enables users to transfer bulk data […]

SQOOP Installation on Ubuntu


Below are a few more Pig interview questions and answers. 1. What is a tuple? A tuple is an ordered set of fields, and a field is a piece of data. 2. What is a relation in Pig? A Pig relation is a bag of tuples. A Pig relation is similar […]

Pig Interview Questions and Answers Part – 2
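To illustrate the tuple and relation definitions in the excerpt above, here is a small sketch using the Tuple and DataBag classes from Pig's Java data model; the field values are arbitrary examples, not from the post.

import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class TupleAndBagExample {
    public static void main(String[] args) throws Exception {
        // A tuple is an ordered set of fields; each field is a piece of data.
        Tuple tuple = TupleFactory.getInstance().newTuple();
        tuple.append("alice");   // field 0: a chararray
        tuple.append(30);        // field 1: an int

        // A relation is a bag of tuples, modelled here by Pig's DataBag class.
        DataBag relation = BagFactory.getInstance().newDefaultBag();
        relation.add(tuple);

        System.out.println(relation);   // prints something like {(alice,30)}
    }
}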



Below are some Hadoop Pig interview questions and answers that are suitable for both freshers and experienced Hadoop programmers. 1. What is Apache Pig? Pig is a scripting language for exploring huge data sets, gigabytes or terabytes in size, very easily. Pig provides an engine for executing data flows in […]

Hadoop Pig Interview Questions and Answers Part – 1


In this post we will describe the process of creating a custom UDF in Hive. Though there are many generic UDFs (user-defined functions) provided by Hive, we might sometimes need to write our own custom UDFs to meet our requirements. In this post, we will discuss one of the […]

Creating Custom UDF in Hive – Auto Increment Column in ...
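As a rough sketch of the kind of UDF the excerpt refers to (not the post's actual implementation), a hypothetical auto-increment UDF in Java could extend Hive's UDF class and return an incrementing counter. The class and function names here are assumptions, and the counter is only sequential within a single task, so globally unique values would typically require forcing the query onto a single reducer.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.LongWritable;

// Hypothetical auto-increment UDF: each call returns the next long value.
public class AutoIncrementUDF extends UDF {
    private long counter = 0;

    public LongWritable evaluate() {
        counter++;
        return new LongWritable(counter);
    }
}

Once packaged into a jar, it would be registered with ADD JAR and CREATE TEMPORARY FUNCTION, for example CREATE TEMPORARY FUNCTION auto_increment AS 'AutoIncrementUDF', and then called as auto_increment() in a SELECT.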


In this post, we will discuss Hadoop installation in the cloud. Though there are a number of posts available across the internet on this topic, we are documenting the procedure for Cloudera Manager installation on Amazon EC2 instances, with some of our practical views on installation and tips and hints to […]

Cloudera Manager Installation on Amazon EC2