Pig


In short, HCatalog opens up the Hive metadata to other MapReduce tools. Every MapReduce tool has its own notion of HDFS data (for example, Pig sees HDFS data as a set of files, while Hive sees it as tables). With a table-based abstraction, HCatalog-supported MapReduce tools […]

HCatalog and Pig Integration


Below is a good collection of examples for the most frequently used functions in Pig. Contents: LOAD, DESCRIBE/EXPLAIN/ILLUSTRATE, FOREACH, GROUP, STORE, LIMIT, ORDER, DISTINCT, JOIN, JOIN USING MULTIPLE KEYS, OUTER JOINS, SELF JOIN, COUNT NUMBER OF ROWS IN SELF JOIN’S OUTPUT, SAMPLE, PARALLEL, UDF: REGISTER, UDF: DEFINE, CALLING […]

Pig Functions Examples


Below is the Pig Functions Cheat Sheet, prepared by collecting different types of functions. Pig Execution Modes. Grunt mode: the interactive mode of Pig, very useful for testing, syntax checking, and ad hoc data exploration. Script mode: runs a set of instructions from a file, similar to a SQL script file. Embedded […]

Pig Functions Cheat Sheet



In the previous post, we covered the basics of log files and the architecture of log analysis in Hadoop. In this post, we go into much more detail on processing logs in Pig. As discussed in the previous post, there are three types of log […]

Processing Logs in Pig


In this post, we will discuss the following built-in load/store functions in Pig, with examples: PigStorage, TextLoader, BinStorage, JsonLoader, JsonStorage, AvroStorage, HBaseStorage, MongoStorage. PigStorage: PigStorage() is the default load/store function in Pig. PigStorage expects data to be formatted using field delimiters, and the default delimiter is ‘\t’. PigStorage() itself […]

Built-in Load Store Functions in Pig


In this post, we will discuss the basics of load functions in Pig with some examples, and we will also discuss custom load functions in Pig, written as UDFs. To work with data in Pig, the first thing we need to do is load data from a […]

Load Functions In Pig



In this post, we will discuss various log file types and log analysis in Hadoop. Log Files: Logs are computer-generated files that capture network and server operations data. They are useful during various stages of software development, mainly for debugging and profiling, and also for managing network operations. Need […]

Log Analysis in Hadoop


In this post, we will describe the procedure for installing Pig on an Ubuntu machine. Prerequisites: Below are the basic requirements for installing Pig on Ubuntu and getting started. Java 1.6 or later installed, with the JAVA_HOME environment variable set to the Java installation directory. Hadoop 1.x or 2.x installed on the cluster. […]

Pig Installation on Ubuntu


In this post, we will discuss the basics of Apache Pig. What is Apache Pig? Pig is a scripting language for easily exploring huge data sets, gigabytes or terabytes in size. Pig provides an engine for executing data flows in parallel on Hadoop. Pig is made up of […]

Apache Pig Overview