Monthly Archives: September 2014


In previous posts in the Avro category, we examined the Java and Ruby APIs for Avro serialization and deserialization. As part of Avro’s interoperability testing, this post examines the Python API by creating a sample Avro data file and reading its contents back. Prerequisite: Before running avro serialization […]

Avro Serialization & Deserialization – Python API


In previous posts in the Avro category, we discussed the Java API for Avro serialization. Since Avro does not require code generation, we can use other languages to interact with Avro serialization and deserialization. In this post, we will provide a […]

Avro Serialization Ruby Example


In this post, we will discuss an example of Avro serialization and deserialization: creating an Avro data file (serializing data) and deserializing the same file to read its contents back. This is a continuation of our previous post on Avro Schema, in which we defined the schema for […]

Avro Serializing and Deserializing Example – Java API



In this post we will discuss the following aspects of Avro schemas: Avro data types, defining a schema, and compiling the schema with code generation. Avro schemas are defined in JSON. Schemas are composed of primitive or complex data types. Primitive Types: Avro’s primitive types are listed below. […]

Avro Schema Example Definition


In our previous post, we discussed Hive CLI commands. This post continues the same topic with Hive interactive shell commands and a few examples of these options. Hive Interactive Shell Commands: By default, Hive enters interactive shell mode if we do not […]

Hive Interactive Shell Commands


In this post we will discuss the frequently seen error messages NoClassDefFoundError and ClassNotFoundException in Hadoop MapReduce job execution, and possible solutions for them. Error Scenario: java.lang.ClassNotFoundException or java.lang.NoClassDefFoundError, Error starting MRAppMaster, and Container exited with a non-zero exit code 1. When we encounter a situation of […]

NoClassDefFoundError or ClassNotFoundException



Use Case Description: In this post we will discuss the use of the MapReduce Multiple Outputs output format in MapReduce jobs through a real-world use case. The use case is to generate multiple output file names from the reducer, and these file names should be […]

MapReduce Multiple Outputs Use case


In this post we will discuss some of the errors and exceptions that can occur when the Avro and Hadoop distributions are mismatched. If we do not use the correct Avro release, we will run into many errors or exceptions. In […]

Hadoop Integration – Avro Errors


In this post, we will give a basic introduction to Avro serialization. What is Avro Serialization?: Avro is one of the well-known data serialization and deserialization frameworks, and it integrates well with almost all Hadoop platforms. The Avro framework was created by Doug Cutting, the creator of Hadoop, and now it […]

Avro Serialization



Error Scenario: Expected timestamp in the Flume event headers, but it was null – NullPointerException. This error message appears in the ~/logs/flume.log file when starting a Flume agent with an HDFS sink that uses formatting escape sequences (%Y, %m, %d, %H, %M, %S). If a sink expects a header but does not […]

Expected timestamp in the Flume event headers, but it was ...


Error Scenario: channel has been removed due to an error during configuration. This error message appears in the ~/logs/flume.log file when starting a Flume agent with a missing or incorrect JDBC channel configuration. Root Cause: The DerbySchemaHandler.schemaExists error message appears when a Flume agent’s sink is not able […]

channel has been removed due to an error


Error Scenario: MapReduce job stuck at map 0% reduce 0%: Could not resolve hostname. MapReduce jobs do not run, and an error message appears when starting the HDFS daemons. Resolution: These error messages appear when there is an issue with SSH communication. So, in these […]

Mapreduce job stuck at map 0% reduce 0%



In this post, we will describe the procedure for installing Pig on an Ubuntu machine. Prerequisites: Below are the basic requirements for installing Pig on Ubuntu and getting started. Java 1.6 or later installed, with the JAVA_HOME environment variable set to the Java installation directory. Hadoop 1.x or 2.x installed on the cluster. […]

Pig Installation on Ubuntu


In this post, we will provide a proof of concept for Flume data collection into HDFS with Avro serialization, using the HDFS sink and the Avro serializer on sequence files with Snappy compression. We will also use the formatting escape sequences to store the events on the HDFS path. In this post, we will create […]

Flume Data Collection into HDFS with Avro Serialization


Java Collections Framework Notes: Collection Interface: The fundamental interface for collection classes in the Java library is the Collection interface. The add method returns true if adding the element actually changes the collection, and false if the collection is unchanged. The iterator method returns an object that implements the […]

Short Notes on Java Collections Framework
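
The behaviour of add described in the excerpt above can be illustrated with a small sketch. This is not code from the original post; the class name CollectionAddDemo and the helper addTwice are hypothetical names chosen for illustration. It assumes a HashSet (which rejects duplicates) and an ArrayList (which accepts them) as the two Collection implementations to compare.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;

public class CollectionAddDemo {

    // Hypothetical helper: adds the same element twice and
    // records what add() returned on each call.
    static boolean[] addTwice(Collection<String> collection, String element) {
        boolean first = collection.add(element);
        boolean second = collection.add(element);
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        // A set is unchanged by a duplicate, so the second add() returns false.
        boolean[] setResults = addTwice(new HashSet<>(), "avro");
        System.out.println("HashSet: " + setResults[0] + ", " + setResults[1]);

        // A list grows on every add(), so both calls return true.
        Collection<String> list = new ArrayList<>();
        boolean[] listResults = addTwice(list, "avro");
        System.out.println("ArrayList: " + listResults[0] + ", " + listResults[1]);

        // iterator() gives sequential access to the stored elements.
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Running main prints "HashSet: true, false" and "ArrayList: true, true", followed by the two list elements.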