Avro



This post is a continuation of the previous post on the small files issue. In the previous post we merged a huge number of small files in an HDFS directory into a sequence file; in this post we will merge a huge number of small files on the local file system into an Avro file on HDFS […]

Merging Small Files Into Avro File


Avro supports both the old MapReduce API (org.apache.hadoop.mapred) and the new MapReduce API (org.apache.hadoop.mapreduce). Avro data can be used as the input to and the output from a MapReduce job, as well as the intermediate format. In this post we will provide an example run of the Avro MapReduce 2 API. This […]

Avro MapReduce 2 API Example



In this post, we will discuss the well-known word count example in MapReduce and create a sample Avro data file in the Hadoop Distributed File System. Prerequisite: To execute the MapReduce word count program given in this post, the avro-mapred-1.7.4-hadoop2.jar file must be present in the $HADOOP_HOME/share/hadoop/common/lib directory. This […]

Avro MapReduce Word Count Example


In the previous posts under the Avro category we examined the Java API and the Ruby API for Avro serialization and deserialization. As part of Avro’s interoperability testing, in this post we will examine the Python API by creating a sample Avro data file and reading the contents back. Prerequisite: Before running Avro serialization […]

Avro Serialization & Deserialization – Python API


In the previous posts under the Avro category we discussed the Java API for Avro serialization. As code generation is not needed for schema evolution, we can use any other language to interact with Avro serialization and deserialization. In this post, we will provide a […]

Avro Serialization Ruby Example



In this post, we will discuss an example of Avro serialization and deserialization: creating an Avro data file (serializing data) and then deserializing the same file to read the contents back. This is a continuation of our previous post on Avro Schema, in which we defined a schema for […]

Avro Serializing and Deserializing Example – Java API


In this post we will discuss the following aspects of Avro schemas: Avro data types, defining a schema, and compiling the schema and code generation. Avro schemas are defined in JSON. Schemas are composed of primitive data types or complex data types. Primitive Types: Avro’s primitive types are listed below. […]

Avro Schema Example Definition
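
Since the excerpt notes that Avro schemas are plain JSON built from primitive and complex types, a small sketch may help. It uses only Python's standard json module rather than an Avro library, and the record name Employee, its namespace, and its fields are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical record schema, written as ordinary JSON text.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Employee",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "emails", "type": {"type": "array", "items": "string"}}
  ]
}
"""

schema = json.loads(SCHEMA_JSON)


def describe_fields(record_schema):
    """Return (field name, kind) pairs, where kind is 'primitive' or 'complex'."""
    result = []
    for field in record_schema["fields"]:
        # Simplification for this sketch: a bare string is treated as a
        # primitive type name and a JSON object as a complex type.
        # Unions and references to named types are not handled here.
        kind = "primitive" if isinstance(field["type"], str) else "complex"
        result.append((field["name"], kind))
    return result


if __name__ == "__main__":
    for name, kind in describe_fields(schema):
        print(f"{name}: {kind}")
```

Because the schema is ordinary JSON, any language with a JSON parser can inspect it; an actual Avro library would additionally validate the schema and use it to encode and decode data.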


In this post we will discuss some of the errors and exceptions that can occur when the Avro and Hadoop distributions are mismatched. If we do not use the correct version of the Avro release, we will run into many errors or exceptions. In […]

Hadoop Integration – Avro Errors



In this post, we will give a basic introduction to Avro serialization. What is Avro Serialization?: Avro is one of the well-known data serialization and deserialization frameworks, and it integrates well with almost all Hadoop platforms. The Avro framework was created by Doug Cutting, the creator of Hadoop, and now it […]

Avro Serialization