How to work with a large number of small files in Hadoop


JSON View of avro file
This post continues the previous post on the small files issue. In the previous post we merged a huge number of small files in an HDFS directory into a SequenceFile; in this post we will merge a huge number of small files on the local file system into an Avro file on HDFS […]

Merging Small Files Into Avro File


SequenceFile Key Extractor
In this post, we will discuss one of the well-known use cases of SequenceFiles: merging a large number of small files into a single SequenceFile. This requirement arises mainly because Hadoop and MapReduce do not process large numbers of small files efficiently. Need For […]

Merging Small Files into SequenceFile