Reading and Writing SequenceFile Example

This post is a continuation of the previous post on Hadoop sequence files. In this post we will walk through examples of reading and writing SequenceFiles using the Apache Hadoop 2 API.

Writing SequenceFile Example:

As discussed in the previous post, we will use the static method SequenceFile.createWriter(conf, opts) to create a SequenceFile.Writer instance, and its append(key, value) method to insert each record into the sequence file.

In the example program below, we read the contents of a text file (syslog) on the local file system and write them to a sequence file on Hadoop. We use an integer counter as the key and each line of the input file as the value of the sequence file's <key, value> pairs.

To verify the (key, value) pairs in the sequence file, we print the first 50 records to the console. Copy the code snippet below into a program file.
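A minimal sketch of such a write program follows, under stated assumptions: hadoop-common (Hadoop 2.x) is on the classpath, the local input is read as UTF-8, and records are block-compressed with DefaultCodec. The codec actually used for /out/syslog.seq is not shown in this post, so the codec choice is an assumption. The class name SequenceFileWrite matches the program name used in this post; the write method and its parameters are illustrative.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.util.ReflectionUtils;

// Sketch: copies a local text file into a SequenceFile<IntWritable, Text>,
// using a line counter as the key and the line itself as the value.
public class SequenceFileWrite {

    // Writes one record per input line, echoes the first printLimit records,
    // and returns the number of records written.
    public static int write(Configuration conf, String localInput, Path output, int printLimit)
            throws IOException {
        // Assumption: DefaultCodec with block compression; swap in the codec you actually use.
        DefaultCodec codec = ReflectionUtils.newInstance(DefaultCodec.class, conf);
        IntWritable key = new IntWritable();
        Text value = new Text();
        int counter = 0;
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream(localInput), StandardCharsets.UTF_8));
             SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                     SequenceFile.Writer.file(output),
                     SequenceFile.Writer.keyClass(IntWritable.class),
                     SequenceFile.Writer.valueClass(Text.class),
                     SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK, codec))) {
            String line;
            while ((line = in.readLine()) != null) {
                key.set(++counter);   // keys are 1, 2, 3, ...
                value.set(line);
                writer.append(key, value);
                if (counter <= printLimit) {
                    System.out.println(key + "\t" + value);
                }
            }
        }
        return counter;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: SequenceFileWrite <local input file> <output sequence file>");
            System.exit(1);
        }
        // new Configuration() loads core-site.xml etc. from the classpath, so an
        // unqualified output path resolves against fs.defaultFS (HDFS on a cluster).
        int n = write(new Configuration(), args[0], new Path(args[1]), 50);
        System.out.println("Wrote " + n + " records to " + args[1]);
    }
}
```

If the jar has no Main-Class in its manifest, this sketch could be run as: hadoop jar Seq.jar SequenceFileWrite <local syslog path> /out/syslog.seq (the argument order belongs to this sketch, not necessarily to the original program). The reused IntWritable and Text objects avoid allocating a new pair per line, since append() serializes them immediately.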

Compile this program and build a jar file (say, Seq.jar); we will use this jar file to run the SequenceFileWrite program on Hadoop.

Run it with the command below.

[Screenshot: SequenceFile Write Example]

Verify Output:

Verify the output sequence file /out/syslog.seq with the hadoop fs -cat command. Its output shows whether the file is a sequence file (the first three bytes are SEQ), along with the Writable classes of the key and value and the compression codec class used in this sequence file.

From the screenshot below, we can tell that /out/syslog.seq is a sequence file because its first three bytes are SEQ, and we can also read the following from its header:

Key class is –
Value class is –
Compression Codec –

[Screenshot: SequenceFile Format]
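To make these header fields concrete, here is a minimal pure-Java sketch that decodes them from a local copy of the file (for example one fetched with hadoop fs -get). The layout it assumes follows the Hadoop 2 SequenceFile writer, which writes header version 6: "SEQ", a version byte, the key class name, the value class name, a compressed flag, a block-compressed flag and, if compressed, the codec class name. SeqHeader and its fields are hypothetical names; the sketch stops after the codec name and ignores the metadata and sync marker that follow.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch: decodes the start of a SequenceFile header without Hadoop on the classpath.
public class SeqHeader {
    public final int version;
    public final String keyClass;
    public final String valueClass;
    public final boolean compressed;
    public final boolean blockCompressed;
    public final String codecClass; // null when the file is not compressed

    private SeqHeader(int version, String keyClass, String valueClass,
                      boolean compressed, boolean blockCompressed, String codecClass) {
        this.version = version;
        this.keyClass = keyClass;
        this.valueClass = valueClass;
        this.compressed = compressed;
        this.blockCompressed = blockCompressed;
        this.codecClass = codecClass;
    }

    public static SeqHeader parse(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("Not a SequenceFile: first three bytes are not SEQ");
        }
        int version = in.readUnsignedByte();
        if (version < 5) { // older versions store class names and codecs differently
            throw new IOException("Header version " + version + " is older than this sketch handles");
        }
        String key = readString(in);
        String value = readString(in);
        boolean compressed = in.readBoolean();
        boolean block = in.readBoolean();
        String codec = compressed ? readString(in) : null;
        return new SeqHeader(version, key, value, compressed, block, codec);
    }

    // Length-prefixed UTF-8 string, as written by Hadoop's Text.writeString.
    private static String readString(DataInputStream in) throws IOException {
        byte[] bytes = new byte[(int) readVLong(in)];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Hadoop's variable-length integer encoding (the scheme used by WritableUtils.readVLong).
    private static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;                          // value fits in the single byte
        }
        boolean negative = first < -120;
        int extraBytes = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~value : value;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            SeqHeader h = parse(in);
            System.out.println("Version:           " + h.version);
            System.out.println("Key class:         " + h.keyClass);
            System.out.println("Value class:       " + h.valueClass);
            System.out.println("Compressed:        " + h.compressed);
            System.out.println("Block compressed:  " + h.blockCompressed);
            System.out.println("Compression codec: " + h.codecClass);
        }
    }
}
```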

Reading SequenceFile Example:

Now we will see how to read the sequence file created above through the Hadoop 2 API. We will create a SequenceFile.Reader instance and use its next(key, value) method to iterate over each record in the sequence file.

Note that in the program below we don't pass the compression type or codec that we used when creating the sequence file. The reader gets these details from the file header itself and decompresses the records with the codec recorded there. Also note that we use the getKeyClass() and getValueClass() methods on the reader instance to retrieve the classes of the (key, value) pairs in the sequence file.

The program below reads the contents of the sequence file and prints them to the console. Copy this code snippet into a program file.
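A minimal sketch of such a read program follows, assuming hadoop-common (Hadoop 2.x) on the classpath. It passes no codec to the reader and instantiates the key and value objects from getKeyClass() and getValueClass(), so it works for any Writable key/value types. The class name SequenceFileRead matches the program name used in this post; the read method and its maxRecords parameter are illustrative.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Sketch: prints the records of a SequenceFile whose key and value classes are Writables.
public class SequenceFileRead {

    // Prints up to maxRecords records (all of them if maxRecords < 0) and returns the total read.
    public static int read(Configuration conf, Path input, int maxRecords) throws IOException {
        int count = 0;
        // No compression type or codec is passed: the reader takes both from the file header.
        try (SequenceFile.Reader reader =
                     new SequenceFile.Reader(conf, SequenceFile.Reader.file(input))) {
            System.out.println("Key class:   " + reader.getKeyClass().getName());
            System.out.println("Value class: " + reader.getValueClass().getName());
            // Instantiate key/value holders of whatever classes the header names.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {   // returns false at end of file
                count++;
                if (maxRecords < 0 || count <= maxRecords) {
                    System.out.println(key + "\t" + value);
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: SequenceFileRead <sequence file>");
            System.exit(1);
        }
        int n = read(new Configuration(), new Path(args[0]), -1);
        System.out.println("Read " + n + " records");
    }
}
```

With this sketch, the read step could be run as: hadoop jar Seq.jar SequenceFileRead /out/syslog.seq (again assuming no Main-Class in the jar manifest).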

Compile this program and build a jar file (say, Seq.jar); we will use this jar file to run the SequenceFileRead program on Hadoop.

Run it with the command below.

Below is a screenshot of the first 10 lines of output from the above command.

[Screenshot: SequenceFile Read Example]

Reading SequenceFile with Command-line Interface:

There is an alternative way to view the contents of a sequence file from the command line: Hadoop provides the hadoop fs -text command, which displays the contents of a sequence file in text format.

This command looks at a file's magic number to detect the file type and converts it to text appropriately. It can recognize gzipped files and sequence files; otherwise, it assumes the input is plain text.
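The magic-number rule described above can be illustrated with a small, simplified Java sketch. It is not Hadoop's implementation: FileTypeSniffer and Kind are hypothetical names, it works on local files only, and it checks just the two magic numbers mentioned here (bytes 1f 8b for gzip, "SEQ" for sequence files). The real -text command then goes on to decompress or deserialize the records, which this sketch does not do.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Simplified sketch of magic-number detection in the spirit of `hadoop fs -text`.
public class FileTypeSniffer {

    public enum Kind { GZIP, SEQUENCE_FILE, TEXT }

    // Classifies a file from the first n bytes of head.
    public static Kind detect(byte[] head, int n) {
        if (n >= 2 && (head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B) {
            return Kind.GZIP;            // gzip magic number 1f 8b
        }
        if (n >= 3 && head[0] == 'S' && head[1] == 'E' && head[2] == 'Q') {
            return Kind.SEQUENCE_FILE;   // SequenceFile magic "SEQ"
        }
        return Kind.TEXT;                // anything else is treated as plain text
    }

    public static void main(String[] args) throws IOException {
        for (String file : args) {
            try (InputStream in = new FileInputStream(file)) {
                byte[] head = new byte[3];
                int n = 0;
                int r;
                // Read up to three bytes; short files simply yield a smaller n.
                while (n < head.length && (r = in.read(head, n, head.length - n)) > 0) {
                    n += r;
                }
                System.out.println(file + ": " + detect(head, n));
            }
        }
    }
}
```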

[Screenshot: hadoop fs -text command]


About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, who has been involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.
