Java vs Hive

In this post we will discuss the differences between Java and Hive with the help of a word count example.

We will examine the Word Count Algorithm first using the Java MapReduce API and then using Hive.

The following Java implementation is included in the Apache Hadoop distribution.
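For reference, the WordCount example from the Hadoop MapReduce tutorial (the implementation shipped with the distribution) is reproduced below; minor details may differ from the exact version in your Hadoop release.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input line
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```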

Implementing the Word Count algorithm in Java takes about 63 lines of code, which must then be compiled and packaged into a jar file. The main class, WordCount, is executed with the command shown below.
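The original screenshots are not reproduced here. Assuming the Hadoop tutorial's standard workflow, with an input directory named `input` (an assumption) and the output directory MR_WC that appears in the results later in the post, the build-and-run steps look roughly like:

```shell
# Compile against the Hadoop classpath and package the classes into a jar
hadoop com.sun.tools.javac.Main WordCount.java
jar cf wc.jar WordCount*.class

# Submit the job: input directory first, output directory second
hadoop jar wc.jar WordCount input MR_WC

# Inspect the result
hadoop fs -cat MR_WC/part-r-00000
```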

[Screenshot: Java WC]

Its output is shown below.

[Screenshot: Java WC out]

In Hive, however, the same word count algorithm can be implemented in just 7 lines of code, and neither compilation nor building jar files is needed.
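The 7 lines themselves did not survive in this copy of the post. A plausible reconstruction is the classic Hive word count, which matches the docs and word_counts table names visible in the screenshots below; the input path here is an assumption:

```sql
CREATE TABLE docs (line STRING);
LOAD DATA INPATH 'input' OVERWRITE INTO TABLE docs;
CREATE TABLE word_counts AS
SELECT word, count(1) AS count FROM
  (SELECT explode(split(line, '\\s')) AS word FROM docs) w
GROUP BY word
ORDER BY word;
```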

In this example, the Hive code is written in a wordcount.hql file and executed through the command shown below (typically hive -f wordcount.hql).

[Screenshot: Hive WC]

In the above screenshot, the docs managed table is created under hdfs://localhost:9000/user/hive/warehouse/.

Two MapReduce jobs are submitted. The output of the word_counts managed table is shown below.

[Screenshot: Hive WC out]

The output file MR_WC/part-r-00000 and the word_counts table show the same results. From this example alone we can see how much more concise and productive Hive is than plain Java MapReduce for this kind of task.
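For readers without a cluster at hand, the computation both jobs perform boils down to the following standalone Java sketch. The class and method names are illustrative, not Hadoop's API; a TreeMap keeps the words alphabetically sorted, like the job output.

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountLocal {

    // Count whitespace-separated words; TreeMap sorts keys alphabetically
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {be=2, not=1, or=1, to=2}
        System.out.println(count("to be or not to be"));
    }
}
```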


About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, involved with several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.



3 thoughts on “Java vs Hive”

  • Ritesh

    Dear Siva,

    Is knowing Java mandatory to get into Hadoop? I come from a .NET and SQL Server background and am interested in learning Big Data solutions.

    Please suggest. Thank you.

  • Vinio

    Hi Siva,

    Thanks for your post. Hive might be simpler to implement than Java, but I have been in a situation (semi-structured data) where using Java significantly improved the overall performance of my ETL process. My question is: in what cases is Hive or Java preferable?