Monthly Archives: April 2015

Hadoop Performance Tuning: There are many ways to improve the performance of Hadoop jobs. In this post, we provide a few MapReduce properties that can be used at various MapReduce phases to improve performance. There is no one-size-fits-all technique for tuning Hadoop jobs, because of the architecture of […]

Hadoop Performance Tuning

Hadoop Best Practices: Avoid small files (smaller than one HDFS block, typically 128 MB), where each map task processes a single small file. Maintain an optimal HDFS block size, generally >= 128 MB, to avoid tens of thousands of map tasks when processing large data sets. Use combiners wherever applicable to […]

Hadoop Best Practices

In this post we discuss a common error message that many Oozie beginners face because of an incorrect or incomplete Oozie setup. The "Oozie Share Lib does not exist" error is raised if the Oozie sharelib is not installed properly. Error scenario: "File /user/user/share/lib does not exist". This error message we will […]

Oozie Share Lib does not exist error