Miscellaneous


In Java development, a typical workflow involves restarting the server with every class change, and no one complains about it. That is a fact about Java development. We have worked like that since our first day with Java. But is Java class reloading that difficult to achieve? And could that problem […]

Advanced Java Class Tutorial: A Guide to Class Reloading


Cassandra is a distributed database from Apache that is highly scalable and designed to manage very large amounts of structured data. It provides high availability with no single point of failure. NoSQL: The primary objective of a NoSQL database is to have simplicity of design, horizontal scaling, and finer […]

Cassandra Interview Cheat Sheet


Cassandra is another NoSQL database. Like HBase, it is a distributed, column-oriented database that handles big data workloads across multiple nodes, but it can support both the local file system and HDFS, whereas in HBase the underlying file system is HDFS. It overcomes single point of failure […]

Cassandra Overview



This is a quick look at Impala commands and functions. Impala accepts basic SQL syntax, and below is a list of a few operators and commands that can be used inside Impala. This is just a quick cheat sheet. Databases: In Impala, a database is a logical container for a group […]

Impala Commands Cheat Sheet


Cloudera provides a separate tool called Impala to overcome the slowness of Hive queries. Syntactically, Impala queries are more or less the same as Hive queries, but they run much faster. Impala provides high-performance, low-latency SQL queries. When we are dealing with medium sized data sets and we […]

Impala Introduction


In short, HCatalog opens up the Hive metadata to other MapReduce tools. Every MapReduce tool has its own notion of HDFS data (for example, Pig sees the HDFS data as a set of files, while Hive sees it as tables). With a table-based abstraction, HCatalog-supported MapReduce tools […]

HCatalog and Pig Integration



In this post we will discuss Kafka installation and test broker setup on an Ubuntu machine in stand-alone ZooKeeper mode. Apache Kafka is an open source message broker from the Apache Software Foundation. It was initially developed at LinkedIn and later contributed to the open source community. It is written in Scala. Kafka provides a unified, […]

Kafka Installation and Test Broker Setup


MRUnit: a Java framework that helps developers unit test Hadoop MapReduce jobs. Mockito: a Java mocking framework that, like MRUnit, can be used to unit test Hadoop MapReduce jobs. PigUnit: a Java framework that helps developers unit test Pig scripts. HiveRunner: an open source unit test framework for Hadoop Hive queries based […]

Hadoop Testing Tools


Testing Flume. Scope: Testing will cover the functional testing of data transfer from source machines (external systems) to HDFS/HBase. Testing of individual Flume components, such as different source types, channel types, and sink types, will be included, as will testing of custom Flume agents and embedded Flume agents in other automated jobs/tools. Limitations & […]

Flume Sqoop Pig HBase Unit Testing



In this post we will discuss one of the most common error messages faced by Oozie beginners due to an incorrect or incomplete Oozie setup. The "Oozie Share Lib does not exist" error is raised if the Oozie sharelib is not installed properly. Error scenario: java.io.FileNotFoundException: File /user/user/share/lib does not exist This error message we will […]

Oozie Share Lib does not exist error


In this post we will discuss a common problem when installing Cloudera Manager 5.3.2 on an Ubuntu 14.04 machine and the solution for its root cause. Problem: When installing Cloudera Manager 5.3.2 on an Ubuntu 14.04.2 machine, the error messages below appear after giving root privileges on the SSH configuration page. Exhausted available authentication methods […]

Exhausted available authentication methods


QuerySurge Configuring Connections: SQL Server. When you create a QuerySurge Connection, the Add Connection Wizard will guide you through the process. Different types of QuerySurge connections require different types of information. For a SQL Server Connection, you will need the following information (check with a DBA or other knowledgeable resource […]

QuerySurge Configuring Connections



1. Download the QuerySurge Installer to the machine on which you want to install QuerySurge. 2. Double click on the QuerySurge Installer to start the installation process. Click “Next” to accept the License Agreement, and “Next” again to set the installation directory. 3. On […]

QuerySurge single machine installation – Windows


The QuerySurge CASE tool, developed by RTTS, assists DW testers in preparing and scheduling query pairs to compare data transformed from the source to the destination; for example, preparing a query pair, one that runs on a DS and the […]

QuerySurge Tool for Hadoop Testing