Hadoop Common

Impala Conditions with Example. Impala supports the following conditional functions for testing equality, comparison operators, and nullity. CASE examples: 1) If-else: select case when 20 > 10 then 20 else 15 end; Output: 20. 2) If-else-if: select case when 9 > 10 then 20 when 1 > […]

Impala Miscellaneous Functions

Table of Contents: PMD (Programming Mistake Detector) | What is PMD? | How to install PMD? | How to use PMD? | Finding Cut and Paste Code (CPD) | Working POM configuration. What is PMD? PMD, the Programming Mistake Detector, is a Java source code analyzer. It is used to clean up erroneous code in our Java projects based on […]

PMD (Programming Mistake Detector)

Installing the UDF Development Package [code block] The output will look like the following. [cloudera@quickstart impala-udf-samples-master]$ cmake . -- The C compiler identification is GNU 4.4.7 -- The CXX compiler identification is GNU 4.4.7 -- Check for working C compiler: /usr/bin/cc -- Check for working C compiler: /usr/bin/cc -- works -- […]

Creating UDF and UDAF for Impala

Table of Contents: CREATE | INSERT | SELECT | DELETE | ALTER | UPDATE | LIKE CLAUSE | LIMIT CLAUSE | ORDER BY | GROUP BY | WITH CLAUSE | HAVING CLAUSE | DISTINCT. CREATE [code block] We can see our new table by typing this: [code block]
         List of relations
 Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
 public | playground | table | postgres
(1 row)
INSERT [code block] Message returned if only one row […]

Postgres Commands

To install the server locally, use the command line and type: [code block] To start off, we need to set the password of the PostgreSQL user (role) called “postgres”; otherwise we will not be able to access the server externally. As the local “postgres” Linux user, we are allowed to connect […]

Postgres Installation On Centos

Table of Contents: HBase & Solr – Near Real time indexing and search | Creating a Lily HBase Indexer configuration | Creating a Morphline Configuration File | Starting & Registering a Lily HBase Indexer configuration with the Lily HBase Indexer Service | Verifying the indexing is working | Configuring Lily HBase NRT Indexer Service for Use with Cloudera Search | Using the Lily […]

HBase & Solr Search Integration

Spark RDD
Table of Contents: What is an RDD? | Why RDD in Spark? | Data Sharing in MapReduce | Data Sharing in Spark | RDD Abstraction | How to program with RDD | Example 1: Creating an RDD of Strings with textFile() in Python | Example 2: Calling the filter() transformation | Example 3: Calling the first() action | Example 4: Persisting an RDD in memory | Lazy Evaluation […]

Resilient Distributed Dataset

Below are Impala performance tuning options. Table of Contents: Pre-execution Checklist | Data Type Choices | Partitioning | Use Parquet Columnar Format for HDFS | Quick Note on Compression | Snappy | Gzip/Zlib | Left-Deep Join Tree | Types of Hash Joins | Broadcast | Shuffle | How to use ANALYZE | Hinting Joins | Determining Join Type From EXPLAIN | Memory Requirements for Joins & Aggregates. Pre-execution Checklist: data types, partitioning, file format. Data Type […]

Impala Best Practices

Installing Apache Storm. 1. Prerequisites for Storm to work on the machine: a. Download and installation commands for ZeroMQ 2.1.7. Run the following commands in a terminal: [code block] b. Download and installation commands for JZMQ: [code block] 2. Download the latest Storm from http://storm.apache.org/downloads.html [code block] Second, start the Storm cluster by starting master […]

Apache Storm Integration With Apache Kafka

While developing Kafka, the main focus was to provide the following: an API for producers and consumers to support custom implementation; low overheads for network and storage with message persistence on disk; a high throughput supporting millions of messages for both publishing and subscribing, for example, real-time log […]

Kafka Design

Table of Contents: Production issue | Few Production configurations in Cassandra | RetryPolicy | DefaultRetryPolicy | DowngradingConsistencyRetryPolicy | Reconnection Policy | ConstantReconnectionPolicy | ExponentialReconnectionPolicy (default) | Load Balancing Policy | RoundRobinPolicy | DCAwareRoundRobinPolicy | TokenAwarePolicy (default). Production issue: when we tried to run a SELECT query with 8 lakh (800,000) IDs in an IN condition, we hit the issue below. To solve the above exception, we used distributed calls in the Java client as shown […]

Cassandra production scenarios/issues

Table of Contents: Cassandra Table structure/Terminology | CQL Commands | Creating a keyspace | Use the keyspace (will use that keyspace) | Get list of keyspaces | Create table | Get list of tables in a keyspace | Insert data into table | Describe table | Create index | Update data in table | Delete data in table | Limitations in CQL | Java client example | Miscellaneous commands in CQL | Get size of table | Flush data into SSTable/disk | Copy table content […]

Cassandra query language (CQL) and Cassandra Java Client Example

Table of Contents: Storage engine | How data is written? | Compaction | Types of compaction | How is data updated? | How is data deleted? | How is data read? | How do write patterns affect reads? Storage engine: Cassandra uses a storage structure similar to a Log-Structured Merge Tree, unlike a typical relational database that uses a B-Tree. Cassandra avoids reading before writing. […]

Cassandra write and read process
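
The log-structured idea mentioned in the excerpt above can be illustrated with a toy sketch in Python. This is not Cassandra's implementation: the class name ToyLSMStore, the memtable_limit parameter, and the TOMBSTONE marker are all hypothetical, and real details such as the commit log, write timestamps, bloom filters, and tombstone grace periods are left out. It only shows that writes go to an in-memory table without reading first, that full memtables are flushed to immutable sorted tables, and that reads check the newest data first.

```python
from bisect import bisect_left

TOMBSTONE = object()  # hypothetical marker stored in place of a deleted value


class ToyLSMStore:
    """Toy log-structured store; an illustration only, not Cassandra's code."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}            # in-memory writes, newest data
        self.sstables = []            # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write never reads existing data; it only records the new value.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just a write of a tombstone marker.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Freeze the memtable into an immutable table sorted by key.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the memtable first, then flushed tables from newest to oldest.
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for table in reversed(self.sstables):
            keys = [k for k, _ in table]
            i = bisect_left(keys, key)
            if i < len(table) and table[i][0] == key:
                value = table[i][1]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge all flushed tables; newer values win, tombstoned keys are dropped.
        merged = {}
        for table in self.sstables:   # oldest first, so newer entries overwrite
            merged.update(table)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.sstables = [live] if live else []


store = ToyLSMStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)      # memtable is full, so it is flushed to a sorted table
store.put("a", 10)     # update: a new version is written, the old one is not touched
store.delete("b")      # delete: a tombstone is written, then a second flush happens
print(store.get("a"), store.get("b"))
store.compact()        # the two flushed tables are merged into one
print(store.sstables)
```

In this sketch an update or delete never modifies an already flushed table; older versions are only discarded when compact() merges the tables, which is the general trade-off behind the read and compaction topics listed in the excerpt.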

Cassandra is designed so that there is no single point of failure. There is no master-slave architecture in Cassandra. Cassandra addresses the problem of a single point of failure (SPOF) by employing a peer-to-peer distributed system across homogeneous nodes, where data is distributed among all nodes in the cluster. In Cassandra all […]

Cassandra Architecture