Miscellaneous


Impala Conditions with Example. Impala supports the following conditional functions for testing equality, comparison operators, and nullity: CASE. Examples: 1) If/else: select case when 20 > 10 then 20 else 15 end; Output: 20. 2) If/else if: select case when 9 > 10 then 20 when 1 > […]

Impala Miscellaneous Functions
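
The CASE examples in the excerpt above can be tried without an Impala cluster. The following is a minimal sketch in Python that uses SQLite's in-memory engine as a stand-in. That is an assumption made for illustration: SQLite is not Impala, although a CASE expression with simple integer comparisons evaluates the same way in both. The first query is the one from the excerpt. The second uses hypothetical branch values, because the original query is truncated.

```python
import sqlite3


def eval_sql(expr_sql):
    """Run a single-value SELECT against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(expr_sql).fetchone()[0]
    finally:
        conn.close()


# 1) If/else: the query from the excerpt.
if_else = eval_sql("select case when 20 > 10 then 20 else 15 end")

# 2) If/else if: hypothetical values, not the original (truncated) query.
#    The first WHEN is false, so the second WHEN is tested and wins.
if_else_if = eval_sql(
    "select case when 9 > 10 then 20 when 5 > 3 then 30 else 15 end"
)

print(if_else, if_else_if)
```

CASE returns the value of the first branch whose condition is true and falls back to ELSE when none match.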


PMD (Programming Mistake Detector). What is PMD? PMD, also known as Programming Mistake Detector, is a Java source code analyzer. It is used to find and clean up erroneous code in Java projects based on a predefined set of rules. PMD also supports writing custom rules. Issues reported by PMD may not be true […]

PMD (Programming Mistake Detector)


Installing the UDF Development Package. The output will look like the following: [cloudera@quickstart impala-udf-samples-master]$ cmake . -- The C compiler identification is GNU 4.4.7 -- The CXX compiler identification is GNU 4.4.7 -- Check for working C compiler: /usr/bin/cc -- Check for working C compiler: /usr/bin/cc -- works -- […]

Creating UDF and UDAF for Impala



CREATE. Listing the relations shows our new table:

         List of relations
 Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
 public | playground | table | postgres
(1 row)

INSERT. Message returned if only one row was inserted. oid is the numeric OID of the […]

Postgres Commands


Below are Impala performance tuning options. Pre-execution checklist: data types, partitioning, file format. Data type choices: define integer columns as INT/BIGINT; operations on INT/BIGINT are more efficient than on STRING; convert “external” data to good “internal” types on load, e.g. CAST […]

Impala Best Practices


Installing Apache Storm. Prerequisites for Storm to work on the machine: a. Download and install ZeroMQ 2.1.7 by running the commands in a terminal. b. Download and install JZMQ. 2. Download the latest Storm from http://storm.apache.org/downloads.html. Second, start the Storm cluster by starting master […]

Apache Storm Integration With Apache Kafka



While developing Kafka, the main focus was to provide the following: an API for producers and consumers to support custom implementations; low overhead for network and storage, with message persistence on disk; high throughput supporting millions of messages for both publishing and subscribing, for example, real-time log […]

Kafka Design



Production issue: when we tried to run a select query with 8 lakh (800,000) ids in an “IN” condition, we hit an exception. To solve it, we used distributed calls in the Java client. A few production configurations in Cassandra: RetryPolicy. Three scenarios you can control […]

Cassandra production scenarios/issues
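
The idea behind the fix described above, splitting one very large IN list into many smaller calls, can be sketched as follows. This is an illustrative Python sketch and not the post's Java client code. The batch size of 1000 and the names chunked, fetch_in_batches and fake_run_query are hypothetical, and fake_run_query is a stand-in for a real driver call.

```python
from typing import Callable, Dict, Iterator, List


def chunked(ids: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of ids, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_in_batches(
    ids: List[int],
    run_query: Callable[[List[int]], List[Dict[str, int]]],
    batch_size: int = 1000,  # hypothetical value; tune for the cluster
) -> List[Dict[str, int]]:
    """Issue one small query per batch instead of one huge IN list."""
    rows: List[Dict[str, int]] = []
    for batch in chunked(ids, batch_size):
        rows.extend(run_query(batch))
    return rows


def fake_run_query(batch: List[int]) -> List[Dict[str, int]]:
    """Stand-in for a driver call such as SELECT ... WHERE id IN (batch)."""
    return [{"id": i} for i in batch]


all_ids = list(range(800_000))  # 8 lakh ids, as in the production issue
result = fetch_in_batches(all_ids, fake_run_query, batch_size=1000)
print(len(result))
```

In a real client the batches could also be sent concurrently. The sketch runs them one after another to keep the batching logic easy to follow.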



Cassandra Table structure/Terminology. Before learning CQL commands, we need to know the terminology used in Cassandra.

RDBMS        | Cassandra Terminology
Database     | Keyspace
Table        | Column Family
Primary key  | Row Key
Column name  | Column name
Column value | Column value

CQL Commands. Creating a keyspace. Use the keyspace (will use that keyspace). […]

Cassandra query language (CQL) and Cassandra Java Client Example


Storage engine. Cassandra uses a storage structure similar to a Log-Structured Merge Tree, unlike a typical relational database, which uses a B-Tree. Cassandra avoids reading before writing. Read-before-write, especially in a large distributed system, can produce stalls in read performance and other problems. Cassandra never re-writes or re-reads existing data, […]

Cassandra write and read process
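
To make the "no read before write" idea in the excerpt above concrete, here is a toy log-structured store in Python. It is a simplified sketch under stated assumptions, not Cassandra's implementation: the class name TinyLSM and the memtable_limit parameter are hypothetical, and the commit log, compaction, tombstones and on-disk formats are all left out.

```python
class TinyLSM:
    """Toy LSM-style store: an in-memory table plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # most recent writes
        self.sstables = []                    # immutable runs, oldest first

    def write(self, key, value):
        # A write never looks up existing data; it only records the new value.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable run and start afresh.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


store = TinyLSM(memtable_limit=2)
store.write("a", 1)
store.write("b", 2)  # reaches the limit, so the memtable is flushed
store.write("a", 3)  # newer value for "a"; the flushed run is left untouched
print(store.read("a"), store.read("b"))
```

The old value of "a" still sits in the flushed run. It is shadowed on read, not overwritten, which is the trade-off that keeps writes cheap.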


Cassandra is designed so that there is no single point of failure (SPOF). There is no master-slave architecture in Cassandra. Cassandra addresses the problem of SPOF by employing a peer-to-peer distributed system across homogeneous nodes, where data is distributed among all nodes in the cluster. In Cassandra all […]

Cassandra Architecture




This post collects some notes on ZooKeeper commands and scripts. It is mainly useful for Hadoop admins, and all commands are self-explanatory. ZooKeeper is a distributed, centralized coordination service. ZooKeeper addresses issues with distributed applications: Maintain configuration information (share config info across all nodes), Naming Service (allows one node […]

Zookeeper Commands