
Posts

Showing posts from October, 2016

Integrating Kafka, Spark Streaming and Cassandra: the basics

Spark Streaming brings Apache Spark's language-integrated APIs to stream processing, so you can write streaming jobs the same way you write batch jobs. It lets you build fault-tolerant applications and reuse the same code for batch and interactive queries. Kafka is an open source message broker written in Scala and Java. It is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant and wicked fast. Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. This post walks through the basics of implementing a simple streaming application that integrates these three technologies. The code example is written in Scala. The releases I refer to in this post are the following:

Scala 2.11.8
Spark 1.6.2
Kafka Client APIs 0.8.2.11
Cassandra 3.9
Datastax Spark-Cassandra Connector compatible with Spark 1