Spark Streaming brings Apache Spark's language-integrated API to stream processing, so you can write streaming jobs the same way you write batch jobs. It lets you build fault-tolerant applications and reuse the same code for batch processing and interactive queries.

Kafka is an open source message broker written in Scala. It is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant and wicked fast.

Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

This post walks through the basics of implementing a simple streaming application that integrates these three technologies. The code example is written in Scala. The releases I am referring to in this post are the following: Scala 2.11.8, Spark 1.6.2, Kafka Client APIs 0.8.2.11, Cassandra 3.9, Datastax Spark-Cassandra Connecto...