Posts

Showing posts from November, 2018

Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 1)

One interesting and promising open source project that caught my attention lately is Spline, a data lineage tracking and visualization tool for Apache Spark, maintained at Absa. The project consists of two parts: a Scala library that runs on the Spark driver and captures data lineages by analyzing the Spark execution plans, and a web application that provides a UI to visualize them. Spline supports MongoDB and HDFS as storage systems for the data lineages, which are persisted in JSON format. In this post I refer to MongoDB. You can start playing with Spline through the Spark shell. Just add the required dependencies to the shell classpath as follows (with reference to the latest release of this project, 0.3.5): spark-shell --packages "za.co.absa.spline:spline-core:0.3.5,za.co.absa.spline:spline-persistence-mongo:0.3.5,za.co.absa.spline:spline-core-spark-adapter-2.3:0.3.5" Running the Spark shell with the command above on Ubuntu and some other Linux distro, whether some issue on

Black Friday @Packt Publishing!

This Friday, November 23rd 2018, is Black Friday at Packt Publishing too! Every book or video, including the latest releases, can be purchased for only US$10. It is also possible to pre-order my upcoming book "Hands-on Deep Learning with Apache Spark" for US$10. Please remember that this special price is valid on Friday 23rd only. Enjoy it!