In part 1 we learned how to test data lineage collection with Spline from a Spark shell. The same can be done in any Scala or Java Spark application. The same dependencies used for the Spark shell need to be registered in your build tool of choice, whether Maven, Gradle or sbt (an sbt sketch is shown further below):

- groupId: za.co.absa.spline, artifactId: spline-core, version: 0.3.5
- groupId: za.co.absa.spline, artifactId: spline-persistence-mongo, version: 0.3.5
- groupId: za.co.absa.spline, artifactId: spline-core-spark-adapter-2.3, version: 0.3.5

With reference to Scala and Spark 2.3.x, a Spark job can create its session and initialize Spline like this:

```scala
import org.apache.spark.sql.SparkSession

// Create the Spark session
val sparkSession = SparkSession
  .builder()
  .appName("Spline Tester")
  .getOrCreate()

// Init Spline: persist the captured lineage to MongoDB, using the
// connection URL and database name passed as application arguments
System.setProperty("spline.persistence.factory",
  "za.co.absa.spline.persistence.mongo.MongoPersistenceFactory")
System.setProperty("spline.mongodb.url", args(0))
System.setProperty("spline.mongodb.name", args(1))

// Enable lineage tracking on the session (the Spline 0.3.x initializer)
import za.co.absa.spline.core.SparkLineageInitializer._
sparkSession.enableLineageTracking()
```
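Once the session is initialized this way, Spline captures lineage whenever the job persists a result. As a minimal illustration, a write as small as the sketch below is enough to produce a lineage record (the dataset and output path are hypothetical):

```scala
// A hypothetical transformation whose write action gives Spline lineage to record
import sparkSession.implicits._

val votes = Seq(("Alice", 10), ("Bob", 7), ("Carol", 12)).toDF("name", "votes")
votes
  .filter($"votes" > 8)                  // a simple transformation step
  .write
  .mode("overwrite")
  .parquet("/tmp/spline-tester-output")  // hypothetical output path
```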
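To build such a job, the coordinates listed at the beginning of this section would be declared in sbt roughly as follows (a sketch: the artifact IDs are taken verbatim from the list above and assumed to be published without a Scala-version suffix, with Spark itself provided by the cluster):

```scala
// build.sbt (sketch)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.3.2" % "provided",
  "za.co.absa.spline" % "spline-core" % "0.3.5",
  "za.co.absa.spline" % "spline-persistence-mongo" % "0.3.5",
  "za.co.absa.spline" % "spline-core-spark-adapter-2.3" % "0.3.5"
)
```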
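Since the MongoDB connection URL and database name are read from args(0) and args(1), they are supplied when the job is submitted. A hypothetical invocation (the class name, jar and connection details are placeholders) could look like:

```
spark-submit --class com.example.SplineTester \
  spline-tester.jar \
  mongodb://localhost:27017 splinedb
```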