In part 1 we learned how to test data lineage collection with Spline from a Spark shell. The same can be done in any Scala or Java Spark application. The same dependencies used for the Spark shell need to be registered in your build tool of choice, whether Maven, Gradle or sbt (an sbt sketch is shown further below):

- groupId: za.co.absa.spline, artifactId: spline-core, version: 0.3.5
- groupId: za.co.absa.spline, artifactId: spline-persistence-mongo, version: 0.3.5
- groupId: za.co.absa.spline, artifactId: spline-core-spark-adapter-2.3, version: 0.3.5

With reference to Scala and Spark 2.3.x, a Spark job can create its session and initialize Spline like this:

```scala
import org.apache.spark.sql.SparkSession

// Create the Spark session
val sparkSession = SparkSession
  .builder()
  .appName("Spline Tester")
  .getOrCreate()

// Init Spline: persist the captured lineage to MongoDB, using the
// connection URL and database name passed as application arguments
System.setProperty("spline.persistence.factory",
  "za.co.absa.spline.persistence.mongo.MongoPersistenceFactory")
System.setProperty("spline.mongodb.url", args(0))
System.setProperty("spline.mongodb.name", args(1))

// Enable lineage tracking on the session (the Spline 0.3.x initializer)
import za.co.absa.spline.core.SparkLineageInitializer._
sparkSession.enableLineageTracking()
```
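Once the session is initialized this way, Spline captures lineage whenever the job persists a result. As a minimal illustration, a write as small as the sketch below is enough to produce a lineage record (the dataset and output path are hypothetical):

```scala
// A hypothetical transformation whose write action gives Spline lineage to record
import sparkSession.implicits._

val votes = Seq(("Alice", 10), ("Bob", 7), ("Carol", 12)).toDF("name", "votes")
votes
  .filter($"votes" > 8)                  // a simple transformation step
  .write
  .mode("overwrite")
  .parquet("/tmp/spline-tester-output")  // hypothetical output path
```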
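To build such a job, the coordinates listed at the beginning of this section would be declared in sbt roughly as follows (a sketch: the artifact IDs are taken verbatim from the list above and assumed to be published without a Scala-version suffix, with Spark itself provided by the cluster):

```scala
// build.sbt (sketch)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.3.2" % "provided",
  "za.co.absa.spline" % "spline-core" % "0.3.5",
  "za.co.absa.spline" % "spline-persistence-mongo" % "0.3.5",
  "za.co.absa.spline" % "spline-core-spark-adapter-2.3" % "0.3.5"
)
```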
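Since the MongoDB connection URL and database name are read from args(0) and args(1), they are supplied when the job is submitted. A hypothetical invocation (the class name, jar and connection details are placeholders) could look like:

```
spark-submit --class com.example.SplineTester \
  spline-tester.jar \
  mongodb://localhost:27017 splinedb
```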