

Showing posts from June, 2017

Unit testing Spark applications in Scala (Part 2): Intro to spark-testing-base

In the first part of this series we became familiar with ScalaTest. When it comes to unit testing Scala Spark applications, ScalaTest alone isn't enough: you also need to add spark-testing-base to the roster. It is an Open Source framework which provides base classes for the main Spark abstractions: SparkContext, RDD, DataFrame, Dataset and Streaming. Let's explore the facilities provided by this framework, and how it works alongside ScalaTest, through some simple examples. Consider the following Scala word count example found on the web:

import org.apache.spark.{SparkConf, SparkContext}

object SparkWordCount {
  def main(args: Array[String]) {
    val inputFile = args(0)
    val outputFile = args(1)
    val conf = new SparkConf().setAppName("SparkWordCount")
    // Create a Scala Spark Context.
    val sc = new SparkContext(conf)
    // Load our input data.
    val input = sc.textFile(inputFile)
    // Split up into words.
    val words = input.flatMap(line => line.split(" "))
    // Count the occurrences of each word.
    val counts = words.map(word => (word, 1)).reduceByKey(_ + _)
    // Save the word counts to the output file.
    counts.saveAsTextFile(outputFile)
  }
}
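To give a first taste of how the two frameworks fit together, here is a minimal sketch (the suite name and sample data are illustrative, not taken from the original post): mixing spark-testing-base's SharedSparkContext trait into a ScalaTest FunSuite provides each test with a ready-made SparkContext called sc, so the word count transformations can be exercised on a small in-memory RDD without starting a cluster or touching the file system.

import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.FunSuite

class SparkWordCountTest extends FunSuite with SharedSparkContext {

  test("word counting works on a small in-memory RDD") {
    // Build a tiny input RDD instead of reading a file.
    val input = sc.parallelize(Seq("hello spark", "hello scala"))
    // Apply the same transformations used in SparkWordCount.
    val counts = input
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    assert(counts("hello") === 2)
    assert(counts("spark") === 1)
    assert(counts("scala") === 1)
  }
}

SharedSparkContext takes care of creating the SparkContext before the suite runs and stopping it afterwards, which keeps individual test cases short and focused on the logic under test.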

Hubot & SDC

My first Open Source Hubot script has been released and is available in my GitHub space. It lets you check the status of pipelines on a StreamSets Data Collector server. It is still an alpha release, but development is ongoing, so new features and improvements will be added regularly. Enjoy it!