

Showing posts from November, 2017

Quick start with Apache Livy (part 2): the REST APIs

The second post of this series focuses on how to run a Livy server instance and start playing with its REST APIs. The steps below are meant for a Linux environment (any distribution).

Prerequisites

The prerequisites to start a Livy server are the following:
- The JAVA_HOME env variable set to a JDK/JRE 8 installation.
- A running Spark cluster.

Starting the Livy server

Download the latest version (0.4.0-incubating at the time of writing) from the official website and extract the archive content (it is a ZIP file). Then set the SPARK_HOME env variable to the Spark location on the server (for simplicity, in this post I am assuming that the cluster is on the same machine as the Livy server; in the next post I will go through the customization of the configuration files, including the connection to a remote Spark cluster, wherever it is located). By default Livy writes its logs to the $LIVY_HOME/logs location: you need to create this directory manually. Finally
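As a quick reference, the steps above could look roughly like the following shell session. This is a minimal sketch: the download URL, archive name and local paths are assumptions (adjust them to your mirror and to where your JDK 8 and Spark are installed).

# minimal sketch: download and start Livy 0.4.0-incubating (archive URL and file name are assumptions)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk   # example path, point it to your JDK/JRE 8
export SPARK_HOME=/opt/spark                   # example path, point it to your Spark installation
wget https://archive.apache.org/dist/incubator/livy/0.4.0-incubating/livy-0.4.0-incubating-bin.zip
unzip livy-0.4.0-incubating-bin.zip
cd livy-0.4.0-incubating-bin
mkdir logs                                     # Livy does not create its log directory by itself
./bin/livy-server                              # starts the server, listening on port 8998 by default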

Quick start with Apache Livy (part 1)

I have started evaluating Livy for potential scenarios where this technology could help, and I'd like to share some findings with others who would like to approach this interesting Open Source project. It was started by Cloudera and Microsoft and is currently being incubated by the Apache Software Foundation. The official documentation isn't comprehensive at the moment, so I hope my posts on this topic can help someone else.

Apache Livy is a service to interact with Apache Spark through a REST interface. It enables the submission of both Spark jobs and snippets of Spark code. The following features are supported:
- Jobs can be submitted as pre-compiled jars, snippets of code, or via the Java/Scala client API.
- Interactive Scala, Python, and R shells.
- Support for Spark 1.x and 2.x, Scala 2.10 and 2.11.
- It doesn't require any change to Spark code.
- It allows long running Spark Contexts that can be used for multiple Spark jobs, by
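To give a first idea of the REST interface mentioned above, here is a minimal sketch of submitting a Scala snippet to Livy, assuming a server listening on localhost:8998 (the default port); the /sessions and /statements endpoints are part of Livy's REST API, while the session id (0) and the snippet itself are just illustrative.

# create an interactive Scala (Spark) session
curl -X POST -H "Content-Type: application/json" \
     -d '{"kind": "spark"}' \
     http://localhost:8998/sessions

# once the session (id 0 here) is idle, submit a snippet of Spark code to it
curl -X POST -H "Content-Type: application/json" \
     -d '{"code": "sc.parallelize(1 to 100).sum()"}' \
     http://localhost:8998/sessions/0/statements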

How to Get Metrics From a Java Application Inside a Docker Container Using Telegraf

My latest article on DZone is online. There you can learn how to configure Telegraf to pull metrics from a Java application running inside a Docker container. The Telegraf Jolokia plugin configuration presented as an example in the article is set up to collect metrics about heap memory usage, thread count, and class count, but these aren't the only metrics you can collect this way. When running a container hosting the Java app with the Jolokia agent, you can get the full list of available metrics through the following GET request:

curl -X GET http://<jolokia_host>:<jolokia_port>/jolokia/list

Then pick the names and attributes of the metrics you need and add them to the plugin configuration.
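Once you have spotted an interesting MBean in that list, you can also check the values it exposes before adding it to the Telegraf configuration. Here is a sketch using Jolokia's read operation with the standard JVM MBeans behind the metrics mentioned above (host and port are placeholders, as in the previous request):

# read single MBean attributes through the Jolokia agent
curl -X GET "http://<jolokia_host>:<jolokia_port>/jolokia/read/java.lang:type=Memory/HeapMemoryUsage"
curl -X GET "http://<jolokia_host>:<jolokia_port>/jolokia/read/java.lang:type=Threading/ThreadCount"
curl -X GET "http://<jolokia_host>:<jolokia_port>/jolokia/read/java.lang:type=ClassLoading/LoadedClassCount"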

Setting up a quick dev environment for Kafka, CSR and SDC

A few days ago Pat Patterson published an excellent article on DZone about Evolving Avro Schemas With Apache Kafka and StreamSets Data Collector. I recommend reading it. I followed the tutorial, and today I want to share the details of how I quickly set up the environment for it, in case you are interested in doing the same. I did it on a Red Hat Enterprise Linux Server 7 machine (but the steps are the same for any other Linux distro), using only images available in the Docker Hub.

First start a Zookeeper node (which is required by Kafka):

sudo docker run --name some-zookeeper --restart always -d zookeeper

and then a Kafka broker, linking the container to the Zookeeper one:

sudo docker run -d --name kafka --link some-zookeeper:zookeeper ches/kafka

Then start the Confluent Schema Registry (linking it to Zookeeper and Kafka):

sudo docker run -d --name schema-registry -p 8081:8081 --link some-zookeeper:zookeeper --link kafka:kafka confluent/schema-registry
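As a quick sanity check before moving on to Data Collector, you can verify that the Schema Registry is answering on the published port. A minimal sketch using its REST API (the /subjects endpoint is part of the Confluent Schema Registry API; on a fresh instance it returns an empty list):

# list the registered subjects; a brand new registry answers with an empty JSON array: []
curl -X GET http://localhost:8081/subjects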