
Setting up a quick dev environment for Kafka, CSR and SDC

A few days ago Pat Patterson published an excellent article on DZone about Evolving Avro Schemas With Apache Kafka and StreamSets Data Collector, which I recommend reading. I followed the tutorial, and today I want to share how I quickly set up the environment for it, in case you are interested in doing the same. I did it on a Red Hat Linux 7 server (the steps are the same for any other Linux distro), using only images available on Docker Hub.
First start a ZooKeeper node (which is required by Kafka), naming the container zookeeper so that the --link options in the following commands can find it:

sudo docker run --name zookeeper --restart always -d zookeeper

and then a Kafka broker, linking its container to the ZooKeeper one:

sudo docker run -d --name kafka --link zookeeper:zookeeper ches/kafka

Then start the Confluent Schema Registry (linking it to Zookeeper and Kafka):

sudo docker run -d --name schema-registry -p 8081:8081 --link zookeeper:zookeeper --link kafka:kafka confluent/schema-registry

and the Confluent REST Proxy (linked to ZooKeeper, Kafka and the Schema Registry):

sudo docker run -d --name rest-proxy -p 8082:8082 --link zookeeper:zookeeper --link kafka:kafka --link schema-registry:schema-registry confluent/rest-proxy

Start an instance of the StreamSets Data Collector (SDC):

sudo docker run --restart on-failure -p 18630:18630 -d --name streamsets-dc streamsets/datacollector
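The containers take a few seconds to come up, so it can help to wait until the published ports answer before moving on. The following is only a sketch: wait_for_http is a helper name invented for this post, and it assumes that curl is installed and that you run it on the Docker host, so that the ports published above (8081 for CSR, 18630 for SDC) are reachable on localhost.

```shell
# Poll a URL until it answers or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# (hypothetical helper, not part of any of the images used above)
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors (status >= 400) as failures
    if curl -sf -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not reachable: $url" >&2
  return 1
}
```

For example, wait_for_http http://localhost:8081/subjects checks the Schema Registry (the /subjects endpoint lists the registered subjects) and wait_for_http http://localhost:18630 checks the SDC web UI.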

Finally, an optional step makes registering and updating Avro schemas in CSR more user-friendly than calling the CSR REST API directly: start the open source Schema Registry UI provided by Landoop:

sudo docker run -d --name schema-registry-ui -p 8000:8000 -e "SCHEMAREGISTRY_URL=http://<csr_host>:8081" -e "PROXY=true" landoop/schema-registry-ui

replacing <csr_host> with the host of your CSR instance, so that the UI connects to it.
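For comparison, this is roughly what registering a schema through the CSR REST API looks like without the UI. Take it as a sketch under some assumptions: schema_payload and register_schema are helper names made up for this post, and the subject name used in the example (csrtest-value, following the default topic-name-plus-"-value" convention) may differ from the one Pat's tutorial uses. The endpoint POST /subjects/<subject>/versions and the application/vnd.schemaregistry.v1+json content type come from the Schema Registry API.

```shell
# Wrap an Avro schema (a JSON document) into the request body expected by
# the Schema Registry: {"schema": "<schema as an escaped JSON string>"}.
# (hypothetical helper)
schema_payload() {
  # Escape backslashes and double quotes, then drop any newlines.
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' | tr -d '\n')
  printf '{"schema": "%s"}' "$escaped"
}

# Register a new schema version under a subject.
# Usage: register_schema CSR_URL SUBJECT SCHEMA_JSON
# (hypothetical helper)
register_schema() {
  curl -s -X POST \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data "$(schema_payload "$3")" \
    "$1/subjects/$2/versions"
}
```

A call would look like register_schema http://<csr_host>:8081 csrtest-value '{"type": "string"}', where the schema is just a placeholder. The UI wraps this kind of request behind a form, which is why it is more comfortable for repeated updates.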
You can create a topic in Kafka by executing the following commands:

export ZOOKEEPER_IP=$(sudo docker inspect --format '{{ .NetworkSettings.IPAddress }}' zookeeper) 
sudo docker run --rm ches/kafka kafka-topics.sh --create --zookeeper $ZOOKEEPER_IP:2181 --replication-factor 1 --partitions 1 --topic csrtest
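To confirm that the topic was created, you can list the topics with the same image and ZooKeeper address. This is a sketch: list_topics and topic_exists are helper names invented here, and they assume that ZOOKEEPER_IP was exported as shown above.

```shell
# List the topics registered in ZooKeeper, reusing the ches/kafka image.
# Usage: list_topics ZOOKEEPER_IP   (hypothetical helper)
list_topics() {
  sudo docker run --rm ches/kafka kafka-topics.sh --list --zookeeper "$1:2181"
}

# Succeed only if a line of the listing matches the topic name exactly.
# Usage: topic_exists ZOOKEEPER_IP TOPIC   (hypothetical helper)
topic_exists() {
  list_topics "$1" | grep -qx "$2"
}
```

For example, topic_exists "$ZOOKEEPER_IP" csrtest && echo "csrtest is there".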

The environment is now ready to play with and to follow Pat's tutorial.
