A few days ago Pat Patterson published an excellent article on DZone about Evolving Avro Schemas With Apache Kafka and StreamSets Data Collector, which I recommend reading. I followed that tutorial, and today I want to share the details of how I quickly set up the environment for it, in case you are interested in doing the same. I did it on a Red Hat Enterprise Linux Server 7 host (but the steps are the same for any other Linux distro), using only images available on Docker Hub.
First start a Zookeeper node (which is required by Kafka):
sudo docker run --name zookeeper --restart always -d zookeeper
and then a Kafka broker, linking its container to the Zookeeper one:
sudo docker run -d --name kafka --link zookeeper:zookeeper ches/kafka
Then start the Confluent Schema Registry (linking it to Zookeeper and Kafka):
sudo docker run -d --name schema-registry -p 8081:8081 --link zookeeper:zookeeper --link kafka:kafka confluent/schema-registry
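To check that the Schema Registry is up before moving on, you can query its REST API from the Docker host (the container publishes port 8081 above); on a fresh instance the subject list should come back empty:
curl http://localhost:8081/subjects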
and the REST proxy for it:
sudo docker run -d --name rest-proxy -p 8082:8082 --link zookeeper:zookeeper --link kafka:kafka --link schema-registry:schema-registry confluent/rest-proxy
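Similarly, a quick sanity check for the REST proxy (published on port 8082) is to ask it for the list of topics it can see in the broker:
curl http://localhost:8082/topics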
Next, start an instance of the StreamSets Data Collector:
sudo docker run --restart on-failure -p 18630:18630 -d --name streamsets-dc streamsets/datacollector
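Once the container is running, the Data Collector UI is published on port 18630 of the Docker host; assuming curl is available, you can quickly check that its web server is responding:
curl -I http://localhost:18630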
Finally, there is an optional step that makes registering and updating Avro schemas in the Confluent Schema Registry (CSR) more user friendly than going through its REST APIs: start the open source Schema Registry UI provided by Landoop:
sudo docker run -d --name schema-registry-ui -p 8000:8000 -e "SCHEMAREGISTRY_URL=http://<csr_host>:8081" -e "PROXY=true" landoop/schema-registry-ui
replacing <csr_host> with the host name (or IP address) of the machine running your CSR instance.
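If you prefer to skip the UI, schemas can also be registered directly through the CSR REST API. Here is a minimal sketch that posts a trivial Avro record schema; the subject name csrtest-value is just an example following the usual <topic>-value naming convention:
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"test\",\"fields\":[{\"name\":\"field1\",\"type\":\"string\"}]}"}' http://localhost:8081/subjects/csrtest-value/versions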
You can create a topic in Kafka by executing the following commands:
export ZOOKEEPER_IP=$(sudo docker inspect --format '{{ .NetworkSettings.IPAddress }}' zookeeper)
sudo docker run --rm ches/kafka kafka-topics.sh --create --zookeeper $ZOOKEEPER_IP:2181 --replication-factor 1 --partitions 1 --topic csrtest
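To verify that the topic was created, you can list the topics registered in Zookeeper using the same image:
sudo docker run --rm ches/kafka kafka-topics.sh --list --zookeeper $ZOOKEEPER_IP:2181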
The environment is now ready to play with and to follow Pat's tutorial.