
The Kafka Series (part 2): single node-single broker cluster installation

In this second part of the series I will describe the steps to install a Kafka single node-single broker cluster on a Linux machine. I am referring here to the latest stable Kafka release at the time of writing, 0.9.0.1, built for Scala 2.11.

Prerequisites
The only prerequisite is a JDK 7 or later. You can check the version on your path as shown below.
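A quick sanity check (the exact output varies by JDK vendor and version):
   java -version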

Installation
- Move to the /opt folder of your system
   cd /opt
  and then download the binaries of the latest release there:
   wget http://www.us.apache.org/dist/kafka/0.9.0.1/kafka_2.11-0.9.0.1.tgz
- Extract the archive content:
   tar xzf kafka_2.11-0.9.0.1.tgz
- Create the KAFKA_HOME variable:
   echo "export KAFKA_HOME=/opt/kafka_2.11-0.9.0.1" >> /root/.bash_profile
- Add the Kafka bin folder to the PATH (note the escaped dollar signs: they ensure the variables are expanded when the profile is sourced, not when the line is written):
   echo "export PATH=\$PATH:\$KAFKA_HOME/bin" >> /root/.bash_profile
- Reload the bash profile for the user (the resulting profile entries are shown after this list):
   source /root/.bash_profile
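
After these steps, /root/.bash_profile should end with two lines like the following (assuming the installation path used above):
   export KAFKA_HOME=/opt/kafka_2.11-0.9.0.1
   export PATH=$PATH:$KAFKA_HOME/bin
You can verify the setup with echo $KAFKA_HOME or which kafka-topics.sh.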


Starting the server
 - In order for the Kafka server to work properly you need to start ZooKeeper (https://zookeeper.apache.org/) first. Kafka comes with its own ZooKeeper server, so a separate ZooKeeper installation isn't mandatory. Please note that if your default JDK is the IBM one, you need to replace the JVM loggc option with the verbosegclog one in the $KAFKA_HOME/bin/zookeeper-server-start.sh script.
- Start the ZooKeeper server (in the example below I am using the default ZooKeeper property file; its key settings are shown after this list):
   $KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
- If your default JDK is the IBM one, you have to replace the JVM loggc option with the verbosegclog one in the $KAFKA_HOME/bin/kafka-server-start.sh script as well.
- Start the Kafka server broker (in the example below I am using the default Kafka broker property file; its key settings are also shown after this list):
   $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
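
For reference, the default property files ship with settings suited to a single node setup. The values below match the stock 0.9.x configuration files, but double-check your own copies:
   # from $KAFKA_HOME/config/zookeeper.properties
   dataDir=/tmp/zookeeper
   clientPort=2181
   # from $KAFKA_HOME/config/server.properties
   broker.id=0
   listeners=PLAINTEXT://:9092
   log.dirs=/tmp/kafka-logs
   zookeeper.connect=localhost:2181
Note that both servers keep their data under /tmp by default, so it won't survive a reboot; change dataDir and log.dirs for anything beyond testing.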

Testing the installation
And now a few steps to verify that everything is working properly.
- Create a topic:    
   $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kafkatesting
- Check that the new topic is in the topic list:
   $KAFKA_HOME/bin/kafka-topics.sh --list --zookeeper localhost:2181
- Run a producer for the new topic using the provided script:
   $KAFKA_HOME/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafkatesting
  and then type a few messages in that shell; each line you enter is sent to the server as a message for that topic.
- From a different shell run a consumer for the new topic using the provided script:
   $KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic kafkatesting --from-beginning
  You should see the messages generated by the producer printed out in the consumer shell. A sample session is sketched after this list.
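
Putting it all together, a successful test session should look roughly like the following (a sketch; the exact wording of the tool output may vary slightly across Kafka versions):
   $ kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kafkatesting
   Created topic "kafkatesting".
   $ kafka-console-producer.sh --broker-list localhost:9092 --topic kafkatesting
   first message
   second message
   $ kafka-console-consumer.sh --zookeeper localhost:2181 --topic kafkatesting --from-beginning
   first message
   second message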
   
What's next?
In Part 3 of this series you will learn how to implement a producer using the Kafka Java APIs.
