StreamSets Data Collector ( https://streamsets.com/product/ ) is an open source, lightweight, and powerful engine that streams data in real time. It lets you configure data flows as pipelines through a web UI in a few minutes. Among its many features, it lets you view real-time statistics and inspect data as it passes through the pipeline.

In this first part of the series I am going to show the installation steps to run the Data Collector manually. I am referring to release 1.2.1.0. The latest release (1.2.2.0) comes with a bug that prevents it from starting (I have opened a ticket in the official Jira for this product ( https://issues.streamsets.com/browse/SDC-2657 ), but it is still unresolved at the time of writing).

The prerequisites for the installation are:
- OS: Red Hat Enterprise Linux 6 or 7, CentOS 6 or 7, Ubuntu 14.04, or Mac OS X.
- Java: Oracle or IBM JDK 7+.

And now the installation steps:
- Download the full StreamSets Data Collector tar...