Evaluating Pinpoint APM (Part 2)

This second post of the Pinpoint series covers the configuration of the HBase database where the monitoring data are written by the collector and from which they are read by the web UI.
I did the first evaluation of Pinpoint on an MS Windows machine, so here I am going to cover some installation details specific to this OS family. For initial evaluation purposes a standalone HBase server (which runs all daemons within a single JVM) is enough.

Database installation

Here I am referring to the latest stable release (1.2.4) of HBase available at the time of writing. This release supports both Java 7 and Java 8; I am using Java 8 here. Cygwin isn't needed for this installation.
Start by downloading the tarball with the HBase binaries and unpacking its content.
Rename the hbase-1.2.4 directory to hbase.
Set the JAVA_HOME variable to the JRE to use (if you haven't already done so on this machine).
Edit the %HBASE_HOME%\conf\hbase-site.xml configuration file in order to set the directories in the local filesystem where HBase and ZooKeeper write data:
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>file:///C:/Users/hbaseuser/hbase</value>
      </property>
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/C:/Users/hbaseuser/zookeeper</value>
      </property>
    </configuration>

There is no need to create those directories beforehand: HBase creates them at the first start.
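Before starting HBase it can be handy to double-check what the configuration file actually contains. The following is a minimal Python sketch (not part of HBase or Pinpoint) that reads the property entries of an hbase-site.xml document into a dictionary. The function name read_hbase_site and the SAMPLE constant are hypothetical helpers introduced for illustration; the sample content is the configuration shown above.

```python
import xml.etree.ElementTree as ET


def read_hbase_site(xml_text):
    """Return the <property> entries of an hbase-site.xml document as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Same content as the configuration shown above.
SAMPLE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///C:/Users/hbaseuser/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/C:/Users/hbaseuser/zookeeper</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # Print each configured property on its own line.
    for key, value in read_hbase_site(SAMPLE).items():
        print(key, "=", value)
```

To check a real file instead of the sample string, the root element could be obtained with ET.parse(path).getroot() and processed the same way.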
Download the Winutils executable from its GitHub repository and save it in a subfolder named bin of a local directory. Then edit the %HBASE_HOME%\conf\hbase-env.cmd file and set the HADOOP_HOME environment variable to the Winutils home directory (the parent of that bin subfolder), as in the example below:
    set HADOOP_HOME=C:\DevelopmentTools\WinUtils
HBase needs ZooKeeper to run. You can make HBase start its own ZooKeeper instance simply by uncommenting (removing the leading rem from) the following line in the %HBASE_HOME%\conf\hbase-env.cmd file:
    rem set HBASE_MANAGES_ZK=true
Now you're ready to start HBase. From a command prompt execute the following command:
    %HBASE_HOME%\bin\start-hbase.cmd
To test that the database is running fine you can connect to its web UI available at the following URL:
    http://localhost:16010
or start an HBase shell session through the following command:
    %HBASE_HOME%\bin\hbase shell
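As a scripted alternative to the two manual checks above, here is a minimal Python sketch that only tests whether something is listening on the web UI port. It assumes the port 16010 from the URL above; the function name port_is_open is a hypothetical helper, and an open port tells you only that a process accepted the connection, not that HBase is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    # 16010 is the web UI port used in the URL above.
    print("HBase web UI reachable:", port_is_open("localhost", 16010))
```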
   

Configuration for Pinpoint

Now that the HBase database is running you can create the schema for Pinpoint. The init script must be told to use an existing HBase instance, so edit the %PINPOINT_HOME%\quickstart\bin\init-hbase.cmd file and set QUICKSTART_HBASE_PATH to your external HBase home path, as in the example below:
    set QUICKSTART_HBASE_PATH=C:\DevelopmentTools\hbase
and then comment out the line
    set QUICKSTART_HBASE_PATH=%QUICKSTART_BASE%\hbase\hbase
Save the changes and execute the script. The execution takes a few minutes (depending on your machine resources), so be patient and grab a coffee or do some stretching exercises while waiting for it to complete.
At the end the following tables should have been created in the database:
  •     AgentEvent
  •     AgentInfo
  •     AgentLifeCycle
  •     AgentStat
  •     AgentStatV2
  •     ApiMetaData
  •     ApplicationIndex
  •     ApplicationMapStatisticsCallee_Ver2
  •     ApplicationMapStatisticsCaller_Ver2
  •     ApplicationMapStatisticsSelf_Ver2
  •     ApplicationTraceIndex
  •     HostApplicationMap_Ver2
  •     SqlMetaData_Ver2
  •     StringMetaData
  •     TraceV2
  •     Traces
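To confirm that none of the tables listed above is missing, the names reported by the HBase shell list command can be compared against the expected set. The following is a minimal Python sketch under the assumption that you copy the table names by hand into a list; EXPECTED_TABLES and missing_tables are hypothetical names introduced for illustration, and the sample input is made up.

```python
# Tables that the Pinpoint init script is expected to create (from the list above).
EXPECTED_TABLES = {
    "AgentEvent",
    "AgentInfo",
    "AgentLifeCycle",
    "AgentStat",
    "AgentStatV2",
    "ApiMetaData",
    "ApplicationIndex",
    "ApplicationMapStatisticsCallee_Ver2",
    "ApplicationMapStatisticsCaller_Ver2",
    "ApplicationMapStatisticsSelf_Ver2",
    "ApplicationTraceIndex",
    "HostApplicationMap_Ver2",
    "SqlMetaData_Ver2",
    "StringMetaData",
    "TraceV2",
    "Traces",
}


def missing_tables(existing):
    """Return the expected Pinpoint tables absent from `existing`, sorted by name."""
    return sorted(EXPECTED_TABLES - set(existing))


if __name__ == "__main__":
    # Hypothetical input: table names copied by hand from the HBase shell.
    found = ["AgentEvent", "AgentInfo", "TraceV2"]
    print(missing_tables(found))
```

An empty result means every expected table is present.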

What's next

In the next post of this series we are going to learn how to start the collector and the web UI, test Pinpoint using the demo web application which is part of the quickstart bundle, and understand how to set up the agent to profile standalone and web Java applications.
