
Dance into the Groovy

Groovy (http://groovy.codehaus.org/) is an object-oriented scripting language for the Java platform. It is dynamically compiled to bytecode for the Java Virtual Machine. It can use and interoperate with any Java class and library (any, not only those provided by the JDK) and it has a Java-like syntax, so the learning curve for a Java developer is close to zero. Groovy is very powerful and can help in many different areas: in future posts I will show and explain its benefits in more detail. In this one I just want to show you a concrete example of how I have used it in the past.

Many times we had to move hundreds of Oracle schemas from one environment to another (dev, test, QA, production) or migrate them from one Oracle release to another. Every time, the DBAs regularly forgot to set up the DB links for the new environment, which generated errors that often couldn't be identified quickly. To avoid this I wrote a set of Groovy scripts to check the sanity of the DB links, which could be run automatically after any Oracle schema or instance migration. Below is a simple example of those scripts (the following code is of course an oversimplification of the original ones). Please read the comments inside the code: they explain everything.

import groovy.sql.Sql
import groovy.sql.Sql.AbstractQueryCommand
import java.sql.ResultSet
import java.util.logging.Logger

// Simple JDK logger used by the script to report its progress
def log = Logger.getLogger('DbLinkSanityCheck')

def dbLinkHostList = []

// Get a new Groovy Sql instance using the connection parameters for your Oracle schema
def sql = Sql.newInstance("jdbc:oracle:thin:@INSTANCE:1756:SID", "SCHEMA_OWNER",
                      "password", "oracle.jdbc.driver.OracleDriver")



// Execute a query on the user_db_links data dictionary view. This query gets all the DB links
// defined for your schema
sql.eachRow("select * from user_db_links") {
    // Put each DB link name found into a list
    dbLinkHostList.add("${it.db_link}");
}

sql.close();



def dbLinksCount = dbLinkHostList.size();

// Loop through the DB links
for (i in 0..(dbLinksCount - 1)) {
    log.info("dbLink: " + dbLinkHostList.get(i))
    sql = Sql.newInstance("jdbc:oracle:thin:@INSTANCE:1756:SID", "SCHEMA_OWNER",
                      "password", "oracle.jdbc.driver.OracleDriver")

    // Execute a simple query through the DB link
    def sqlConnectionTest = "select rownum id from dual@${dbLinkHostList.get(i)}"
    AbstractQueryCommand q = sql.createQueryCommand("${sqlConnectionTest}");
    try {
        ResultSet rs = q.execute();

        // Print a message if the DB link is OK for the new environment
        log.info("DB link connection is OK");

        // Don't forget to close the ResultSet
        rs.close();
    } catch (Exception e) {

        // Handle the exception for any broken DB link
        log.info("exception: " + e.getMessage());
    } finally {

        // Don't forget to close every DB object used (you don't want to leave open cursors in the
        // DB, do you? :)
        q.closeResources();
        sql.close();
    }
}
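
For reference, here is a slightly more compact variant of the same check, using Groovy's Sql.withInstance and firstRow helpers. It is just a minimal sketch, reusing the same placeholder connection parameters as the example above (adjust them to your environment):

import groovy.sql.Sql

// Minimal alternative sketch: the connection parameters are the same placeholders used above
Sql.withInstance("jdbc:oracle:thin:@INSTANCE:1756:SID", "SCHEMA_OWNER",
                  "password", "oracle.jdbc.driver.OracleDriver") { sql ->
    // Collect the DB link names first, then probe each one with a trivial query
    def dbLinks = sql.rows("select db_link from user_db_links")*.db_link
    dbLinks.each { dbLink ->
        try {
            // toString() avoids GString parameter binding: a DB link name cannot be a bind variable
            sql.firstRow("select 1 from dual@${dbLink}".toString())
            println "DB link ${dbLink} is OK"
        } catch (Exception e) {
            println "DB link ${dbLink} is broken: ${e.message}"
        }
    }
}

Sql.withInstance takes care of opening and closing the connection, so there is no need for an explicit sql.close() or a finally block in this variant.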


This example requires just Groovy and the Oracle JDBC driver on the classpath: a simple solution to an annoying problem that saved a lot of precious time.
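
If you prefer not to manage the classpath by hand, Groovy's Grape dependency manager can also fetch the JDBC driver at run time. The Maven coordinates of the Oracle driver below are an assumption, so check which group, module and version are actually available in your repository:

// Hypothetical Maven coordinates for the Oracle JDBC driver: adjust them to your repository
@GrabConfig(systemClassLoader = true)  // lets DriverManager see the grabbed driver
@Grab('com.oracle.database.jdbc:ojdbc8:19.3.0.0')
import groovy.sql.Sql

// The rest of the script can then create Sql instances exactly as shown above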

I have also used Groovy a lot to write scripts that automate load testing and functional testing through SoapUI, but that is another story (next...).
