
Posts

Showing posts from November, 2015

IllegalStateException from Mongo storage plugin in Apache Drill

Apache Drill ( https://drill.apache.org/ ) is an Open Source framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. It supports several kinds of filesystems and NoSQL databases, including HDFS and MongoDB. In this short post I want to show an exception you may run into if your Drill installation has the MongoDB storage plugin enabled. Suppose that, after more than 24 hours of uptime, any SQL query issued through Drill (not only those against a MongoDB database) suddenly throws the following exception: SYSTEM ERROR: IllegalStateException: state should be: open [Error Id: 57a02508-1920-4360-a111-c2a55a7af15c on hostname:31010]. In that case the likely cause is a connection cache that expires and isn't automatically reset. This issue affects in particular the latest release (1.1.0) of Drill and is still marked as unresolved at the time of writing ( https://issues.apache.org/jira/browse/DRILL-3522 ), but there is a patch available...
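To make the suspected mechanism more concrete, here is a minimal, hypothetical Java sketch of how an expiring connection cache can produce this error: the cache closes an expired connection when it evicts it, but some other component still holds the old reference and keeps using it. None of these classes come from Drill or the MongoDB driver; the class names, the 24-hour time-to-live measured from creation, and the explicit clock parameter are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Drill or MongoDB driver code): an expiring
 * connection cache that closes evicted connections, while a caller keeps
 * using a stale reference obtained earlier.
 */
class ExpiringConnectionCacheSketch {

    /** Stand-in for a client connection that can be closed. */
    static final class Connection {
        private boolean open = true;

        void close() { open = false; }

        String query(String sql) {
            if (!open) {
                // Mimics the message reported in the post.
                throw new IllegalStateException("state should be: open");
            }
            return "ok: " + sql;
        }
    }

    /** Cache whose entries expire a fixed time after creation (clock passed in explicitly). */
    static final class ConnectionCache {
        private final long ttlMillis;
        private final Map<String, Connection> connections = new HashMap<>();
        private final Map<String, Long> createdAt = new HashMap<>();

        ConnectionCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

        /** Returns a live connection, closing and replacing an expired one. */
        Connection get(String address, long nowMillis) {
            Connection c = connections.get(address);
            if (c != null && nowMillis - createdAt.get(address) >= ttlMillis) {
                c.close(); // eviction closes the old connection...
                c = null;  // ...but anyone still holding it is not told
            }
            if (c == null) {
                c = new Connection();
                connections.put(address, c);
                createdAt.put(address, nowMillis);
            }
            return c;
        }
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        ConnectionCache cache = new ConnectionCache(day);

        Connection held = cache.get("mongo-host:27017", 0); // looked up once and kept
        cache.get("mongo-host:27017", day + 1);              // a later lookup evicts and closes it
        try {
            held.query("SELECT 1");
        } catch (IllegalStateException e) {
            System.out.println("stale reference: " + e.getMessage());
        }
        // Fetching the connection from the cache again yields a fresh, open one.
        System.out.println(cache.get("mongo-host:27017", day + 2).query("SELECT 1"));
    }
}
```

The design point the sketch is meant to show: code that always asks the cache for a connection right before using it never sees a closed one, whereas code that caches the connection reference itself will eventually hit the `IllegalStateException` once the entry expires.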

MRUnit Tutorial

Apache MRUnit ( https://mrunit.apache.org/ ) is an Open Source library that allows unit testing of Hadoop Mappers, Reducers, and MapReduce programs. It provides a convenient integration between MapReduce and standard testing libraries such as JUnit and Mockito, and helps bridge the gap between MapReduce programs and those traditional libraries by providing a set of interfaces and test harnesses. It doesn't replace JUnit, but works on top of it. Before reading further, please be aware that some knowledge of Hadoop MapReduce and JUnit is needed to follow this post. The three core classes of MRUnit are the following:
• MapDriver : the driver class responsible for calling the Mapper’s map() method.
• ReduceDriver : the driver class responsible for calling the Reducer’s reduce() method.
• MapReduceDriver : the combined MapReduce driver responsible for calling the Mapper’s map() method first, followed by an in-memory Shuffle phase. At the end of this phase the ...
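As a rough illustration of the sequence that MapReduceDriver runs (map, then an in-memory shuffle, then reduce), here is a minimal stdlib-only Java sketch. It is not MRUnit code and does not use the MRUnit or Hadoop APIs; the class name, the `Pair` type and the word-count logic are hypothetical, chosen only to show what "in-memory Shuffle" means: mapper output is sorted and grouped by key before being handed to the reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, stdlib-only sketch of the map -> in-memory shuffle -> reduce
 * sequence driven by MapReduceDriver. NOT MRUnit code: names and the
 * word-count logic are illustrative assumptions.
 */
class InMemoryMapReduceSketch {

    /** A (key, value) pair emitted by the map phase. */
    static final class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Map phase: emit (word, 1) for every whitespace-separated token. */
    static List<Pair> map(String line) {
        List<Pair> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new Pair(token, 1));
            }
        }
        return out;
    }

    /** Shuffle phase: group values by key; TreeMap keeps keys sorted, as Hadoop does. */
    static Map<String, List<Integer>> shuffle(List<Pair> mapped) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : mapped) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return grouped;
    }

    /** Reduce phase: sum the values collected for one key. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs map, shuffle and reduce in sequence over the given input lines. */
    static Map<String, Integer> run(List<String> lines) {
        List<Pair> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b a", "b c"))); // {a=2, b=2, c=1}
    }
}
```

In a real MRUnit test you would not write any of this plumbing yourself: the driver classes take your actual Mapper and Reducer, feed them the inputs you declare, and compare their outputs with the expected ones inside a JUnit test.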

Living with uncertainty

Just a quick note about Scrum (and Agile in general) on a concept that is often ignored. Scrum is intended for the kinds of work that defined processes have often failed to manage: uncertain requirements combined with unpredictable technology implementation risks (drawing this distinction is not blasphemy in the Agile context). These conditions are typical of new product development, software products in particular: no matter how carefully you plan the future, your plan can never be more than a dream unless you adjust it every day. So don't be scared by uncertainty; be ready to live with it and work hard to reduce it. This is what makes the difference between a good member of an Agile team and a bad one: the good one manages uncertainty instead of ignoring it. This is true at every level of any self-proclaimed Agile company.