
Some potential errors when building Hadoop on Windows (and their root causes and solutions).

In my last post I described the process to successfully build Hadoop on a Windows machine. Before arriving at a stable and reproducible procedure I faced some errors whose messages don't highlight the real problem. Browsing the web I didn't find much information about them, so I am going to share the solutions I found. In this list I am going to skip the errors caused by missing execution of one or more steps of the overall build procedure (like those due to a missing JDK or Maven, missing entries in the PATH variable, or missing environment variables), since they can be easily fixed.
I am still referring to release 2.7.1 of Hadoop, so most of the proposed solutions have been tested for this release only.

Building with an incompatible version of the .NET Framework.
Error message:
LINK : fatal error LNK1123: failure during conversion to COFF: file invalid or corrupt [C:\OpenSourcePrograms\Hadoop\hadoop-2.7.1-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.vcxproj]
Done Building Project "C:\OpenSourcePrograms\Hadoop\hadoop-2.7.1-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.vcxproj" (default targets) -- FAILED.


Root cause: You are trying to build using the .NET Framework release 4.5. Hadoop 2.7.1 is incompatible with that release.

Solution: Remove .NET Framework 4.5 from the build machine and use release 4.0 instead (it comes with the Windows SDK 7.1).
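If you are not sure which .NET Framework releases are installed on the machine, one quick check (a sketch based on the standard NDP registry layout; the exact subkeys can vary between Windows versions) is:

reg query "HKLM\SOFTWARE\Microsoft\NET Framework Setup\NDP" /s /v Version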


Building with the latest version of Protocol Buffers.
Error message:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:34.957s
[INFO] Finished at: Tue Aug 04 10:25:16 BST 2015
[INFO] Final Memory: 63M/106M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.7.1:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 2.6.1', expected version is '2.5.0' -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-common


Root cause:
Hadoop Common 2.7.1 is incompatible with the latest version of Google Protocol Buffers (2.6.1 at the time of writing).

Solution:
Download and use release 2.5.0 (Hadoop 2.7.1 Common builds successfully with this release only).
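Before restarting the build it is worth double-checking which protoc the PATH resolves to; after switching releases the output should look like this:

protoc --version
libprotoc 2.5.0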

Windows SDK not installed successfully, or wrong path for the MSBuild.exe to use.
Error message:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 45.046s
[INFO] Finished at: Tue Aug 04 11:34:29 BST 2015
[INFO] Final Memory: 66M/240M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.1:exec (compile-ms-winutils) on project hadoop-common: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.1:exec (compile-ms-winutils) on project hadoop-common: Command execution failed.
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
        at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
        at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
        at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:317)
        at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:152)
        at org.apache.maven.cli.MavenCli.execute(MavenCli.java:555)
        at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:214)
        at org.apache.maven.cli.MavenCli.main(MavenCli.java:158)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
        at java.lang.reflect.Method.invoke(Method.java:619)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
Caused by: org.apache.maven.plugin.MojoExecutionException: Command execution failed.
        at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:303)
        at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:106)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
        ... 19 more
Caused by: org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
        at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:402)
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:164)
        at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:750)
        at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:292)
        ... 21 more
[ERROR]
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command

[ERROR]   mvn <goals> -rf :hadoop-common


Root cause:
This error happens when:
A) the Windows SDK installation didn't complete successfully, or
B) the MSBuild.exe found through the system PATH variable is not the correct one to use (you could have more than one on a Windows machine).

Solution:
A) Repeat the Windows SDK 7.1 installation process and make sure it completes successfully.
B) Locate the proper MSBuild.exe to use and update the system PATH variable with its path, as shown below.
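For point B), you can check which MSBuild.exe is picked up first through the PATH and list the other copies available on the machine (assuming the standard C:\Windows\Microsoft.NET\Framework location for the .NET copies of MSBuild):

where MSBuild.exe
dir /s /b C:\Windows\Microsoft.NET\Framework\MSBuild.exe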

Error compiling with the IBM JDK.
Error message:
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 19 source files to C:\OpenSourcePrograms\Hadoop\hadoop-2.7.1-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-registry\target\test-classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /C:/OpenSourcePrograms/Hadoop/hadoop-2.7.1-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureLogins.java:[23,36] package com.sun.security.auth.module does not exist
[ERROR] /C:/OpenSourcePrograms/Hadoop/hadoop-2.7.1-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureLogins.java:[138,11] cannot find symbol
  symbol:   class Krb5LoginModule
  location: class org.apache.hadoop.registry.secure.TestSecureLogins
[ERROR] /C:/OpenSourcePrograms/Hadoop/hadoop-2.7.1-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureLogins.java:[138,49] cannot find symbol
  symbol:   class Krb5LoginModule
  location: class org.apache.hadoop.registry.secure.TestSecureLogins
[INFO] 3 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------


Root cause:
This error happens only when the JDK used to perform the compilation is the IBM one. The Hadoop source code is fully compatible with any JDK 7: no issues have been found using the Oracle JDK or the OpenJDK so far. But one of the JUnit test cases (TestSecureLogins, contained in the Hadoop YARN project test package org.apache.hadoop.registry.secure) has a dependency on the Oracle implementation of the Krb5LoginModule class (com.sun.security.auth.module.Krb5LoginModule), which is not present in the IBM JDK. Even if you build Hadoop through Maven with the -DskipTests option, the unit tests are compiled anyway and the error above happens.
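For reference, this is a typical Windows build invocation along the lines of Hadoop's BUILDING.txt (adapt the profiles to your setup); note that -DskipTests skips the test execution, not the test compilation, which is why the error still surfaces:

mvn package -Pdist,native-win -DskipTests -Dtar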

Solution:
Open the %HADOOP_HOME%/hadoop-2.7.1-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureLogins.java class with any text editor and replace the import of the Oracle Krb5LoginModule implementation:
import com.sun.security.auth.module.Krb5LoginModule;
with the import of the IBM Krb5LoginModule implementation:
import com.ibm.security.auth.module.Krb5LoginModule;
Save the change and restart the build (or resume it, as shown below).
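Instead of restarting the whole build, you can resume it from the failed module using Maven's standard -rf option (assuming the failing module's artifactId is hadoop-yarn-registry, as the log above suggests):

mvn package -Pdist,native-win -DskipTests -Dtar -rf :hadoop-yarn-registry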
