
Some potential errors building Hadoop on Windows (and their root causes and solutions).

In my last post I described the process to successfully build Hadoop on a Windows machine. Before arriving at a stable and reproducible procedure I faced some errors whose messages don't highlight the real problem. Browsing the web I didn't find much information about them, so I am going to share the solutions I found. In this list I am skipping the errors caused by missing steps of the overall build procedure (a missing JDK or Maven, missing entries in the PATH variable, missing environment variables, etc.), since those are easy to fix.
I am still referring to release 2.7.1 of Hadoop, so most of the proposed solutions have been tested against this release only.

Building with an incompatible version of the .NET Framework.
Error message:
LINK : fatal error LNK1123: failure during conversion to COFF: file invalid or corrupt [C:\OpenSourcePrograms\Hadoop\hadoop-2.7.1-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.vcxproj]
Done Building Project "C:\OpenSourcePrograms\Hadoop\hadoop-2.7.1-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.vcxproj" (default targets) -- FAILED.


Root cause: You are trying to build using the .NET Framework release 4.5. Hadoop 2.7.1 is incompatible with that .NET release.

Solution: Remove .NET Framework 4.5 from the build machine and use release 4.0 instead (it comes with the Windows SDK 7.1).
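
A quick way to check whether .NET 4.5 is still present is to query the registry from a command prompt (a sketch assuming the standard registry layout; a Version value starting with 4.5 means the incompatible runtime is still installed):

REM Shows the installed .NET Framework 4.x build; 4.0.x is what we want here.
reg query "HKLM\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full" /v Version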


Building with the latest version of Protocol Buffers.
Error message:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:34.957s
[INFO] Finished at: Tue Aug 04 10:25:16 BST 2015
[INFO] Final Memory: 63M/106M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.7.1:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 2.6.1', expected version is '2.5.0' -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command


Root cause:
Hadoop Common 2.7.1 is incompatible with the latest version of Google Protocol Buffers (2.6.1 at the time of writing).

Solution:
Download and use release 2.5.0 (Hadoop 2.7.1 Common builds successfully with this release only).
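
Before restarting the build it is worth checking which protoc binary Maven will pick up from the PATH; a minimal check from the build command prompt (C:\protobuf-2.5.0 below is just a hypothetical install location):

REM The compile-protoc goal expects this to print exactly: libprotoc 2.5.0
protoc --version
REM If a different version is printed, put the 2.5.0 binary first in the PATH:
set PATH=C:\protobuf-2.5.0;%PATH%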

Windows SDK not installed successfully, or the wrong MSBuild.exe picked up from the PATH.
Error message:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 45.046s
[INFO] Finished at: Tue Aug 04 11:34:29 BST 2015
[INFO] Final Memory: 66M/240M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.1:exec (compile-ms-winutils) on project hadoop-common: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.1:exec (compile-ms-winutils) on project hadoop-common: Command execution failed.
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
        at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
        at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
        at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:317)
        at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:152)
        at org.apache.maven.cli.MavenCli.execute(MavenCli.java:555)
        at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:214)
        at org.apache.maven.cli.MavenCli.main(MavenCli.java:158)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
        at java.lang.reflect.Method.invoke(Method.java:619)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
Caused by: org.apache.maven.plugin.MojoExecutionException: Command execution failed.
        at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:303)
        at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:106)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
        ... 19 more
Caused by: org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
        at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:402)
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:164)
        at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:750)
        at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:292)
        ... 21 more
[ERROR]
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command

[ERROR]   mvn <goals> -rf :hadoop-common


Root cause:
This error happens when:
A) the Windows SDK installation didn't complete successfully, or
B) the MSBuild.exe found through the system PATH variable is not the correct one to use (you could have more than one copy on a Windows machine).

Solution:
A) Repeat the Windows SDK 7.1 installation process and verify that it completes without errors.
B) Locate the proper MSBuild.exe to use and update the system PATH variable with the correct path, as shown below.
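
A quick way to find out which MSBuild.exe wins (a sketch for a typical 64-bit machine; adjust the framework path to your system):

REM Lists every MSBuild.exe visible through the PATH; Maven executes the first hit.
where MSBuild.exe
REM Make sure the .NET Framework 4.0 copy comes first:
set PATH=%WINDIR%\Microsoft.NET\Framework\v4.0.30319;%PATH%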

Error compiling with the IBM JDK.
Error message:
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 19 source files to C:\OpenSourcePrograms\Hadoop\hadoop-2.7.1-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-registry\target\test-classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /C:/OpenSourcePrograms/Hadoop/hadoop-2.7.1-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureLogins.java:[23,36] package com.sun.security.auth.module does not exist
[ERROR] /C:/OpenSourcePrograms/Hadoop/hadoop-2.7.1-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureLogins.java:[138,11] cannot find symbol
  symbol:   class Krb5LoginModule
  location: class org.apache.hadoop.registry.secure.TestSecureLogins
[ERROR] /C:/OpenSourcePrograms/Hadoop/hadoop-2.7.1-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureLogins.java:[138,49] cannot find symbol
  symbol:   class Krb5LoginModule
  location: class org.apache.hadoop.registry.secure.TestSecureLogins
[INFO] 3 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------


Root cause:
This error happens only when the JDK used to perform the compilation is the IBM one. The Hadoop source code is otherwise compatible with any JDK 7 (no issues found with the Oracle JDK or OpenJDK so far), but one of the JUnit test cases (TestSecureLogins, contained in the Hadoop YARN test package org.apache.hadoop.registry.secure) has a dependency on the Oracle implementation of the Krb5LoginModule class (com.sun.security.auth.module.Krb5LoginModule), which is not present in the IBM JDK. Even if you build Hadoop through Maven with the -DskipTests option, the unit tests are compiled anyway, and the error above occurs.
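
To confirm which JDK Maven is actually using before patching the test, check the vendor reported by Maven itself:

REM The "Java version" line of the output includes the JDK vendor
REM (e.g. IBM Corporation) and the "Java home" line shows the JDK path in use.
mvn -version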

Solution:
Open %HADOOP_HOME%/hadoop-2.7.1-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureLogins.java in any text editor and replace the import of the Oracle Krb5LoginModule implementation
import com.sun.security.auth.module.Krb5LoginModule;
with the import of the IBM Krb5LoginModule implementation:
import com.ibm.security.auth.module.Krb5LoginModule;
Save the change and restart the build.
