
A tricky exception when running MapReduce functions through RHadoop: the root cause and how to fix it.

RHadoop (https://github.com/RevolutionAnalytics/RHadoop/wiki) is a collection of five R packages (rhdfs, rmr2, rhbase, ravro, plyrmr) that allow users to manage and analyze data with Hadoop. When running any MapReduce function, even one as simple as

    from.dfs(mapreduce(to.dfs(1:100))) 

through RHadoop on a Linux server, you could face this exception:

2015-10-20 08:39:41,722 ERROR [main] org.apache.hadoop.streaming.PipeMapRed: configuration exception
java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1059)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
    at java.lang.reflect.Method.invoke(Method.java:620)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
    at java.lang.reflect.Method.invoke(Method.java:620)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(AccessController.java:452)
    at javax.security.auth.Subject.doAs(Subject.java:572)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:188)
    at java.lang.ProcessImpl.start(ProcessImpl.java:164)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1040)
    ... 24 more

At first glance this looks like a permissions issue on the Rscript tool for the user running Hadoop (the file itself cannot be missing, because it is part of the R environment and the MapReduce function was triggered from an R console). However, the exception occurs even when that user has r-x permissions on the Rscript file. The real root cause is different: the Hadoop streaming layer that RHadoop relies on (see PipeMapRed in the stack trace above) tries to execute the Rscript tool from the /usr/bin/ location, while it actually lives in the $R_HOME/bin/ directory.
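
You can confirm this on a worker node before applying any fix. A quick check (R RHOME prints the R home directory; the exact path depends on how R was installed on your nodes):

    # Rscript actually lives under the R home directory
    R RHOME                          # e.g. /usr/lib64/R
    ls -l "$(R RHOME)/bin/Rscript"

    # ...but Hadoop looks for it in /usr/bin
    ls -l /usr/bin/Rscript           # "No such file or directory" confirms the root cause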

The solution is simple. As the root user, create a symbolic link to the Rscript file:

    ln -s $R_HOME/bin/Rscript /usr/bin/Rscript
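
Since the streaming tasks execute on the worker nodes, the link must exist on every node that can run YARN containers, not just the machine hosting your R console. A minimal sketch for propagating it, assuming passwordless SSH as root and hypothetical hostnames node01..node03:

    # Placeholder hostnames; replace with your actual worker nodes
    for node in node01 node02 node03; do
        ssh root@"$node" 'ln -s "$(R RHOME)/bin/Rscript" /usr/bin/Rscript'
    done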

This solution was tested on Red Hat and CentOS, but it should work on any Linux distribution.
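
To confirm that the fix took effect, you can re-run the original one-liner through the newly created symlink itself. A quick check, assuming the rmr2 package is installed and that the HADOOP_CMD and HADOOP_STREAMING environment variables required by rmr2 are set (the paths below are only examples; adjust them to your installation):

    # Example Hadoop paths; yours may differ
    export HADOOP_CMD=/usr/bin/hadoop
    export HADOOP_STREAMING=/usr/lib/hadoop-mapreduce/hadoop-streaming.jar

    # Runs the job via /usr/bin/Rscript; it fails again if the symlink
    # is missing on any worker node
    Rscript -e 'library(rmr2); from.dfs(mapreduce(to.dfs(1:100)))'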
