Hi guys, I'm a brand-new Spark user...
I'm trying to set up a Spark cluster with multiple nodes, starting with two. With a single node everything works fine. When I add a slave node, the slave is able to register with the master. However, when I launch spark-shell and an executor is started on the slave, I see the errors below on the slave node, under the $SPARK_HOME/work directory.

The first exception is Hadoop-related. I have set $HADOOP_HOME to /usr/local/hadoop, where Hadoop is installed (see the sketch in the P.S. after the log), so this first issue does not seem to be what is keeping Spark from working. The second exception is the real problem. Any idea why it happens and how I can resolve it? Much appreciated.

cody

14/09/28 00:28:23 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
14/09/28 00:28:24 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
14/09/28 00:28:24 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
14/09/28 00:28:24 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[GetGroups], always=false, type=DEFAULT, sampleName=Ops)
14/09/28 00:28:24 DEBUG MetricsSystemImpl: UgiMetrics, User and group related metrics
14/09/28 00:28:24 DEBUG KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
14/09/28 00:28:24 DEBUG Groups: Creating new Groups object
14/09/28 00:28:24 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
14/09/28 00:28:24 DEBUG NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
14/09/28 00:28:24 DEBUG NativeCodeLoader: java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
14/09/28 00:28:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/09/28 00:28:24 DEBUG JniBasedUnixGroupsMappingWithFallback: Falling back to shell based
14/09/28 00:28:24 DEBUG JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
14/09/28 00:28:24 DEBUG Shell: Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
    at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:265)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:290)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
    at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:92)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:76)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:239)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
    at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
    at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:36)
    at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:109)
    at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
14/09/28 00:28:24 DEBUG Shell: setsid exited with exit code 0
14/09/28 00:28:24 DEBUG Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
14/09/28 00:28:24 DEBUG SparkHadoopUtil: running as user: codeoedoc
14/09/28 00:28:24 DEBUG UserGroupInformation: hadoop login
14/09/28 00:28:24 DEBUG UserGroupInformation: hadoop login commit
14/09/28 00:28:24 DEBUG UserGroupInformation: using local user:UnixPrincipal: codeoedoc
14/09/28 00:28:24 DEBUG UserGroupInformation: UGI loginUser:codeoedoc (auth:SIMPLE)
14/09/28 00:28:24 DEBUG UserGroupInformation: PrivilegedAction as:codeoedoc (auth:SIMPLE) from:org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
14/09/28 00:28:24 INFO SecurityManager: Changing view acls to: codeoedoc
14/09/28 00:28:24 INFO SecurityManager: Changing modify acls to: codeoedoc
14/09/28 00:28:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(codeoedoc); users with modify permissions: Set(codeoedoc)
14/09/28 00:28:24 DEBUG AkkaUtils: In createActorSystem, requireCookie is: off
14/09/28 00:28:24 INFO Slf4jLogger: Slf4jLogger started
14/09/28 00:28:24 INFO Remoting: Starting remoting
14/09/28 00:28:24 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@localhost:40146]
14/09/28 00:28:24 INFO Remoting: Remoting now listens on addresses: [akka.tcp://driverPropsFetcher@localhost:40146]
14/09/28 00:28:24 INFO Utils: Successfully started service 'driverPropsFetcher' on port 40146.
14/09/28 00:28:54 WARN UserGroupInformation: PriviledgedActionException as:codeoedoc (auth:SIMPLE) cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1561)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    ... 4 more
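
P.S. For completeness, this is roughly how HADOOP_HOME is set on the nodes. I'm not certain an export in the login shell ever reaches executors that the worker daemon launches, so putting it in conf/spark-env.sh (which the standalone scripts source on each node) is my working assumption rather than a known-good setup:

    # conf/spark-env.sh on each node (assumed location; Hadoop lives in
    # /usr/local/hadoop here, as mentioned above)
    export HADOOP_HOME=/usr/local/hadoop
    # Hadoop's Shell class alternatively reads the hadoop.home.dir JVM
    # system property, per the IOException message in the first trace.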
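
P.P.S. One thing I notice in the log: the slave's driverPropsFetcher binds to localhost (akka.tcp://driverPropsFetcher@localhost:40146). Could the timeout just mean the executor can't reach the driver over a routable address? A quick check I plan to run on each node follows; SPARK_LOCAL_IP is standard Spark configuration, but 192.168.1.10 below is only a placeholder for the node's real IP:

    # does the hostname resolve to a loopback entry in /etc/hosts?
    grep -n "$(hostname)" /etc/hosts
    # if so, force Spark to bind to the node's real interface, e.g. in
    # conf/spark-env.sh (placeholder address):
    export SPARK_LOCAL_IP=192.168.1.10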