Hello,

I have built a Livy docker container and am trying to connect it to a YARN
cluster running outside the docker container's host. When I submit a Spark
job via Livy inside the container (yarn-cluster mode) from the Zeppelin UI,
I can see that the job is submitted (verified from the RM UI and the YARN
container logs), but it eventually fails with the following error (shown in
the Zeppelin UI):

%livy.spark
sc.version

org.apache.zeppelin.livy.LivyException: Session 3 is finished, appId: application_1508274923092_0022, log: [ at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624), at java.lang.Thread.run(Thread.java:748), , Shell output: main : command provided 1, main : user is user1, main : requested yarn user is user1, , , Container exited with a non-zero exit code 15, Failing this attempt. Failing the application.]
  at org.apache.zeppelin.livy.BaseLivyInterpreter.createSession(BaseLivyInterpreter.java:230)
  at org.apache.zeppelin.livy.BaseLivyInterpreter.initLivySession(BaseLivyInterpreter.java:119)
  at org.apache.zeppelin.livy.BaseLivyInterpreter.open(BaseLivyInterpreter.java:101)
  at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
  at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
  at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
  at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
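
For completeness, the aggregated container logs quoted below can be pulled
with the standard YARN CLI (assuming log aggregation is enabled on the
cluster):

  yarn logs -applicationId application_1508274923092_0022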

From the YARN application logs, I found the following related to the
submitted job:

2017-10-19 12:44:59,423 INFO [main] yarn.ApplicationMaster: Waiting for spark context initialization...
2017-10-19 12:44:59,476 INFO [Driver] driver.RSCDriver: Connecting to: 57539c730d5f:34680
2017-10-19 12:44:59,476 INFO [Driver] driver.RSCDriver: Starting RPC server...
2017-10-19 12:44:59,688 WARN [Driver] rsc.RSCConf: Your hostname, node1.lab (*valid hostname*), resolves to a loopback address, but we couldn't find any external IP address!
2017-10-19 12:44:59,688 WARN [Driver] rsc.RSCConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
2017-10-19 12:44:59,704 ERROR [Driver] yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: java.nio.channels.UnresolvedAddressException
java.util.concurrent.ExecutionException: java.nio.channels.UnresolvedAddressException
  at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41)
  at com.cloudera.livy.rsc.driver.RSCDriver.initializeServer(RSCDriver.java:197)
  at com.cloudera.livy.rsc.driver.RSCDriver.run(RSCDriver.java:326)
  at com.cloudera.livy.rsc.driver.RSCDriverBootstrapper.main(RSCDriverBootstrapper.java:86)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: java.nio.channels.UnresolvedAddressException
  at sun.nio.ch.Net.checkAddress(Net.java:101)
  at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
  at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:242)
  at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:205)
  at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1226)
  at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:540)
  at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:525)
  at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:507)
  at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:970)
  at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:215)
  at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
  at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
  at java.lang.Thread.run(Thread.java:748)
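
The "Connecting to: 57539c730d5f:34680" line stands out to me:
57539c730d5f looks like the docker container's own hostname (its container
ID), which the YARN nodes may not be able to resolve. An illustrative quick
check from one of the cluster nodes (hostname and port taken from the log
above) would be something like:

  getent hosts 57539c730d5f   # can the YARN node resolve the container hostname?
  nc -vz 57539c730d5f 34680   # can it reach that RPC port at all?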

As seen above, there seems to be an issue with connecting back to the
container, or with resolving its address; however, general DNS resolution
works both from the cluster node and inside the docker container. Any
pointers on what else I can check, or what else could be the issue?
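
Based on the RSCConf warning above ("Set livy.rsc.rpc.server.address if you
need to bind to another address"), one thing I plan to try is setting that
property on the Livy side to an address the YARN nodes can actually reach,
e.g. in livy-client.conf (the IP below is only a placeholder for my docker
host's externally reachable address):

  # livy-client.conf inside the Livy container (placeholder address)
  livy.rsc.rpc.server.address = 192.168.1.10

together with publishing the corresponding port when starting the container
(docker run -p ...). Is that the right direction for running Livy's RSC
behind docker NAT?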

Note: if the Spark job is submitted via spark-submit from the same docker
container in yarn-cluster mode, it completes without any issue.
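
(For reference, that spark-submit test was an ordinary yarn-cluster
submission from inside the same container, along the lines of the
following; the example class and jar are only illustrative:

  spark-submit --master yarn --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples*.jar 100
)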

Hence, is the issue isolated to Livy, or is there some configuration I may
be missing here? Let me know if any other information would help in
debugging this.

- Sarjeet Singh
