Interesting. Perhaps you could try resolving service addresses from within a pod and seeing if there's some other issue causing intermittent failures in resolution. The steps here <https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/#getting-a-shell-to-a-container> may be helpful.
On Tue, May 29, 2018 at 4:02 PM, purna pradeep <purna2prad...@gmail.com> wrote: > Abirudh, > > Thanks for your response > > I’m running k8s cluster on AWS and kub-dns pods are running fine and also > as I mentioned only 1 executor pod is running though I requested for 5 and > rest 4 were killed with below error and I do have enough resources > available. > > On Tue, May 29, 2018 at 6:28 PM Anirudh Ramanathan <anir...@foxish.me> > wrote: > >> This looks to me like a kube-dns error that's causing the driver DNS >> address to not resolve. >> It would be worth double checking that kube-dns is indeed running (in the >> kube-system namespace). >> Often, with environments like minikube, kube-dns may exit/crashloop due >> to lack of resource. >> >> On Tue, May 29, 2018 at 3:18 PM, purna pradeep <purna2prad...@gmail.com> >> wrote: >> >>> Hello, >>> >>> I’m getting below error when I spark-submit a Spark 2.3 app on >>> Kubernetes *v1.8.3* , some of the executor pods were killed with below >>> error as soon as they come up >>> >>> Exception in thread "main" java.lang.reflect. >>> UndeclaredThrowableException >>> >>> at org.apache.hadoop.security.UserGroupInformation.doAs( >>> UserGroupInformation.java:1713) >>> >>> at org.apache.spark.deploy.SparkHadoopUtil. >>> runAsSparkUser(SparkHadoopUtil.scala:64) >>> >>> at org.apache.spark.executor. >>> CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend. >>> scala:188) >>> >>> at org.apache.spark.executor. >>> CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend. >>> scala:293) >>> >>> at org.apache.spark.executor. >>> CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) >>> >>> Caused by: org.apache.spark.SparkException: Exception thrown in >>> awaitResult: >>> >>> at org.apache.spark.util.ThreadUtils$.awaitResult( >>> ThreadUtils.scala:205) >>> >>> at org.apache.spark.rpc.RpcTimeout.awaitResult( >>> RpcTimeout.scala:75) >>> >>> at org.apache.spark.rpc.RpcEnv. >>> setupEndpointRefByURI(RpcEnv.scala:101) >>> >>> at org.apache.spark.executor. >>> CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp( >>> CoarseGrainedExecutorBackend.scala:201) >>> >>> at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run( >>> SparkHadoopUtil.scala:65) >>> >>> at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run( >>> SparkHadoopUtil.scala:64) >>> >>> at java.security.AccessController.doPrivileged(Native >>> Method) >>> >>> at javax.security.auth.Subject.doAs(Subject.java:422) >>> >>> at org.apache.hadoop.security.UserGroupInformation.doAs( >>> UserGroupInformation.java:1698) >>> >>> ... 4 more >>> >>> Caused by: java.io.IOException: Failed to connect to >>> spark-1527629824987-driver-svc.spark.svc:7078 >>> >>> at org.apache.spark.network. >>> client.TransportClientFactory.createClient(TransportClientFactory.java: >>> 245) >>> >>> at org.apache.spark.network. >>> client.TransportClientFactory.createClient(TransportClientFactory.java: >>> 187) >>> >>> at org.apache.spark.rpc.netty.NettyRpcEnv.createClient( >>> NettyRpcEnv.scala:198) >>> >>> at org.apache.spark.rpc.netty. >>> Outbox$$anon$1.call(Outbox.scala:194) >>> >>> at org.apache.spark.rpc.netty. >>> Outbox$$anon$1.call(Outbox.scala:190) >>> >>> at java.util.concurrent.FutureTask.run(FutureTask. >>> java:266) >>> >>> at java.util.concurrent.ThreadPoolExecutor.runWorker( >>> ThreadPoolExecutor.java:1149) >>> >>> at java.util.concurrent.ThreadPoolExecutor$Worker.run( >>> ThreadPoolExecutor.java:624) >>> >>> at java.lang.Thread.run(Thread.java:748) >>> >>> Caused by: java.net.UnknownHostException: spark-1527629824987-driver- >>> svc.spark.svc >>> >>> at java.net.InetAddress.getAllByName0(InetAddress. >>> java:1280) >>> >>> at java.net.InetAddress.getAllByName(InetAddress.java: >>> 1192) >>> >>> at java.net.InetAddress.getAllByName(InetAddress.java: >>> 1126) >>> >>> at java.net.InetAddress.getByName(InetAddress.java:1076) >>> >>> at io.netty.util.internal.SocketUtils$8.run(SocketUtils. >>> java:146) >>> >>> at io.netty.util.internal.SocketUtils$8.run(SocketUtils. >>> java:143) >>> >>> at java.security.AccessController.doPrivileged(Native >>> Method) >>> >>> at io.netty.util.internal.SocketUtils.addressByName( >>> SocketUtils.java:143) >>> >>> at io.netty.resolver.DefaultNameResolver.doResolve( >>> DefaultNameResolver.java:43) >>> >>> at io.netty.resolver.SimpleNameResolver.resolve( >>> SimpleNameResolver.java:63) >>> >>> at io.netty.resolver.SimpleNameResolver.resolve( >>> SimpleNameResolver.java:55) >>> >>> at io.netty.resolver.InetSocketAddressResolver. >>> doResolve(InetSocketAddressResolver.java:57) >>> >>> at io.netty.resolver.InetSocketAddressResolver. >>> doResolve(InetSocketAddressResolver.java:32) >>> >>> at io.netty.resolver.AbstractAddressResolver.resolve( >>> AbstractAddressResolver.java:108) >>> >>> at io.netty.bootstrap.Bootstrap.doResolveAndConnect0( >>> Bootstrap.java:208) >>> >>> at io.netty.bootstrap.Bootstrap. >>> access$000(Bootstrap.java:49) >>> >>> at io.netty.bootstrap.Bootstrap$ >>> 1.operationComplete(Bootstrap.java:188) >>> >>> at io.netty.bootstrap.Bootstrap$ >>> 1.operationComplete(Bootstrap.java:174) >>> >>> at io.netty.util.concurrent.DefaultPromise. >>> notifyListener0(DefaultPromise.java:507) >>> >>> at io.netty.util.concurrent.DefaultPromise. >>> notifyListenersNow(DefaultPromise.java:481) >>> >>> at io.netty.util.concurrent.DefaultPromise. >>> notifyListeners(DefaultPromise.java:420) >>> >>> at io.netty.util.concurrent.DefaultPromise.trySuccess( >>> DefaultPromise.java:104) >>> >>> at io.netty.channel.DefaultChannelPromise.trySuccess( >>> DefaultChannelPromise.java:82) >>> >>> at io.netty.channel.AbstractChannel$ >>> AbstractUnsafe.safeSetSuccess(AbstractChannel.java:978) >>> >>> at io.netty.channel.AbstractChannel$ >>> AbstractUnsafe.register0(AbstractChannel.java:512) >>> >>> at io.netty.channel.AbstractChannel$ >>> AbstractUnsafe.access$200(AbstractChannel.java:423) >>> >>> at io.netty.channel.AbstractChannel$ >>> AbstractUnsafe$1.run(AbstractChannel.java:482) >>> >>> at io.netty.util.concurrent.AbstractEventExecutor. >>> safeExecute(AbstractEventExecutor.java:163) >>> >>> at io.netty.util.concurrent.SingleThreadEventExecutor. >>> runAllTasks(SingleThreadEventExecutor.java:403) >>> >>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop. >>> java:463) >>> >>> at io.netty.util.concurrent.SingleThreadEventExecutor$5. >>> run(SingleThreadEventExecutor.java:858) >>> >>> at io.netty.util.concurrent.DefaultThreadFactory$ >>> DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) >>> >>> ... 1 more >>> >> >>