Interesting.
Perhaps you could try resolving service addresses from within a pod and
seeing if there's some other issue causing intermittent failures in
resolution. The steps here
<https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/#getting-a-shell-to-a-container>
may be helpful.
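Roughly, the debugging session could look like the below (a sketch only: the service name is taken from the stack trace further down, the `<executor-pod>` placeholder and the `k8s-app=kube-dns` label are assumptions that may differ in your cluster, and `nslookup` must exist in the container image):

```shell
# Confirm kube-dns is actually running in the kube-system namespace
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Get a shell inside a running pod (e.g. the surviving executor)
kubectl exec -it <executor-pod> -n spark -- /bin/sh

# From inside the pod, try resolving the driver service name,
# repeating a few times to catch intermittent resolution failures
nslookup spark-1527629824987-driver-svc.spark.svc
```

If resolution only fails some of the time, that points at kube-dns health or scaling rather than the Spark configuration.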

On Tue, May 29, 2018 at 4:02 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> Anirudh,
>
> Thanks for your response
>
> I’m running a k8s cluster on AWS and the kube-dns pods are running fine.
> Also, as I mentioned, only 1 executor pod is running though I requested 5;
> the other 4 were killed with the error below, and I do have enough
> resources available.
>
> On Tue, May 29, 2018 at 6:28 PM Anirudh Ramanathan <anir...@foxish.me>
> wrote:
>
>> This looks to me like a kube-dns error that's causing the driver DNS
>> address to not resolve.
>> It would be worth double checking that kube-dns is indeed running (in the
>> kube-system namespace).
>> Often, with environments like minikube, kube-dns may exit/crashloop due
>> to lack of resources.
>>
>> On Tue, May 29, 2018 at 3:18 PM, purna pradeep <purna2prad...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I’m getting the below error when I spark-submit a Spark 2.3 app on
>>> Kubernetes *v1.8.3*; some of the executor pods were killed with this
>>> error as soon as they came up:
>>>
>>> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
>>>     at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
>>>     at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
>>>     at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
>>>     at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
>>> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
>>>     at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
>>>     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>>>     at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
>>>     at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
>>>     at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
>>>     at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>>>     ... 4 more
>>> Caused by: java.io.IOException: Failed to connect to spark-1527629824987-driver-svc.spark.svc:7078
>>>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
>>>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
>>>     at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
>>>     at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
>>>     at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>     at java.lang.Thread.run(Thread.java:748)
>>> Caused by: java.net.UnknownHostException: spark-1527629824987-driver-svc.spark.svc
>>>     at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
>>>     at java.net.InetAddress.getAllByName(InetAddress.java:1192)
>>>     at java.net.InetAddress.getAllByName(InetAddress.java:1126)
>>>     at java.net.InetAddress.getByName(InetAddress.java:1076)
>>>     at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
>>>     at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
>>>     at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
>>>     at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
>>>     at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
>>>     at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
>>>     at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
>>>     at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
>>>     at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)
>>>     at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)
>>>     at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)
>>>     at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)
>>>     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
>>>     at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
>>>     at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
>>>     at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
>>>     at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
>>>     at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:978)
>>>     at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:512)
>>>     at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423)
>>>     at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:482)
>>>     at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>>>     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
>>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
>>>     at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>>>     at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
>>>     ... 1 more
>>>
>>
>>
