Thanks Yinan. I am able to get kube-dns endpoints when I run this command:

    kubectl get ep kube-dns --namespace=kube-system

Do I need to deploy under the kube-system namespace instead of the default namespace? And please let me know if you have any insights on Error 1.

On Sun, Mar 11, 2018 at 8:26 PM Yinan Li <liyinan...@gmail.com> wrote:

> Spark on Kubernetes requires the kube-dns add-on to be present and
> properly configured. The executors connect to the driver through a
> headless Kubernetes service, using the DNS name of the service. Can you
> check if you have the add-on installed in your cluster? This issue
> https://github.com/apache-spark-on-k8s/spark/issues/558 might help.
>
> On Sun, Mar 11, 2018 at 5:01 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>
>> I am getting the errors below when I try to run spark-submit on a
>> Kubernetes cluster.
>>
>> Error 1: This looks like a warning; it does not interrupt the app
>> running inside the executor pod, but the warning keeps recurring:
>>
>> 2018-03-09 11:15:21 WARN WatchConnectionManager:192 - Exec Failure
>> java.io.EOFException
>>     at okio.RealBufferedSource.require(RealBufferedSource.java:60)
>>     at okio.RealBufferedSource.readByte(RealBufferedSource.java:73)
>>     at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:113)
>>     at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:97)
>>     at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:262)
>>     at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:201)
>>     at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
>>     at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>>
>> Error 2: This is an intermittent error which fails the executor pod run:
>>
>> org.apache.spark.SparkException: External scheduler cannot be instantiated
>>     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2747)
>>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
>>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
>>     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
>>     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
>>     at scala.Option.getOrElse(Option.scala:121)
>>     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
>>     at com.capitalone.quantum.spark.core.QuantumSession$.initialize(QuantumSession.scala:62)
>>     at com.capitalone.quantum.spark.core.QuantumSession$.getSparkSession(QuantumSession.scala:80)
>>     at com.capitalone.quantum.workflow.WorkflowApp$.getSession(WorkflowApp.scala:116)
>>     at com.capitalone.quantum.workflow.WorkflowApp$.main(WorkflowApp.scala:90)
>>     at com.capitalone.quantum.workflow.WorkflowApp.main(WorkflowApp.scala)
>> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [myapp-ef79db3d9f4831bf85bda14145fdf113-driver-driver] in namespace: [default] failed.
>>     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
>>     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
>>     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
>>     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
>>     at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:70)
>>     at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:120)
>>     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
>>     ... 11 more
>> Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
>>     at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
>>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
>>     at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
>>     at java.net.InetAddress.getAllByName(InetAddress.java:1192)
>>     at java.net.InetAddress.getAllByName(InetAddress.java:1126)
>>     at okhttp3.Dns$1.lookup(Dns.java:39)
>>     at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
>>     at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
>>     at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
>>     at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
>>     at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
>>     at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
>>     at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>>     at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>>     at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>>     at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>>     at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>>     at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
>>     at okhttp3.RealCall.execute(RealCall.java:69)
>>     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
>>     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
>>     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
>>     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
>>     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
>>     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
>>     ... 15 more
>> 2018-03-09 15:00:39 INFO AbstractConnector:318 - Stopped Spark@5f59185e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
>> 2018-03-09 15:00:39 INFO SparkUI:54 - Stopped Spark web UI at http://myapp-ef79db3d9f4831bf85bda14145fdf113-driver-svc.default.svc:4040
>> 2018-03-09 15:00:39 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
>> 2018-03-09 15:00:39 INFO MemoryStore:54 - MemoryStore cleared
>> 2018-03-09 15:00:39 INFO BlockManager:54 - BlockManager stopped
>> 2018-03-09 15:00:39 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
>> 2018-03-09 15:00:39 WARN MetricsSystem:66 - Stopping a MetricsSystem that is not running
>> 2018-03-09 15:00:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
>> 2018-03-09 15:00:39 INFO SparkContext:54 - Successfully stopped SparkContext
>> Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
>>     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2747)
>>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
>>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
>>     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
>>     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
>>     at scala.Option.getOrElse(Option.scala:121)
>>     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
>>     at com.capitalone.quantum.spark.core.QuantumSession$.initialize(QuantumSession.scala:62)
>>     at com.capitalone.quantum.spark.core.QuantumSession$.getSparkSession(QuantumSession.scala:80)
>>     at com.capitalone.quantum.workflow.WorkflowApp$.getSession(WorkflowApp.scala:116)
>>     at com.capitalone.quantum.workflow.WorkflowApp$.main(WorkflowApp.scala:90)
>>     at com.capitalone.quantum.workflow.WorkflowApp.main(WorkflowApp.scala)
>> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [myapp-ef79db3d9f4831bf85bda14145fdf113-driver] in namespace: [default] failed.
>>     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
>>     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
>>     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
>>     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
>>     at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:70)
>>     at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:120)
>>     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
>>     ... 11 more
>> Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
>>     at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
>>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
>>     at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
>>     at java.net.InetAddress.getAllByName(InetAddress.java:1192)
>>     at java.net.InetAddress.getAllByName(InetAddress.java:1126)
>>     at okhttp3.Dns$1.lookup(Dns.java:39)
>>     at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
>>     at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
>>     at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
>>     at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
>>     at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
>>     at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
>>     at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>>     at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>>     at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>>     at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>>     at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>>     at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
>>     at okhttp3.RealCall.execute(RealCall.java:69)
>>     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
>>     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
>>     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
>>     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
>>     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
>>     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
>>     ... 15 more
>> 2018-03-09 15:00:39 INFO ShutdownHookManager:54 - Shutdown hook called
>> 2018-03-09 15:00:39 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-5bd85c96-d689-4c53-a0b3-1eadd32357cb
>>
>> Note: I am able to run the application successfully, but the
>> spark-submit run fails with Error 2 above very frequently.
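The kube-dns endpoint check in the thread only confirms the add-on is registered; the `UnknownHostException` in Error 2 is a pod failing to resolve `kubernetes.default.svc` at runtime. A sketch of a fuller check, assuming `kubectl` access to the cluster (the throwaway pod name `dns-check` and the busybox image are illustrative, not from the thread):

```shell
# Confirm kube-dns has ready endpoint addresses; an empty ENDPOINTS column
# means cluster DNS is effectively down:
kubectl get ep kube-dns --namespace=kube-system

# From a short-lived pod in the same namespace as the driver, attempt the
# exact lookup that fails in Error 2:
kubectl run dns-check --rm -it --image=busybox --restart=Never \
  --namespace=default -- nslookup kubernetes.default.svc
```

If the `nslookup` fails only some of the time, that would match the intermittent nature of Error 2 and point at flaky kube-dns pods rather than a missing add-on.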
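As Yinan notes, executors reach the driver through a headless Kubernetes service by DNS name. That name is visible in the log above ("Stopped Spark web UI at http://myapp-...-driver-svc.default.svc:4040") and is assembled from the service name and namespace; a minimal sketch of the short in-cluster form, using the names from the log (resolution of the `.svc` short form relies on the pod's resolv.conf search domains):

```shell
svc="myapp-ef79db3d9f4831bf85bda14145fdf113-driver-svc"  # driver's headless service
ns="default"                                             # namespace the driver runs in

# Short in-cluster DNS form is <service>.<namespace>.svc; the fully
# qualified form appends the cluster domain (typically cluster.local).
echo "${svc}.${ns}.svc"
# -> myapp-ef79db3d9f4831bf85bda14145fdf113-driver-svc.default.svc
```

Both this name and `kubernetes.default.svc` follow the same scheme, which is why a broken kube-dns breaks the driver's API-server lookup and the executor-to-driver connection alike.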
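The thread never shows the failing spark-submit invocation itself. For context, a minimal Spark-on-Kubernetes submission in the Spark 2.3 style of this era might look like the following; the API server host, main class, image, and jar path are placeholders, not details from the thread:

```shell
# Hypothetical invocation; <api-server-host>, com.example.Main, <registry>,
# and the jar path are placeholders to be replaced with real values.
spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --name myapp \
  --class com.example.Main \
  --conf spark.kubernetes.namespace=default \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<registry>/spark:2.3.0 \
  local:///opt/spark/jars/myapp.jar
```

Note that `spark.kubernetes.namespace` only controls where the driver and executor pods are created; it does not by itself require deploying under kube-system.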