Vajiha has filed a spark-rapids discussion for this issue: https://github.com/NVIDIA/spark-rapids/discussions/7205. If you are interested, please follow along there.
On Wed, Nov 30, 2022 at 7:17 AM Vajiha Begum S A <vajihabegu...@maestrowiz.com> wrote:

> Hi,
> I'm using an Ubuntu system with an NVIDIA Quadro K1200 with 20 GB of GPU memory.
> Installed: cudf 22.10.0 jar, rapids-4-spark_2.12-22.10.0 jar, CUDA Toolkit 11.8.0 (Linux), Java 8.
> I'm running only a single server; the master is localhost.
>
> I'm trying to run PySpark code through spark-submit and Python IDLE, and I'm getting errors. Kindly help me resolve this error, and kindly suggest where I have made mistakes.
>
> *Error when running code through spark-submit:*
> spark-submit /home/mwadmin/Documents/test.py
> 22/11/30 14:59:32 WARN Utils: Your hostname, mwadmin-HP-Z440-Workstation resolves to a loopback address: 127.0.1.1; using ***.***.**.** instead (on interface eno1)
> 22/11/30 14:59:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 22/11/30 14:59:32 INFO SparkContext: Running Spark version 3.2.2
> 22/11/30 14:59:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 22/11/30 14:59:33 INFO ResourceUtils: ==============================================================
> 22/11/30 14:59:33 INFO ResourceUtils: No custom resources configured for spark.driver.
> 22/11/30 14:59:33 INFO ResourceUtils: ==============================================================
> 22/11/30 14:59:33 INFO SparkContext: Submitted application: Spark.com
> 22/11/30 14:59:33 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: , gpu -> name: gpu, amount: 1, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0, gpu -> name: gpu, amount: 0.5)
> 22/11/30 14:59:33 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor
> 22/11/30 14:59:33 WARN ResourceUtils: The configuration of resource: gpu (exec = 1, task = 0.5/2, runnable tasks = 2) will result in wasted resources due to resource cpus limiting the number of runnable tasks per executor to: 1. Please adjust your configuration.
> 22/11/30 14:59:33 INFO ResourceProfileManager: Added ResourceProfile id: 0
> 22/11/30 14:59:33 INFO SecurityManager: Changing view acls to: mwadmin
> 22/11/30 14:59:33 INFO SecurityManager: Changing modify acls to: mwadmin
> 22/11/30 14:59:33 INFO SecurityManager: Changing view acls groups to:
> 22/11/30 14:59:33 INFO SecurityManager: Changing modify acls groups to:
> 22/11/30 14:59:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mwadmin); groups with view permissions: Set(); users with modify permissions: Set(mwadmin); groups with modify permissions: Set()
> 22/11/30 14:59:33 INFO Utils: Successfully started service 'sparkDriver' on port 45883.
> 22/11/30 14:59:33 INFO SparkEnv: Registering MapOutputTracker
> 22/11/30 14:59:33 INFO SparkEnv: Registering BlockManagerMaster
> 22/11/30 14:59:33 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
> 22/11/30 14:59:33 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
> 22/11/30 14:59:33 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
> 22/11/30 14:59:33 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-647d2c2a-72e4-402d-aeff-d7460726eb6d
> 22/11/30 14:59:33 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
> 22/11/30 14:59:33 INFO SparkEnv: Registering OutputCommitCoordinator
> 22/11/30 14:59:33 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 22/11/30 14:59:33 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://localhost:4040
> 22/11/30 14:59:33 INFO ShimLoader: Loading shim for Spark version: 3.2.2
> 22/11/30 14:59:33 INFO ShimLoader: Complete Spark build info: 3.2.2, https://github.com/apache/spark, HEAD, 78a5825fe266c0884d2dd18cbca9625fa258d7f7, 2022-07-11T15:44:21Z
> 22/11/30 14:59:33 INFO ShimLoader: findURLClassLoader found a URLClassLoader org.apache.spark.util.MutableURLClassLoader@1530c739
> 22/11/30 14:59:33 INFO ShimLoader: Updating spark classloader org.apache.spark.util.MutableURLClassLoader@1530c739 with the URLs: jar:file:/home/mwadmin/spark-3.2.2-bin-hadoop3.2/jars/rapids-4-spark_2.12-22.10.0.jar!/spark3xx-common/, jar:file:/home/mwadmin/spark-3.2.2-bin-hadoop3.2/jars/rapids-4-spark_2.12-22.10.0.jar!/spark322/
> 22/11/30 14:59:33 INFO ShimLoader: Spark classLoader org.apache.spark.util.MutableURLClassLoader@1530c739 updated successfully
> 22/11/30 14:59:33 INFO ShimLoader: Updating spark classloader org.apache.spark.util.MutableURLClassLoader@1530c739 with the URLs: jar:file:/home/mwadmin/spark-3.2.2-bin-hadoop3.2/jars/rapids-4-spark_2.12-22.10.0.jar!/spark3xx-common/, jar:file:/home/mwadmin/spark-3.2.2-bin-hadoop3.2/jars/rapids-4-spark_2.12-22.10.0.jar!/spark322/
> 22/11/30 14:59:33 INFO ShimLoader: Spark classLoader org.apache.spark.util.MutableURLClassLoader@1530c739 updated successfully
> 22/11/30 14:59:33 INFO RapidsPluginUtils: RAPIDS Accelerator build: {version=22.10.0, user=, url=https://github.com/NVIDIA/spark-rapids.git, date=2022-10-17T11:25:41Z, revision=c75a2eafc9ce9fb3e6ab75c6677d97bf681bff50, cudf_version=22.10.0, branch=HEAD}
> 22/11/30 14:59:33 INFO RapidsPluginUtils: RAPIDS Accelerator JNI build: {version=22.10.0, user=, url=https://github.com/NVIDIA/spark-rapids-jni.git, date=2022-10-14T05:19:41Z, revision=b2c02b61afe1747f3741d6c5e2064edb8da51b32, branch=HEAD}
> 22/11/30 14:59:33 INFO RapidsPluginUtils: cudf build: {version=22.10.0, user=, date=2022-10-14T01:51:22Z, revision=8ffe375d85f8fd0f98e0052f36ccd820a669d0ab, branch=HEAD}
> 22/11/30 14:59:33 WARN RapidsPluginUtils: RAPIDS Accelerator 22.10.0 using cudf 22.10.0.
> 22/11/30 14:59:33 WARN RapidsPluginUtils: spark.rapids.sql.multiThreadedRead.numThreads is set to 20.
> 22/11/30 14:59:33 WARN RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.
> 22/11/30 14:59:33 WARN RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.
> 22/11/30 14:59:33 INFO DriverPluginContainer: Initialized driver component for plugin com.nvidia.spark.SQLPlugin.
> 22/11/30 14:59:33 WARN ResourceUtils: The configuration of resource: gpu (exec = 1, task = 0.5/2, runnable tasks = 2) will result in wasted resources due to resource cpus limiting the number of runnable tasks per executor to: 1. Please adjust your configuration.
> 22/11/30 14:59:34 INFO Executor: Starting executor ID driver on host ***.***.**.**
> 22/11/30 14:59:34 INFO RapidsExecutorPlugin: RAPIDS Accelerator build: {version=22.10.0, user=, url=https://github.com/NVIDIA/spark-rapids.git, date=2022-10-17T11:25:41Z, revision=c75a2eafc9ce9fb3e6ab75c6677d97bf681bff50, cudf_version=22.10.0, branch=HEAD}
> 22/11/30 14:59:34 INFO RapidsExecutorPlugin: cudf build: {version=22.10.0, user=, date=2022-10-14T01:51:22Z, revision=8ffe375d85f8fd0f98e0052f36ccd820a669d0ab, branch=HEAD}
> 22/11/30 14:59:34 INFO RapidsExecutorPlugin: Initializing memory from Executor Plugin
> 22/11/30 14:59:47 INFO Executor: Told to re-register on heartbeat
> 22/11/30 14:59:47 INFO BlockManager: BlockManager null re-registering with master
> 22/11/30 14:59:48 INFO BlockManagerMaster: Registering BlockManager null
> 22/11/30 14:59:48 ERROR Inbox: Ignoring error
> java.lang.NullPointerException
>     at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:534)
>     at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:117)
>     at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
>     at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
>     at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>     at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
>     at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> 22/11/30 14:59:48 WARN Executor: Issue communicating with driver in heartbeater
> org.apache.spark.SparkException: Exception thrown in awaitResult:
>     at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
>     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>     at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103)
>     at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87)
>     at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:78)
>     at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:626)
>     at org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:1009)
>     at org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:212)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2048)
>     at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.NullPointerException
>     at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:534)
>     at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:117)
>     at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
>     at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
>     at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>     at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
>     at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     ... 3 more
> 22/11/30 14:59:52 INFO GpuDeviceManager: Initializing RMM ASYNC pool size = 3137.0625 MB on gpuId 0
> 22/11/30 14:59:52 INFO GpuDeviceManager: Using per-thread default stream
> 22/11/30 14:59:52 ERROR RapidsExecutorPlugin: Exception in the executor plugin, shutting down!
> *ai.rapids.cudf.CudfException: RMM failure at: /home/jenkins/agent/workspace/jenkins-cudf-release-39-cuda11/cpp/build/_deps/rmm-src/include/rmm/mr/device/cuda_async_memory_resource.hpp:90: cudaMallocAsync not supported with this CUDA driver/runtime version*
>     at ai.rapids.cudf.Rmm.initializeInternal(Native Method)
>     at ai.rapids.cudf.Rmm.initialize(Rmm.java:119)
>     at com.nvidia.spark.rapids.GpuDeviceManager$.initializeRmm(GpuDeviceManager.scala:296)
>     at com.nvidia.spark.rapids.GpuDeviceManager$.initializeMemory(GpuDeviceManager.scala:328)
>     at com.nvidia.spark.rapids.GpuDeviceManager$.initializeGpuAndMemory(GpuDeviceManager.scala:137)
>     at com.nvidia.spark.rapids.RapidsExecutorPlugin.init(Plugin.scala:258)
>     at org.apache.spark.internal.plugin.ExecutorPluginContainer.$anonfun$executorPlugins$1(PluginContainer.scala:125)
>     at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
>     at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>     at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>     at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
>     at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
>     at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
>     at org.apache.spark.internal.plugin.ExecutorPluginContainer.<init>(PluginContainer.scala:113)
>     at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:211)
>     at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:199)
>     at org.apache.spark.executor.Executor.$anonfun$plugins$1(Executor.scala:253)
>     at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231)
>     at org.apache.spark.executor.Executor.<init>(Executor.scala:253)
>     at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
>     at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
>     at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:581)
>     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>     at py4j.Gateway.invoke(Gateway.java:238)
>     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
>     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
>     at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
>     at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
>     at java.lang.Thread.run(Thread.java:750)
> 22/11/30 14:59:52 INFO DiskBlockManager: Shutdown hook called
> 22/11/30 14:59:52 INFO ShutdownHookManager: Shutdown hook called
> 22/11/30 14:59:52 INFO ShutdownHookManager: Deleting directory /tmp/spark-58488513-7d53-42f2-8bc4-cdcb34b5cf49
> 22/11/30 14:59:52 INFO ShutdownHookManager: Deleting directory /tmp/spark-24b8e0ea-43d4-430a-9756-b1e84ceaa1ff/userFiles-5ce7f28f-16db-48fd-94bd-e9ef563c01f1
> 22/11/30 14:59:52 INFO ShutdownHookManager: Deleting directory /tmp/spark-24b8e0ea-43d4-430a-9756-b1e84ceaa1ff
>
> *Error when running code through Python IDLE:*
> raise Py4JNetworkError("Answer from Java side is empty")
> py4j.protocol.Py4JNetworkError: Answer from Java side is empty
> During handling of the above exception, another exception occurred:
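
A few notes on the quoted log for anyone who lands here before the GitHub discussion.

First, the repeated ResourceUtils warning: with one executor core and spark.task.resource.gpu.amount = 0.5, the GPU advertises two runnable tasks per executor but the single CPU core caps it at one, hence "wasted resources". A minimal sketch of a consistent single-node setup (illustrative values only, not Vajiha's actual test.py):

    from pyspark.sql import SparkSession

    # Sketch: make the task concurrency implied by CPU and GPU resources agree
    # (1 executor core -> 1 task -> 1 full GPU share per task).
    spark = (
        SparkSession.builder
        .appName("rapids-resource-sketch")  # hypothetical app name
        .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
        .config("spark.executor.resource.gpu.amount", "1")
        .config("spark.task.resource.gpu.amount", "1")  # the log shows 0.5
        .config("spark.rapids.sql.explain", "NONE")  # per the warning above, silences NOT_ON_GPU diagnostics
        .getOrCreate()
    )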
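
Second, the actual failure is the CudfException: RMM tries to build its ASYNC pool on top of cudaMallocAsync, which requires a driver with CUDA 11.2+ support, and the executor plugin shuts down when that initialization throws. Everything after it (the heartbeat NullPointerExceptions, the shutdown hooks) is fallout. If upgrading the NVIDIA driver is not an option, a possible workaround is to select a different RMM pool via spark.rapids.memory.gpu.pool; ARENA is one documented alternative, though please verify the key and values against your plugin version's configuration docs:

    from pyspark.sql import SparkSession

    # Hedged sketch: steer the plugin away from the cudaMallocAsync-backed pool.
    spark = (
        SparkSession.builder
        .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
        .config("spark.rapids.memory.gpu.pool", "ARENA")  # the log shows the ASYNC pool being initialized
        .getOrCreate()
    )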
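
Third, the Py4JNetworkError from Python IDLE is a downstream symptom rather than a separate bug: when the executor plugin aborts, the JVM exits and the Py4J gateway returns an empty answer to Python. Before re-running, it can help to confirm which driver the system actually loads; a small check using nvidia-smi (the helper below is our illustration, not something from the thread):

    import subprocess

    def cuda_driver_version():
        """Return the NVIDIA driver version reported by nvidia-smi, or None."""
        try:
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
                capture_output=True, text=True, check=True,
            )
            return out.stdout.strip()
        except (OSError, subprocess.CalledProcessError):
            return None

    print("NVIDIA driver:", cuda_driver_version() or "not detected")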