GPU job in Spark 3

Martin Somers Fri, 09 Apr 2021 09:18:49 -0700

Hi Everyone !!

I'm trying to get an on-premise GPU instance of Spark 3 running on my Ubuntu
box, and I am following:
https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#example-join-operation


Does anyone have any insight into why a Spark job isn't being run on the GPU?
It appears to run entirely on the CPU. The Hadoop binary is installed and
appears to be functioning fine:

export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)

Here is my setup on Ubuntu 20.10:


▶ nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:21:00.0  On |                  N/A |
|  0%   38C    P8    19W / 370W |    478MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

/opt/sparkRapidsPlugin


▶ ls
cudf-0.18.1-cuda11.jar  getGpusResources.sh  rapids-4-spark_2.12-0.4.1.jar

▶ scalac --version
Scala compiler version 2.13.0 -- Copyright 2002-2019, LAMP/EPFL and
Lightbend, Inc.


▶ spark-shell --version
2021-04-09 17:05:36,158 WARN util.Utils: Your hostname, studio resolves to
a loopback address: 127.0.1.1; using 192.168.0.221 instead (on interface
wlp71s0)
2021-04-09 17:05:36,159 WARN util.Utils: Set SPARK_LOCAL_IP if you need to
bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform
(file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor
java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of
org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal
reflective access operations
WARNING: All illegal access operations will be denied in a future release
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 11.0.10
Branch HEAD
Compiled by user ubuntu on 2021-02-22T01:04:02Z
Revision 1d550c4e90275ab418b9161925049239227f3dc9
Url https://github.com/apache/spark
Type --help for more information.


Here is how I am calling Spark prior to adding the test job:

$SPARK_HOME/bin/spark-shell \
       --master local \
       --num-executors 1 \
       --conf spark.executor.cores=16 \
       --conf spark.rapids.sql.concurrentGpuTasks=1 \
       --driver-memory 10g \
       --conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
       --conf spark.rapids.memory.pinnedPool.size=16G \
       --conf spark.locality.wait=0s \
       --conf spark.sql.files.maxPartitionBytes=512m \
       --conf spark.sql.shuffle.partitions=10 \
       --conf spark.plugins=com.nvidia.spark.SQLPlugin \
       --files $SPARK_RAPIDS_DIR/getGpusResources.sh \
       --jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
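
The launch command depends on three environment variables that are not shown in this message. As a minimal sketch, assuming they are meant to point at the directory and jar names from the `ls` output earlier, they can be set and sanity-checked before starting spark-shell. `check_file` is a hypothetical helper written for this example, not part of Spark or the RAPIDS plugin.

```shell
# Assumed values: directory and file names come from the listing of
# /opt/sparkRapidsPlugin shown earlier in this message.
export SPARK_RAPIDS_DIR=/opt/sparkRapidsPlugin
export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-0.18.1-cuda11.jar
export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-0.4.1.jar

# Hypothetical helper: report whether a path exists as a regular file.
check_file() {
    if [ -f "$1" ]; then
        echo "ok: $1"
    else
        echo "MISSING: $1"
    fi
}

# Check every file that the spark-shell command refers to.
check_file "$SPARK_CUDF_JAR"
check_file "$SPARK_RAPIDS_PLUGIN_JAR"
check_file "$SPARK_RAPIDS_DIR/getGpusResources.sh"
```

If any line reports MISSING, the corresponding variable expands to a path that does not exist, and the `--jars` and `extraClassPath` settings would not pick up that file.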


The test job is from the example join operation:

val df = sc.makeRDD(1 to 10000000, 6).toDF
val df2 = sc.makeRDD(1 to 10000000, 6).toDF
df.select($"value" as "a").join(df2.select($"value" as "b"), $"a" === $"b").count


I just noticed that the Scala versions are out of sync (scalac reports 2.13.0,
while Spark reports Scala 2.12.10). That shouldn't affect it, should it?


Is there anything else I can try in the --conf settings, or are there any logs
that would show what might be failing behind the scenes? Any suggestions?


Thanks
Martin


-- 
M
