I don't see anything in this job that would use a GPU?
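As a sketch (I haven't run this against your setup): one quick way to see whether the RAPIDS Accelerator is touching a query at all is to print the physical plan instead of running it; operators the plugin has taken over show up with a "Gpu" prefix:

    // Same join as your test job, but .explain() prints the physical plan
    // without executing it. If the plugin picked the join up, you should see
    // operators like GpuShuffledHashJoin / GpuColumnarExchange rather than
    // the CPU SortMergeJoin / Exchange.
    val df  = sc.makeRDD(1 to 10000000, 6).toDF
    val df2 = sc.makeRDD(1 to 10000000, 6).toDF
    df.select($"value" as "a")
      .join(df2.select($"value" as "b"), $"a" === $"b")
      .explain()

If everything in that plan is still a CPU operator, adding --conf spark.rapids.sql.explain=ALL to your launch command should make the plugin log, per operator, why it stayed on the CPU.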
On Fri, Apr 9, 2021 at 11:19 AM Martin Somers <sono...@gmail.com> wrote:

> Hi Everyone!!
>
> I'm trying to get an on-premise GPU instance of Spark 3 running on my
> Ubuntu box, and I am following:
>
> https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#example-join-operation
>
> Anyone with any insight into why a Spark job isn't being run on the GPU?
> It appears to be all on the CPU. The Hadoop binary is installed and
> appears to be functioning fine:
>
> export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
>
> Here is my setup on Ubuntu 20.10:
>
> ▶ nvidia-smi
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |                               |                      |               MIG M. |
> |===============================+======================+======================|
> |   0  GeForce RTX 3090    Off  | 00000000:21:00.0  On |                  N/A |
> |  0%   38C    P8    19W / 370W |    478MiB / 24265MiB |      0%      Default |
> |                               |                      |                  N/A |
> +-------------------------------+----------------------+----------------------+
>
> /opt/sparkRapidsPlugin
>
> ▶ ls
> cudf-0.18.1-cuda11.jar  getGpusResources.sh  rapids-4-spark_2.12-0.4.1.jar
>
> ▶ scalac --version
> Scala compiler version 2.13.0 -- Copyright 2002-2019, LAMP/EPFL and Lightbend, Inc.
>
> ▶ spark-shell --version
> 2021-04-09 17:05:36,158 WARN util.Utils: Your hostname, studio resolves to a loopback address: 127.0.1.1; using 192.168.0.221 instead (on interface wlp71s0)
> 2021-04-09 17:05:36,159 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
>       /_/
>
> Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 11.0.10
> Branch HEAD
> Compiled by user ubuntu on 2021-02-22T01:04:02Z
> Revision 1d550c4e90275ab418b9161925049239227f3dc9
> Url https://github.com/apache/spark
> Type --help for more information.
>
> Here is how I'm calling Spark prior to adding the test job:
>
> $SPARK_HOME/bin/spark-shell \
>   --master local \
>   --num-executors 1 \
>   --conf spark.executor.cores=16 \
>   --conf spark.rapids.sql.concurrentGpuTasks=1 \
>   --driver-memory 10g \
>   --conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
>   --conf spark.rapids.memory.pinnedPool.size=16G \
>   --conf spark.locality.wait=0s \
>   --conf spark.sql.files.maxPartitionBytes=512m \
>   --conf spark.sql.shuffle.partitions=10 \
>   --conf spark.plugins=com.nvidia.spark.SQLPlugin \
>   --files $SPARK_RAPIDS_DIR/getGpusResources.sh \
>   --jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
>
> The test job is from the example join operation:
>
> val df = sc.makeRDD(1 to 10000000, 6).toDF
> val df2 = sc.makeRDD(1 to 10000000, 6).toDF
> df.select($"value" as "a").join(df2.select($"value" as "b"), $"a" === $"b").count
>
> I just noticed that the Scala versions are out of sync - that shouldn't affect it?
>
> Is there anything else I can try in the --conf, or are there any logs to see what might be failing behind the scenes? Any suggestions?
>
> Thanks
> Martin
>
> --
> M
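On the logging question: the spark-shell driver output is the first place to look. As a rough sketch (not verified against your setup), you can also confirm from inside the same shell that the plugin configuration was actually picked up:

    // Should contain com.nvidia.spark.SQLPlugin if --conf spark.plugins=... took effect
    spark.conf.get("spark.plugins")
    // The plugin honours this flag; it defaults to true when unset
    spark.conf.get("spark.rapids.sql.enabled", "not set")

The plugin should also identify itself (and the cudf build it loaded) in the driver log at startup; if nothing RAPIDS-related appears there, the jars were never loaded. The Scala mismatch you noticed shouldn't matter here: spark-shell runs on its own bundled 2.12.10, which is what the rapids-4-spark_2.12 jar targets, and your standalone scalac 2.13 isn't involved.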