Vinh Tran created SPARK-32411:
---------------------------------

             Summary: GPU Cluster Fail
                 Key: SPARK-32411
                 URL: https://issues.apache.org/jira/browse/SPARK-32411
             Project: Spark
          Issue Type: Bug
          Components: PySpark, Web UI
    Affects Versions: 3.0.0
         Environment: I have an Apache Spark 3.0 cluster consisting of machines 
with multiple NVIDIA GPUs, and I connect my Jupyter notebook to the cluster 
using PySpark.
            Reporter: Vinh Tran


I'm having a difficult time getting a GPU cluster started on Apache Spark 3.0. 
It was hard to find documentation on this, but I stumbled on an NVIDIA GitHub 
page for RAPIDS which suggested the following additional edits to 
spark-defaults.conf:
{code:java}
spark.task.resource.gpu.amount               0.25
spark.executor.resource.gpu.discoveryScript  ./usr/local/spark/getGpusResources.sh
{code}
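In case it helps with reproducing this, the same two properties can also be 
passed from PySpark when the session is created in the notebook. This is only 
a minimal sketch: the master URL is a placeholder and the values simply mirror 
the spark-defaults.conf entries above.
{code:python}
from pyspark.sql import SparkSession

# Sketch only: the master URL is a placeholder; the values mirror the conf above.
spark = (
    SparkSession.builder
    .master("spark://<master-host>:7077")
    .config("spark.task.resource.gpu.amount", "0.25")
    .config("spark.executor.resource.gpu.discoveryScript",
            "./usr/local/spark/getGpusResources.sh")
    .getOrCreate()
)
{code}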
I have an Apache Spark 3.0 cluster consisting of machines with multiple 
NVIDIA GPUs, and I connect my Jupyter notebook to the cluster using PySpark; 
however, this results in the following error: 
{code:java}
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: You must specify an amount for gpu
        at org.apache.spark.resource.ResourceUtils$.$anonfun$parseResourceRequest$1(ResourceUtils.scala:142)
        at scala.collection.immutable.Map$Map1.getOrElse(Map.scala:119)
        at org.apache.spark.resource.ResourceUtils$.parseResourceRequest(ResourceUtils.scala:142)
        at org.apache.spark.resource.ResourceUtils$.$anonfun$parseAllResourceRequests$1(ResourceUtils.scala:159)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
        at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:75)
        at scala.collection.TraversableLike.map(TraversableLike.scala:238)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.spark.resource.ResourceUtils$.parseAllResourceRequests(ResourceUtils.scala:159)
        at org.apache.spark.SparkContext$.checkResourcesPerTask$1(SparkContext.scala:2773)
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2884)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
{code}
After this, I tried adding another line to the conf per the instructions 
(spark.executor.resource.gpu.amount, which the exception above appears to ask 
for). That no longer raises an error; however, when I log in to the Web UI at 
localhost:8080, the application listed under Running Applications stays in the 
WAITING state.
{code:java}
spark.task.resource.gpu.amount                 2
spark.executor.resource.gpu.discoveryScript    ./usr/local/spark/getGpusResources.sh
spark.executor.resource.gpu.amount             1
{code}
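One thing I have not been able to confirm: since the application shows up in 
the standalone master UI, do the workers themselves also need to advertise 
their GPUs to the master? My guess is that something along these lines would 
be needed on the worker side (the amount here is only illustrative, and the 
script path is the same one as above), but I am not sure:
{code:java}
spark.worker.resource.gpu.amount             2
spark.worker.resource.gpu.discoveryScript    ./usr/local/spark/getGpusResources.sh
{code}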
 


