shanyu zhao created LIVY-750:
--------------------------------

             Summary: Livy uploads local pyspark archives to Yarn distributed cache
                 Key: LIVY-750
                 URL: https://issues.apache.org/jira/browse/LIVY-750
             Project: Livy
          Issue Type: Bug
          Components: Server
    Affects Versions: 0.7.0, 0.6.0
            Reporter: shanyu zhao
         Attachments: image-2020-02-16-13-19-40-645.png, image-2020-02-16-13-19-59-591.png
On Livy Server, even if we set the pyspark archives to use local files:
{code:bash}
export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}
Livy still uploads these local pyspark archives to the Yarn distributed cache:
{code}
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip
{code}
Note that this happens even after SPARK-30845 fixed Spark so that it no longer unconditionally uploads local archives.

The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", and Spark adds everything in that list to the Yarn distributed cache. Since spark-submit already takes care of distributing the pyspark archives, there is no need for Livy to redundantly add them.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
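To illustrate the idea behind a fix, the sketch below (in Python purely for illustration; Livy itself is written in Scala, and both function names here are hypothetical, not Livy APIs) shows the kind of filtering that would avoid the redundant upload: archives whose URI uses the {{local:}} scheme are already present on every node, so they need not be added to "spark.submit.pyFiles".

{code:python}
def needs_distribution(uri: str) -> bool:
    """Hypothetical helper: archives with the local: scheme are already
    available on every node, so Spark does not need to upload them."""
    return not uri.startswith("local:")

def filter_py_files(archives: str) -> str:
    """Keep only the comma-separated entries that actually need
    to go through the Yarn distributed cache."""
    return ",".join(a for a in archives.split(",") if needs_distribution(a))

paths = ("local:/opt/spark/python/lib/pyspark.zip,"
         "local:/opt/spark/python/lib/py4j-0.10.7-src.zip")
print(filter_py_files(paths))  # prints an empty line: nothing left to upload
{code}

With PYSPARK_ARCHIVES_PATH set as above, every entry is filtered out, so nothing would be appended to "spark.submit.pyFiles" and no upload to .sparkStaging would occur.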