[ 
https://issues.apache.org/jira/browse/LIVY-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated LIVY-750:
-----------------------------
    Description: 
On Livy Server, even if we set the pyspark archives to use local files:
{code:bash}
export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}

Livy still uploads these local pyspark archives to the Yarn distributed cache:
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip

Note that this happens even after we fixed the Spark code in SPARK-30845 so
that it does not always upload local archives.

The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles",
which Spark then adds to the Yarn distributed cache. Since spark-submit
already takes care of finding the pyspark archives and uploading them if they
are not local, there is no need for Livy to redundantly do so.
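
The scheme check implied above can be sketched as follows. This is only an illustration in Python, not Livy's or Spark's actual code (Livy itself is written in Scala), and the function name `split_archives` is hypothetical. It assumes the archive list is comma-separated, as in PYSPARK_ARCHIVES_PATH, and relies on Spark's convention that a `local:` URI refers to a file already present on every node, so it does not need to go through the distributed cache.

```python
from urllib.parse import urlparse


def split_archives(archives_path):
    """Split a comma-separated archive list into (local, to_upload) URI lists.

    Hypothetical helper for illustration only: URIs with the "local" scheme
    are assumed to exist on every node already, everything else would have
    to be shipped to the cluster.
    """
    local, to_upload = [], []
    for uri in (part.strip() for part in archives_path.split(",")):
        if not uri:
            continue  # ignore empty entries such as a trailing comma
        if urlparse(uri).scheme == "local":
            local.append(uri)
        else:
            to_upload.append(uri)
    return local, to_upload


if __name__ == "__main__":
    path = (
        "local:/opt/spark/python/lib/pyspark.zip,"
        "local:/opt/spark/python/lib/py4j-0.10.7-src.zip"
    )
    local, to_upload = split_archives(path)
    print("already on every node:", local)
    print("would need uploading:", to_upload)
```

With the setting from this report, both archives fall into the first list, so neither should appear in an "Uploading resource" log line.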

  was:
On Livy Server, even if we set the pyspark archives to use local files:
{code:bash}
export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}

Livy still uploads these local pyspark archives to the Yarn distributed cache:
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip

Note that this is after we fixed Spark code in SPARK-30845 to not always upload 
local archives.

The root cause is that Livy adds pyspark archives to "spark.submit.pyFiles", 
which will be added to Yarn distributed cache by Spark. Since spark-submit 
already takes care of uploading pyspark archives, there is no need for Livy to 
redundantly do so.


> Livy uploads local pyspark archives to Yarn distributed cache
> -------------------------------------------------------------
>
>                 Key: LIVY-750
>                 URL: https://issues.apache.org/jira/browse/LIVY-750
>             Project: Livy
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: shanyu zhao
>            Priority: Major
>         Attachments: image-2020-02-16-13-19-40-645.png, 
> image-2020-02-16-13-19-59-591.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
