Dmitiry created SPARK-23460:
-------------------------------

             Summary: PySpark concurrency python egg cache directory
                 Key: SPARK-23460
                 URL: https://issues.apache.org/jira/browse/SPARK-23460
             Project: Spark
          Issue Type: Question
          Components: PySpark
    Affects Versions: 2.1.2
         Environment: YARN last
            Reporter: Dmitiry

We are experiencing intermittent failures when running tasks on PySpark with dependencies shipped through --py-files as a Python egg.

We set the following (otherwise we get "permission denied" on the egg cache):

{noformat}
--conf "spark.executorEnv.PYTHON_EGG_CACHE=./.python-eggs"
{noformat}

Error:

{noformat}
INFO -   File "build/bdist.linux-x86_64/egg/ua_parser/user_agent_parser.py", line 409, in <module>
INFO -   File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 904, in resource_filename
INFO -     self, resource_name
INFO -   File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1380, in get_resource_filename
INFO -     return self._extract_resource(manager, zip_path)
INFO -   File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1405, in _extract_resource
INFO -     self.egg_name, self._parts(zip_path)
INFO -   File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 984, in get_cache_path
INFO -     self.extraction_error()
INFO -   File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 950, in extraction_error
INFO -     raise err
INFO - ExtractionError: Can't extract file(s) to egg cache
INFO -
INFO - The following error occurred while trying to extract file(s) to the Python egg
INFO - cache:
INFO -
INFO -   [Errno 17] File exists: './.python-eggs'
INFO -
INFO - The Python egg cache directory is currently set to:
INFO -
INFO -   ./.python-eggs/
INFO -
INFO - Perhaps your account does not have write access to this directory?  You can
INFO - change the cache directory by setting the PYTHON_EGG_CACHE environment
INFO - variable to point to an accessible directory.
{noformat}

We build the package with the option `zip_safe=False`, but PySpark still uses the egg cache directory regardless.

Is there any way around this?
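
One possible workaround, offered only as a sketch: the `[Errno 17] File exists` message is consistent with several Python workers racing to create the same `./.python-eggs` directory, although the report does not confirm that this is the cause. Under that assumption, each worker could tolerate an already existing base directory and extract into its own private subdirectory. The helper name `ensure_egg_cache` is hypothetical and is not part of PySpark or setuptools; it relies only on `pkg_resources` reading the `PYTHON_EGG_CACHE` environment variable at extraction time.

```python
import errno
import os
import tempfile


def ensure_egg_cache(base_dir=None):
    """Point PYTHON_EGG_CACHE at a directory private to this process.

    Hypothetical helper: the name and the directory layout are assumptions
    made for illustration, not something PySpark provides.
    """
    if base_dir is None:
        # Same relative location as in the report, resolved against the
        # worker's current working directory.
        base_dir = os.path.join(os.getcwd(), ".python-eggs")
    try:
        os.makedirs(base_dir)
    except OSError as exc:
        # Another worker may have created the directory first; only
        # re-raise if the path is missing or is not a directory.
        if exc.errno != errno.EEXIST or not os.path.isdir(base_dir):
            raise
    # mkdtemp creates a uniquely named directory, so concurrent workers
    # never extract into the same path.
    cache_dir = tempfile.mkdtemp(prefix="pid%d-" % os.getpid(), dir=base_dir)
    os.environ["PYTHON_EGG_CACHE"] = cache_dir
    return cache_dir
```

For this to have any effect, it would have to run on the executor before the egg is first imported, for example at the top of the function passed to `mapPartitions`, followed by a local `import ua_parser`. The trade-off is that every worker process extracts its own copy of the egg resources, and this has not been tested against Spark 2.1.2 on YARN.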