GitHub user holdenk opened a pull request: https://github.com/apache/spark/pull/18734
[WIP][SPARK-21070][PYSPARK] Attempt to update cloudpickle again

## What changes were proposed in this pull request?

Based on https://github.com/apache/spark/pull/18282 by @rgbkrk, this PR attempts to update to the current released cloudpickle and minimize the difference between Spark's cloudpickle and "stock" cloudpickle, with the goal of eventually using stock cloudpickle.

Some notable changes:

* Import submodules accessed by pickled functions (cloudpipe/cloudpickle#80)
* Support recursive functions inside closures (cloudpipe/cloudpickle#89, cloudpipe/cloudpickle#90)
* Fix ResourceWarnings and DeprecationWarnings (cloudpipe/cloudpickle#88)
* Assume modules with `__file__` attribute are not dynamic (cloudpipe/cloudpickle#85)
* Make cloudpickle Python 3.6 compatible (cloudpipe/cloudpickle#72)
* Allow pickling of builtin methods (cloudpipe/cloudpickle#57)
* Add ability to pickle dynamically created modules (cloudpipe/cloudpickle#52)
* Support method descriptor (cloudpipe/cloudpickle#46)
* No more pickling of closed files, was broken on Python 3 (cloudpipe/cloudpickle#32)
* **Remove non-standard `__transient__` check (cloudpipe/cloudpickle#110)**: we don't use this internally and have no tests or documentation for its use, but downstream code may use `__transient__` even though it has never been part of the API. If we merge this, we should include a note about it in the release notes.
* Support for pickling loggers (yay!) (cloudpipe/cloudpickle#96)
* BUG: Fix crash when pickling dynamic class cycles. (cloudpipe/cloudpickle#102)

## How was this patch tested?

Existing PySpark unit tests, plus the cloudpickle project's own unit tests run separately.
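For reviewers less familiar with why the closure-related fixes matter, here is a minimal sketch (not code from this PR). It shows the standard `pickle` module rejecting a recursive function defined inside a closure, which is the kind of object cloudpickle exists to serialize. The names `make_factorial` and `stdlib_can_pickle` are hypothetical helpers for illustration, and the final step assumes a standalone `cloudpickle` package is installed; it is skipped otherwise.

```python
import pickle


def make_factorial():
    # A recursive function defined inside a closure: `fact` refers to
    # itself through the enclosing scope, not through a module global.
    def fact(n):
        return 1 if n <= 1 else n * fact(n - 1)
    return fact


def stdlib_can_pickle(obj):
    # Hypothetical helper: report whether the standard library pickler
    # accepts `obj`. The exact exception type varies by Python version.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


fact = make_factorial()
print(stdlib_can_pickle(fact))  # False: local functions are pickled by name
print(stdlib_can_pickle(len))   # True: builtins are pickled by reference

# Assumption: the stock cloudpickle package may or may not be installed.
try:
    import cloudpickle
except ImportError:
    cloudpickle = None

if cloudpickle is not None:
    # cloudpickle serializes the function by value; the payload can be
    # loaded with the regular pickle module.
    restored = pickle.loads(cloudpickle.dumps(fact))
    print(restored(5))
```

The stdlib pickler stores functions by qualified name, so anything that cannot be re-imported on the worker (lambdas, nested functions, interactively defined code) needs a by-value pickler.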
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/holdenk/spark holden-rgbkrk-cloudpickle-upgrades

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18734.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18734

----
commit b84222b6ea660ce5ec1aedfee50e297b436eb824
Author: Kyle Kelley <rgb...@gmail.com>
Date:   2017-06-12T22:19:29Z

    [SPARK-21070][PYSPARK] Upgrade cloudpickle

    This brings in fixes and upgrades from the [cloudpickle](https://github.com/cloudpipe/cloudpickle) module, notably:

    * Import submodules accessed by pickled functions (https://github.com/cloudpipe/cloudpickle/pull/80)
    * Support recursive functions inside closures (https://github.com/cloudpipe/cloudpickle/pull/89, https://github.com/cloudpipe/cloudpickle/pull/90)
    * Fix ResourceWarnings and DeprecationWarnings (https://github.com/cloudpipe/cloudpickle/pull/88)
    * Assume modules with __file__ attribute are not dynamic (https://github.com/cloudpipe/cloudpickle/pull/85)
    * Make cloudpickle Python 3.6 compatible (https://github.com/cloudpipe/cloudpickle/pull/72)
    * Allow pickling of builtin methods (https://github.com/cloudpipe/cloudpickle/pull/57)
    * Add ability to pickle dynamically created modules (https://github.com/cloudpipe/cloudpickle/pull/52)
    * Support method descriptor (https://github.com/cloudpipe/cloudpickle/pull/46)
    * No more pickling of closed files, was broken on Python 3 (https://github.com/cloudpipe/cloudpickle/pull/32)

commit 6d5e5cf412e11d4b8b30f7d46c15405edcb4cb05
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-24T19:58:32Z

    Merge branch 'master' into holden-rgbkrk-cloudpickle-upgrades

commit 1cfd38f73da328bcf58ab32228ace4ff59bc26d2
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-24T20:01:46Z

    Copy over support work weakset, dynamic classess, and remove __transient__ support from PR#110

commit f8ff2da5c093bf20a65df1869ecd3def3fbac2c5
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-25T22:02:16Z

    Test fix

commit cff6bfb83d04ee29d660a300b08f6a99dd636cf0
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-25T22:02:24Z

    Revert "Copy over support work weakset, dynamic classess, and remove __transient__ support from PR#110"

    This reverts commit 1cfd38f73da328bcf58ab32228ace4ff59bc26d2.

commit 195cd21ece2036df88a95d2fdc7dfd29c5681efa
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-25T22:07:13Z

    Fixed named tuple issue

commit 74880eabf218b4e9f29a25583442ee8d3b6bb0a6
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-25T22:09:30Z

    Try and move the fix for namedtuple and re-enable the rest of the useful cloudpickle fixes.

    Revert "Revert "Copy over support work weakset, dynamic classess, and remove __transient__ support from PR#110""

    This reverts commit cff6bfb83d04ee29d660a300b08f6a99dd636cf0.

commit 9a0f9b4b9958c5fed6d2c84d725cb03a7be7d41e
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-25T22:12:24Z

    Re-enable our custom exception message

commit 09cf41eb3e75e92cc9914e675fb2cb2f99290d38
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-25T23:49:57Z

    Save and restore the module info functions

----