[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18052 **[Test build #77157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77157/testReport)** for PR 18052 at commit [`72bd097`](https://github.com/apache/spark/commit/72

2017-05-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18052 **[Test build #77157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77157/testReport)** for PR 18052 at commit [`72bd097`](https://github.com/apache/spark/commit/7

2017-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18052 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77157/ Test FAILed.

2017-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18052 Merged build finished. Test FAILed.

2017-05-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18052 **[Test build #77161 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77161/testReport)** for PR 18052 at commit [`06116d1`](https://github.com/apache/spark/commit/06

2017-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18052 Merged build finished. Test FAILed.

2017-05-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18052 **[Test build #77161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77161/testReport)** for PR 18052 at commit [`06116d1`](https://github.com/apache/spark/commit/0

2017-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18052 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77161/ Test FAILed.

2017-05-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18052 **[Test build #77180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77180/testReport)** for PR 18052 at commit [`ff55e2d`](https://github.com/apache/spark/commit/ff

2017-05-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18052 **[Test build #77180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77180/testReport)** for PR 18052 at commit [`ff55e2d`](https://github.com/apache/spark/commit/f

2017-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18052 Merged build finished. Test PASSed.

2017-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18052 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77180/ Test PASSed.

2017-06-03 Thread zero323
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/18052 __Note__: [Waiting for some feedback](https://twitter.com/holdenkarau/status/866672579318337537).

2017-06-20 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18052 Let's see what @davies has to say.

2017-06-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/18052 It seems that it's also easy for users themselves or third-party libraries to implement these outside of PySpark, right? If that's the case, I'd prefer not to add it to PySpark.

2017-06-20 Thread zero323
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/18052 @davies It is. Monkey patching the context, `RDD`, and some classes not covered by the Scala `AsyncRDDActions` [takes around 100 LOC](https://github.com/zero323/pyspark-asyncactions) (excluding tests, c
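To make the "easy to implement outside of PySpark" point concrete, here is a minimal sketch of the monkey-patching approach. It assumes a driver-side thread pool in which each async action simply runs the ordinary blocking action on a background thread. This is not the code from pyspark-asyncactions. `patch_async_actions`, `_make_async`, and `_executor` are hypothetical names, and only the method names (`countAsync`, `collectAsync`, `takeAsync`, `foreachAsync`, `foreachPartitionAsync`) mirror Scala's `AsyncRDDActions`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared pool: each async action occupies one driver-side thread
# while the underlying (blocking) Spark action runs.
_executor = ThreadPoolExecutor(max_workers=4)

# Action names mirroring Scala's AsyncRDDActions (countAsync, collectAsync, ...).
_ACTIONS = ("count", "collect", "take", "foreach", "foreachPartition")


def _make_async(name):
    """Build a method that submits self.<name>(*args, **kwargs) to the pool."""
    def method(self, *args, **kwargs):
        # Returns a concurrent.futures.Future; .result() blocks for the value.
        return _executor.submit(getattr(self, name), *args, **kwargs)
    method.__name__ = name + "Async"
    return method


def patch_async_actions(rdd_class):
    """Monkey patch <action>Async methods onto rdd_class (hypothetical helper)."""
    for name in _ACTIONS:
        setattr(rdd_class, name + "Async", _make_async(name))
```

With a live `SparkContext` named `sc`, usage would look like `from pyspark import RDD`, then `patch_async_actions(RDD)`, then `sc.parallelize(range(100)).countAsync().result()`. Note that the returned futures cannot stop a Spark job that is already running: `Future.cancel()` only succeeds before the call starts.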

2017-06-20 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18052 I think the users who want async actions are the same users who would be writing multi-threaded Spark applications. That being said, @davies, is there a reason you d

2017-06-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/18052 Personally, I think less is more: don't add everything to every piece of software, otherwise every piece of software will eventually be able to write email. The RDD API is kind of frozen; we don't add more APIs to it

2017-06-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18052 I subscribed to this PR because, oddly enough, I am actually a user who implemented a similar one on the user side. One thing I can tell you is that it was buggy and hard to debug / test, in particular when the wo

2017-06-20 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18052 Sure, I think the Python-specific implementation is probably a maintenance burden we don't want to pick up, but exposing the current Java API seems reasonable, especially if we

2017-06-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18052 Yeah, then I would go +1 if this can simply be done by exposing the existing APIs. For my production case, I need at least this. This is not something that should be closed for no inter

2017-06-20 Thread zero323
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/18052 Personally, I would rather not include this at all than use the JVM implementation with callbacks: - The Py4J gateway is already pretty slow and can be unstable under high load. Putting higher

2017-06-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18052 Hm... for the former, isn't that already a problem for other APIs if Py4J itself is problematic? For the latter, if this is not that simple, I would rather avoid adding this API for n

2017-06-20 Thread zero323
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/18052 IMHO it is, but this feature is hardly essential. Arguably we wouldn't need the Scala API in the first place if the built-in `Future` supported canceling. It is possible I am overthinking the l
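The cancellation gap is visible in Python too. `concurrent.futures.Future.cancel()` only succeeds if the call has not started yet. Below is a minimal sketch of one workaround, assuming each async action runs in its own Spark job group. `SparkContext.setJobGroup(groupId, description, interruptOnCancel)` and `SparkContext.cancelJobGroup(groupId)` are existing PySpark methods. `CancellableAction`, `_pool`, and the `"async-"` group naming scheme are hypothetical. One further assumption to flag: in PySpark versions without pinned-thread mode, a job group set from a Python thread is not guaranteed to reach the JVM thread that actually submits the job, so the cancellation may not take effect.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)


class CancellableAction:
    """Hypothetical wrapper: a Future plus a Spark job group used for cancellation."""

    def __init__(self, sc, action, *args):
        self._sc = sc
        # Hypothetical naming scheme; any unique string works as a job group id.
        self.group_id = "async-" + uuid.uuid4().hex
        self._future = _pool.submit(self._run, action, *args)

    def _run(self, action, *args):
        # Job groups are thread-local properties, so set the group on the
        # worker thread that actually runs the blocking action.
        self._sc.setJobGroup(self.group_id, "async action", interruptOnCancel=True)
        return action(*args)

    def cancel(self):
        """Return True if the call was cancelled before it started.

        Otherwise, if it is still running, ask Spark to cancel its job group
        and return False (the Future itself cannot be cancelled any more).
        """
        if self._future.cancel():
            return True
        if not self._future.done():
            self._sc.cancelJobGroup(self.group_id)
        return False

    def result(self, timeout=None):
        return self._future.result(timeout)
```

The design choice here is to keep the standard `Future` for results and use job groups only for cancellation. That is roughly the role Scala's `FutureAction.cancel()` plays, which the built-in `Future` lacks.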