[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-03-10 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16739 Agree with @jkbradley on this one. We should avoid adding functions that are completely new in a patch release, given that the gap between minor versions and patch releases isn't that large. As w

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16739 I've commented elsewhere, but wanted to comment here as well to make more people aware: Let's refrain from backporting new APIs into patch versions unless they are really critical. We do not do this elsewhe

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16739 Thank YOU, always! :)

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-15 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16739 @dongjoon-hyun my apologies, thanks for bringing this to my attention. I had to hand merge and didn't realize the mismatch. Opened a new PR to fix that.

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16739 Hi, @felixcheung . While backporting, https://github.com/apache/spark/commit/6c35399068f1035fec6d5f909a83a5b1683702e0#diff-3d2a6b9d2b7d84ae179d7ea0f9eca696R1232 seems to break the build of

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-15 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16739 merged to master and branch-2.1 @gatorsmile thanks - please feel free to update or remove unneeded test cases.

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72929/ Test PASSed. ---

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Merged build finished. Test PASSed.

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72929 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72929/testReport)** for PR 16739 at commit [`bf2373f`](https://github.com/apache/spark/commit/b

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72929/testReport)** for PR 16739 at commit [`bf2373f`](https://github.com/apache/spark/commit/bf

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-15 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16739 Jenkins, retest this please

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72925/ Test FAILed. ---

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Merged build finished. Test FAILed.

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72925/testReport)** for PR 16739 at commit [`bf2373f`](https://github.com/apache/spark/commit/bf

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16739 The issue is fixed in https://github.com/apache/spark/pull/16933. If this is merged first, I will fix the test case in this PR. Thanks! :)

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-14 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16739 great, looking forward to that. I'm going to merge this unless anyone has a concern?

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16739 Let me rewrite the test cases in Scala. ```Scala val df = spark.range(0, 1, 1, 5) assert(df.rdd.getNumPartitions == 5) assert(df.coalesce(3).rdd.getNumPartition

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72790/ Test PASSed. ---

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Build finished. Test PASSed.

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72790/testReport)** for PR 16739 at commit [`55b99df`](https://github.com/apache/spark/commit/5

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Merged build finished. Test PASSed.

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72791/ Test PASSed. ---

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72791/testReport)** for PR 16739 at commit [`a0fe134`](https://github.com/apache/spark/commit/a

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72791/testReport)** for PR 16739 at commit [`a0fe134`](https://github.com/apache/spark/commit/a0

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72790/testReport)** for PR 16739 at commit [`55b99df`](https://github.com/apache/spark/commit/55

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-04 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16739 hmm, not as far as I can see: ``` > df2 <- repartition(df1, 10) > getNumPartitions(df2) # right after repartition the number of partition is greater than the original numSlices

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-02 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16739 : ) This might be caused by the optimizer rule `CollapseRepartition`. Can you output the plan with `explain(true)`?

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-02 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16739 @gatorsmile thanks for commenting. `coalesce` currently accepts a number even if it is larger than the current number of partitions - I guess we didn't want to throw an exception in that case?

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-02 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16739 `coalesce` is used to decrease the number of partitions in the RDD, but when you are setting it to a number that is larger than the number of the current RDD partitions, the result is not predica
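
A minimal Scala sketch of the simple case discussed in this comment, not code from the thread. It assumes a local `SparkSession`; the `local[2]` master, the app name, and the object name `CoalesceSketch` are illustrative. It relies on the Dataset API documentation, which states that `coalesce` stays at the current number of partitions when a larger number is requested, while `repartition` shuffles and can increase the count. It does not cover `coalesce` chained after `repartition`, which is the case the thread attributes to `CollapseRepartition`.

```scala
import org.apache.spark.sql.SparkSession

object CoalesceSketch {
  // Returns partition counts for:
  // (original, coalesce(3), coalesce(10), repartition(10))
  def partitionCounts(spark: SparkSession): (Int, Int, Int, Int) = {
    val df = spark.range(0, 100, 1, 5) // start, end, step, numPartitions
    (
      df.rdd.getNumPartitions,
      df.coalesce(3).rdd.getNumPartitions,    // narrowing: no shuffle
      df.coalesce(10).rdd.getNumPartitions,   // larger request: count is not increased
      df.repartition(10).rdd.getNumPartitions // shuffle: count can increase
    )
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("coalesce-sketch")
      .getOrCreate()
    try {
      val (original, narrowed, widened, reshuffled) = partitionCounts(spark)
      assert(original == 5)
      assert(narrowed == 3)
      assert(widened == 5)
      assert(reshuffled == 10)
    } finally {
      spark.stop()
    }
  }
}
```

If the goal is to raise the partition count, `repartition` is the call that expresses that intent; `coalesce` is only a safe choice for lowering it.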

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-01 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16739 yap, https://github.com/apache/spark/pull/16739#issuecomment-276739220 - only RDD has `coalesce(.. shuffle)`, in Dataset, it's `coalesce` and `repartition`
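
The RDD side of that distinction can be sketched as follows. This is an illustrative example, not code from the thread: the object name `RddCoalesceSketch`, the `local[2]` master, and the app name are assumptions. It uses the RDD method `coalesce(numPartitions, shuffle)`, whose `shuffle` flag defaults to `false`; `RDD.repartition(n)` is documented as `coalesce(n, shuffle = true)`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddCoalesceSketch {
  // Returns partition counts for:
  // (original, coalesce(10) without shuffle, coalesce(10) with shuffle)
  def partitionCounts(sc: SparkContext): (Int, Int, Int) = {
    val rdd = sc.parallelize(1 to 100, 5) // 5 input partitions
    (
      rdd.getNumPartitions,
      rdd.coalesce(10).getNumPartitions,                // shuffle = false: stays at 5
      rdd.coalesce(10, shuffle = true).getNumPartitions // shuffle = true: grows to 10
    )
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical local context, for illustration only.
    val conf = new SparkConf().setMaster("local[2]").setAppName("rdd-coalesce-sketch")
    val sc = new SparkContext(conf)
    try {
      val (original, noShuffle, withShuffle) = partitionCounts(sc)
      assert(original == 5)
      assert(noShuffle == 5)
      assert(withShuffle == 10)
    } finally {
      sc.stop()
    }
  }
}
```

On a Dataset there is no `shuffle` flag, so the two behaviors are split across `coalesce` (no shuffle) and `repartition` (shuffle).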

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-01 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16739 @felixcheung I was referring to the ` * However, if you're doing a drastic coalesce, e.g. to numPartitions = 1, * this may result in your computation taking place on fewer nodes than *

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-01 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16739 and actually I find the current behavior a bit hard to explain, could someone perhaps enlighten me if this is intentional and how best, if we are to, document this behavior? ``` df <- a

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-01 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16739 surely, i think you mean https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L428 we will need to update this to say `use repartition() if you want

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-01 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16739 Thanks @felixcheung - I think these changes look good. cc @gatorsmile / @holdenk for doc changes in SQL, Python

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Merged build finished. Test PASSed.

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72240/ Test PASSed. ---

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72240/testReport)** for PR 16739 at commit [`3ed835a`](https://github.com/apache/spark/commit/3

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72240/testReport)** for PR 16739 at commit [`3ed835a`](https://github.com/apache/spark/commit/3e

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72232/ Test FAILed. ---

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Merged build finished. Test FAILed.

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72232/testReport)** for PR 16739 at commit [`1bd7163`](https://github.com/apache/spark/commit/1

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72232/testReport)** for PR 16739 at commit [`1bd7163`](https://github.com/apache/spark/commit/1b

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Merged build finished. Test PASSed.

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72166/ Test PASSed. ---

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72166 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72166/testReport)** for PR 16739 at commit [`938c2ce`](https://github.com/apache/spark/commit/9

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72166 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72166/testReport)** for PR 16739 at commit [`938c2ce`](https://github.com/apache/spark/commit/93

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Merged build finished. Test PASSed.

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72149/testReport)** for PR 16739 at commit [`50ab563`](https://github.com/apache/spark/commit/5

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72149/ Test PASSed. ---

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72149/testReport)** for PR 16739 at commit [`50ab563`](https://github.com/apache/spark/commit/50

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-30 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16739 Jenkins, retest this please

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Merged build finished. Test FAILed.

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16739 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72147/ Test FAILed. ---

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-01-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72147/testReport)** for PR 16739 at commit [`50ab563`](https://github.com/apache/spark/commit/50