[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-09-02 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14573 Agree it would be subsumed, and it looks pretty cool. I didn't know you could make it asynchronous. You would still want to avoid spinning up too many tasks, since these consume resources and block other jobs.

[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-09-02 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/14573 Nice change, but I think that at least some of the benefit of this will be subsumed by #14854, my patch which allows `take()` to cancel the running job as soon as enough output is produced.

[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-09-02 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14573 LGTM - merging to master. Thanks for working on this!

[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-08-25 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14573 @hvanhovell made all the suggested changes. I initially misunderstood what getByteArrayRdd does. Should be good now.

[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-08-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14573 **[Test build #3233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3233/consoleFull)** for PR 14573 at commit [`e25eb6e`](https://github.com/apache/spark/commit

[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-08-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14573 **[Test build #3233 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3233/consoleFull)** for PR 14573 at commit [`e25eb6e`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-08-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14573 cc @hvanhovell for another look

[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-08-23 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14573 Ping, anything else?

[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-08-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14573 Actually, while you are at it, can you make the ramp-up configurable? Add an entry to SQLConf, something like `spark.sql.limit.scaleUpFactor`.
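
For readers following the thread, a rough sketch of what a configurable ramp-up could look like. This is not the code from the PR. It is a simplified Python illustration that assumes a limit query scans one partition first and multiplies the batch size by a fixed factor on each later round. The names `partition_ranges`, `take`, and `scale_up_factor` are hypothetical, with `scale_up_factor` standing in for the suggested `spark.sql.limit.scaleUpFactor` setting.

```python
def partition_ranges(total_partitions, scale_up_factor=4):
    """Yield (start, end) partition index ranges, one per scan round.

    Hypothetical helper: the first round covers a single partition and
    each later round is scale_up_factor times larger than the previous.
    """
    if scale_up_factor < 2:
        raise ValueError("scale_up_factor must be at least 2")
    scanned = 0
    batch = 1
    while scanned < total_partitions:
        end = min(scanned + batch, total_partitions)
        yield (scanned, end)
        scanned = end
        batch *= scale_up_factor


def take(partitions, n, scale_up_factor=4):
    """Collect up to n rows, scanning partitions in growing batches.

    `partitions` is a list of lists standing in for a partitioned
    dataset; scanning stops once at least n rows have been gathered.
    """
    rows = []
    for start, end in partition_ranges(len(partitions), scale_up_factor):
        for part in partitions[start:end]:
            rows.extend(part)
        if len(rows) >= n:
            break
    return rows[:n]
```

With a factor of 4 and 10 partitions, this sketch scans partition 0 first, then partitions 1 to 4, then the remaining five, rather than launching tasks for the whole dataset at once.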

[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14573 Can one of the admins verify this patch?