[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-16 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 thanks for the review everyone! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands,

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18979 Thanks! Merged to master.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82745/ Test PASSed.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Merged build finished. Test PASSed.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82745/testReport)** for PR 18979 at commit [`c0e81a1`](https://github.com/apache/spark/commit/c

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82745/testReport)** for PR 18979 at commit [`c0e81a1`](https://github.com/apache/spark/commit/c0

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 done. Not writing 0-byte files will offer a significant speedup against object stores, where a single call to getFileStatus() can take hundreds of milliseconds. I look forward to it

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18979 Could you resolve the conflicts again?

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82731/testReport)** for PR 18979 at commit [`649f8da`](https://github.com/apache/spark/commit/6

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Build finished. Test PASSed.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82731/ Test PASSed.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82732/ Test PASSed.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Merged build finished. Test PASSed.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82732/testReport)** for PR 18979 at commit [`d3f96f6`](https://github.com/apache/spark/commit/d

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18979 Hi, @steveloughran . > is the issue with ORC that if there's nothing to write, it doesn't generate a file (so avoiding that issue with sometimes you get 0-byte ORC files & things downstrea

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Merged build finished. Test PASSed.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82730/ Test PASSed.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82730/testReport)** for PR 18979 at commit [`adab985`](https://github.com/apache/spark/commit/a

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 The latest PR update pulls in @dongjoon-hyun's new test; to avoid merge conflict in the Insert suite I've rebased against master. 1. Everything handles missing files on output 2.
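The "handles missing files on output" point can be illustrated with a minimal sketch. It assumes a stats collector that looks up the size of a file after the writer closes it, and treats a missing file as "no size available" rather than as a failure. The class and method names are hypothetical, and it uses the local `java.nio.file` API instead of the Hadoop `FileSystem` calls that the pull request itself touches.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.OptionalLong;

// Hypothetical helper, not Spark code: size lookup that tolerates a missing file.
public class MissingFileStats {

    /** Returns the file length, or an empty value if the file does not exist. */
    public static OptionalLong fileSize(Path path) throws IOException {
        try {
            return OptionalLong.of(Files.size(path));
        } catch (NoSuchFileException e) {
            // The writer produced no file (or it is not visible yet):
            // report "unknown" instead of failing the whole task.
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        Path present = Files.createTempFile("stats", ".bin");
        Files.write(present, new byte[] {1, 2, 3});
        Path missing = present.resolveSibling(present.getFileName() + ".missing");

        System.out.println("present: " + fileSize(present));
        System.out.println("missing: " + fileSize(missing));
        Files.delete(present);
    }
}
```

Only the "file not found" case is swallowed; any other `IOException` still propagates, so real I/O failures are not hidden.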

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82732/testReport)** for PR 18979 at commit [`d3f96f6`](https://github.com/apache/spark/commit/d3

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82731/testReport)** for PR 18979 at commit [`649f8da`](https://github.com/apache/spark/commit/64

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82730/testReport)** for PR 18979 at commit [`adab985`](https://github.com/apache/spark/commit/ad

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Noted :) @dongjoon-hyun : is the issue with ORC that if there's nothing to write, it doesn't generate a file (so avoiding that issue with sometimes you get 0-byte ORC files & things downst

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18979 Gentle ping, @steveloughran !

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-11 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18979 Could you also include the [test cases](https://github.com/dongjoon-hyun/spark/blob/b545f281b19120cc2c9e4197cae4b1315969247d/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySui

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18979 +1. This solves the regression on writing an empty dataset with the ORC format, too!

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-11 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18979 LGTM except a minor comment.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-11 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 @viirya : the new data writer API will allow for a broader set of stats to be propagated back from workers. When you are working with the object stores, a useful stat to get back is throttle

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18979 I don't have a strong opinion against this. Incorrect size is an issue, but I can't think of a better solution for now...

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-10 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18979 Will review it tomorrow

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-10 Thread adrian-ionescu
Github user adrian-ionescu commented on the issue: https://github.com/apache/spark/pull/18979 To me, this looks good.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Has anyone had a look at this recently? The problem still exists, and while downstream filesystems can address if they recognise the use case & lie about values, they will be returnin

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-22 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Related to this, updated spec on [Hadoop output stream, Syncable and StreamCapabilities](https://github.com/steveloughran/hadoop/blob/s3/HADOOP-13327-outputstream-trunk/hadoop-common-project/ha

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature e

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80841/ Test PASSed.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #80841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80841/testReport)** for PR 18979 at commit [`f778213`](https://github.com/apache/spark/commit/f

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #80841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80841/testReport)** for PR 18979 at commit [`f778213`](https://github.com/apache/spark/commit/f7

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 @adrian-ionescu wrote > is there a need for calling getFinalStats() more than once? No. As long as everyone is aware of it, it won't be an issue.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 > To mimic S3-like behavior, you can overwrite the file system `spark.hadoop.fs.$scheme.impl` @gatorsmile: you will be able to do something better soon, as S3A is adding an inconsisten

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Currently *nobody should be using s3a:// as the temp file destination*, which is the same as saying "nobody should be using s3a:// as the direct destination of work", not without a special

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18979 Btw, since the file path passed to the state tracker should be the task temp file, is it common to use S3 directly as the temp file output destination?

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-17 Thread adrian-ionescu
Github user adrian-ionescu commented on the issue: https://github.com/apache/spark/pull/18979 Thanks for the fix and tests, @steveloughran! Re 1. -- is there a need for calling `getFinalStats()` more than once? The function doc clearly states that it's not supported and may lead to
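The single-call contract on `getFinalStats()` discussed here could be made explicit with a guard. The following is only a sketch under assumptions: the class, its fields, and the `fileWritten` method are hypothetical, and the thread does not indicate that Spark's tracker enforces the contract this way.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical tracker, not Spark code: final stats may be requested only once.
public class OneShotStats {
    private long numFiles = 0;
    private long numBytes = 0;
    private final AtomicBoolean finished = new AtomicBoolean(false);

    /** Records one completed file of the given size. */
    public void fileWritten(long bytes) {
        numFiles++;
        numBytes += bytes;
    }

    /** Returns {numFiles, numBytes}; a second call fails fast. */
    public long[] getFinalStats() {
        if (!finished.compareAndSet(false, true)) {
            throw new IllegalStateException("getFinalStats() already called");
        }
        return new long[] {numFiles, numBytes};
    }

    public static void main(String[] args) {
        OneShotStats stats = new OneShotStats();
        stats.fileWritten(100);
        stats.fileWritten(28);
        long[] result = stats.getFinalStats();
        System.out.println(result[0] + " files, " + result[1] + " bytes");
    }
}
```

Failing fast turns a documented-but-unchecked restriction into an error that shows up in tests, at the cost of one extra flag per tracker.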

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Merged build finished. Test PASSed.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80803/ Test PASSed.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #80803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80803/testReport)** for PR 18979 at commit [`2a113fd`](https://github.com/apache/spark/commit/2

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-17 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18979 To mimic S3-like behavior, you can overwrite the file system `spark.hadoop.fs.$scheme.impl`

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-17 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18979 cc @adrian-ionescu

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #80803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80803/testReport)** for PR 18979 at commit [`2a113fd`](https://github.com/apache/spark/commit/2a