Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/18979
Thanks for the review, everyone!
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands,
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18979
Thanks! Merged to master.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18979
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82745/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18979
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18979
**[Test build #82745 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82745/testReport)**
for PR 18979 at commit
[`c0e81a1`](https://github.com/apache/spark/commit/c
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18979
**[Test build #82745 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82745/testReport)**
for PR 18979 at commit
[`c0e81a1`](https://github.com/apache/spark/commit/c0
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/18979
Done. Not writing 0-byte files will offer a significant speedup against
object stores, where a single call to getFileStatus() can take hundreds of
milliseconds. I look forward to it.
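To make the 0-byte point concrete, here is a minimal sketch that assumes a writer which opens its output lazily. `LazyRecordWriter` and its methods are hypothetical illustration names, not Spark's or ORC's API, and `java.nio.file` stands in for a Hadoop `FileSystem`. A task that never receives a row never creates a file, so there is nothing to create, close, or `getFileStatus()` afterwards.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical writer: the output file is created on the first record,
// so a task that writes no rows leaves no 0-byte file behind.
class LazyRecordWriter implements AutoCloseable {
    private final Path target;
    private BufferedWriter out; // stays null until the first record arrives

    LazyRecordWriter(Path target) {
        this.target = target;
    }

    void write(String record) throws IOException {
        if (out == null) {
            // First record: only now create the file.
            out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        }
        out.write(record);
        out.newLine();
    }

    // True if this writer created an output file at all.
    boolean createdFile() {
        return out != null;
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }
}
```

The catch with this design is that anything downstream must treat "no file" as "no rows" rather than as an error.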
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18979
Could you resolve the conflicts again?
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18979
**[Test build #82731 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82731/testReport)**
for PR 18979 at commit
[`649f8da`](https://github.com/apache/spark/commit/6
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18979
Build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18979
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82731/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18979
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82732/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18979
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18979
**[Test build #82732 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82732/testReport)**
for PR 18979 at commit
[`d3f96f6`](https://github.com/apache/spark/commit/d
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18979
Hi, @steveloughran .
> is the issue with ORC that if there's nothing to write, it doesn't
generate a file (so avoiding that issue with sometimes you get 0-byte ORC files
& things downstrea
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18979
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18979
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82730/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18979
**[Test build #82730 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82730/testReport)**
for PR 18979 at commit
[`adab985`](https://github.com/apache/spark/commit/a
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/18979
The latest PR update pulls in @dongjoon-hyun's new test; to avoid a merge
conflict in the Insert suite, I've rebased against master.
1. Everything handles missing files on output
2.
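A sketch of what handling missing files on output could look like on the stats side, assuming the tracker is told about each new file path and sizes the files when the task finishes. `TolerantStatsTracker`, `newFile()`, and `collect()` are hypothetical names, not Spark's tracker interface. Here `Files.size()` throwing `NoSuchFileException` plays the role that `FileNotFoundException` from Hadoop's `FileSystem.getFileStatus()` would play against a real store.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-task stats tracker. A file it was told about but cannot
// find (the format wrote nothing, or the store has not made it visible yet)
// is counted as missing and contributes 0 bytes instead of failing the task.
class TolerantStatsTracker {
    private final List<Path> pending = new ArrayList<>();
    private long totalBytes = 0;
    private int missingFiles = 0;

    void newFile(Path path) {
        pending.add(path);
    }

    // Sizes each tracked file once; other I/O errors still propagate.
    void collect() throws IOException {
        for (Path p : pending) {
            try {
                totalBytes += Files.size(p);
            } catch (NoSuchFileException e) {
                missingFiles++;
            }
        }
        pending.clear();
    }

    long totalBytes() {
        return totalBytes;
    }

    int missingFiles() {
        return missingFiles;
    }
}
```

Counting the misses rather than silently swallowing them leaves room to log how often a file was absent; that choice is an assumption of this sketch.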
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18979
**[Test build #82732 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82732/testReport)**
for PR 18979 at commit
[`d3f96f6`](https://github.com/apache/spark/commit/d3
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18979
**[Test build #82731 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82731/testReport)**
for PR 18979 at commit
[`649f8da`](https://github.com/apache/spark/commit/64
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18979
**[Test build #82730 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82730/testReport)**
for PR 18979 at commit
[`adab985`](https://github.com/apache/spark/commit/ad
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/18979
Noted :)
@dongjoon-hyun : is the issue with ORC that if there's nothing to write, it
doesn't generate a file (so avoiding that issue with sometimes you get 0-byte
ORC files & things downst
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18979
Gentle ping, @steveloughran !
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18979
Could you also include the [test
cases](https://github.com/dongjoon-hyun/spark/blob/b545f281b19120cc2c9e4197cae4b1315969247d/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySui
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18979
+1. This solves the regression when writing an empty dataset in ORC format,
too!
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18979
LGTM except a minor comment.
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/18979
@viirya : the new data writer API will allow for a broader set of stats to
be propagated back from workers. When you are working with the object stores,
a useful stat to get back is throttle
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/18979
I don't have a strong opinion against this. An incorrect size is an issue, but I
can't think of a better solution for now...
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18979
Will review it tomorrow
Github user adrian-ionescu commented on the issue:
https://github.com/apache/spark/pull/18979
To me, this looks good.
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/18979
Has anyone had a look at this recently?
The problem still exists, and while downstream filesystems can address it if
they recognise the use case & lie about values, they will be returnin
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/18979
Related to this, updated spec on [Hadoop output stream, Syncable and
StreamCapabilities](https://github.com/steveloughran/hadoop/blob/s3/HADOOP-13327-outputstream-trunk/hadoop-common-project/ha
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18979
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
e
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18979
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80841/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18979
**[Test build #80841 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80841/testReport)**
for PR 18979 at commit
[`f778213`](https://github.com/apache/spark/commit/f
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18979
**[Test build #80841 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80841/testReport)**
for PR 18979 at commit
[`f778213`](https://github.com/apache/spark/commit/f7
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/18979
@adrian-ionescu wrote
> is there a need for calling `getFinalStats()` more than once?
No. As long as everyone is aware of it, it won't be an issue.
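One way to make the "call it once" contract self-enforcing instead of relying on everyone being aware of it is a minimal sketch in which a second call fails fast. `OneShotStats` and `addBytes()` are hypothetical, not Spark's stats-tracker interface; only the method name `getFinalStats()` comes from this thread.

```java
// Hypothetical tracker whose final-stats call is explicitly one-shot:
// a repeat call throws instead of returning stale or double-counted values.
class OneShotStats {
    private long bytes = 0;
    private boolean finalized = false;

    void addBytes(long n) {
        if (finalized) {
            throw new IllegalStateException("stats already finalized");
        }
        bytes += n;
    }

    long getFinalStats() {
        if (finalized) {
            throw new IllegalStateException("getFinalStats() may only be called once");
        }
        finalized = true;
        return bytes;
    }
}
```

Whether the extra check is worth it is a judgment call; documenting the restriction, as agreed above, may be enough.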
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/18979
> To mimic S3-like behavior, you can overwrite the file system
`spark.hadoop.fs.$scheme.impl`
@gatorsmile: you will be able to do something better soon, as S3A is adding
an inconsisten
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/18979
Currently *nobody should be using s3a:// as the temp file destination*,
which is the same as saying "nobody should be using s3a:// as the direct
destination of work", not without a special
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/18979
Btw, as the file path passed to the stats tracker should be the task temp file,
is it common to use S3 directly as the temp file output destination?
Github user adrian-ionescu commented on the issue:
https://github.com/apache/spark/pull/18979
Thanks for the fix and tests, @steveloughran!
Re 1. -- is there a need for calling `getFinalStats()` more than once? The
function doc clearly states that it's not supported and may lead to
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18979
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18979
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80803/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18979
**[Test build #80803 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80803/testReport)**
for PR 18979 at commit
[`2a113fd`](https://github.com/apache/spark/commit/2
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18979
To mimic S3-like behavior, you can overwrite the file system
`spark.hadoop.fs.$scheme.impl`
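Overriding `spark.hadoop.fs.$scheme.impl` plugs a custom Hadoop `FileSystem` class into Spark. The sketch below does not do that. It is a standalone, hypothetical in-memory model (`S3LikeStore` and `PendingObject` are illustration names) of the one S3 trait that matters here, assuming upload-on-close: a file being written is not visible until its stream is closed.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of one S3 trait: an object only becomes
// visible once its output stream is closed (upload-on-close).
class S3LikeStore {
    private final Map<String, byte[]> visible = new HashMap<>();

    // Buffers writes locally; nothing is published until close().
    class PendingObject extends ByteArrayOutputStream {
        private final String path;

        PendingObject(String path) {
            this.path = path;
        }

        @Override
        public void close() {
            // Publish the whole object at once.
            visible.put(path, toByteArray());
        }
    }

    PendingObject create(String path) {
        return new PendingObject(path);
    }

    boolean exists(String path) {
        return visible.containsKey(path);
    }

    // Length of a visible object; asking for a missing one fails
    // (modelled here as an unchecked exception).
    long length(String path) {
        byte[] data = visible.get(path);
        if (data == null) {
            throw new IllegalStateException("no such object: " + path);
        }
        return data.length;
    }
}
```

A test built on a model like this can check that code which inspects an output file before `close()` copes with it not existing yet.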
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/18979
cc @adrian-ionescu
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18979
**[Test build #80803 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80803/testReport)**
for PR 18979 at commit
[`2a113fd`](https://github.com/apache/spark/commit/2a