Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19184
**[Test build #81614 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81614/testReport)**
for PR 19184 at commit
[`dcc2960`](https://github.com/apache/spark/commit/dc
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19184
Hmm, shouldn't we just change the system config to increase the open-file
limit?
---
-
To unsubscribe, e-mail: reviews-unsubscr...
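For reference, the system-config change @viirya suggests above would typically be done with `ulimit` and `/etc/security/limits.conf` on Linux. This is only an illustrative sketch; the values and the user name `spark` are examples, not recommendations, and the exact mechanism varies by distribution:

```shell
# Inspect the current per-process open-file limits
ulimit -Sn   # soft limit (what processes actually get)
ulimit -Hn   # hard limit (ceiling the soft limit can be raised to)

# Raise the soft limit for the current shell session (capped by the hard limit)
ulimit -Sn 65536

# For a persistent change on Linux, lines like these in
# /etc/security/limits.conf raise the limit for the Spark user
# (the user name "spark" is just an example):
#   spark  soft  nofile  65536
#   spark  hard  nofile  65536
```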
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/19184
I ran into this with a limit of 32K. Setting it to "unlimited" is another
possible workaround, but that may not be a preferable option in production
systems. E.g., with Q67 I o
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19184
**[Test build #81614 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81614/testReport)**
for PR 19184 at commit
[`dcc2960`](https://github.com/apache/spark/commit/d
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19184
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19184
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81614/
Test PASSed.
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19184
cc @cloud-fan @jiangxb1987
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19184
**[Test build #81628 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81628/testReport)**
for PR 19184 at commit
[`ea5f9d9`](https://github.com/apache/spark/commit/ea
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/19184
Thanks @viirya. I have updated the patch to address your comments.
This fixes the "too many open files" issue for queries involving window
functions (e.g. Q67, Q72, Q14); but for th
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19184
@rajeshbalamohan Thanks for updating. I think we need a complete fix as
previous comments from the reviewers @jerryshao @kiszk @jiangxb1987 suggested.
Can you try to fix this according to the comment
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19184
**[Test build #81628 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81628/testReport)**
for PR 19184 at commit
[`ea5f9d9`](https://github.com/apache/spark/commit/e
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19184
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81628/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19184
Merged build finished. Test PASSed.
---
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19184
@viirya @jerryshao To take a step back here.
This specific issue is applicable to window operations and not to shuffle.
In shuffle, you have a much larger volume of data written per file
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19184
Hi @mridulm, sorry for the late response. I agree with you that the scenario
here is different from shuffle, but the underlying structure and the way
data is spilled are the same, so the proble
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19184
@jerryshao Actually, the second half of your comment is not valid in this
case.
The PR is not targeting the merge sort here, but is relevant when
iterating over all tuples.
`UnsafeE
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19184
After discussing with @mridulm offline: though the patch here cannot address
the issue in `getSortedIterator`, which uses a PriorityQueue, it does solve
the problem in `getIterator(...)`, which
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19184
Thanks to @jerryshao for pointing me to SPARK-21595.
The tests @rajeshbalamohan ran were against a version that did not
include the changes from SPARK-21595; unfortunately, my local repo was
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19184
Thanks @jerryshao and @mridulm for investigating this further. That is very
reasonable. I don't think we need this fix, as spills won't be too frequent
in window operations now.
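For readers landing here later: SPARK-21595 (referenced above) made the window operator's row-buffer spill behavior configurable, so frequent small spills can be tuned instead of patched around. A sketch of how that might look; the config names are as I recall them from SPARK-21595 and the values and `your-app.jar` are placeholders, so verify against the Spark version in use:

```shell
# Tune how many rows the window operator buffers before spilling to disk
# (configs introduced by SPARK-21595; names/defaults should be verified
# against your Spark version's documentation)
spark-submit \
  --conf spark.sql.windowExec.buffer.in.memory.threshold=4096 \
  --conf spark.sql.windowExec.buffer.spill.threshold=1000000 \
  your-app.jar
```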
---
Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/19184
Thanks @mridulm, @jerryshao, @viirya. Closing this PR.
---