Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15052
OK well if you see evidence later that the disk spilled bytes are
unreasonably high, it's worth reinvestigating to see if there's a problem like
this. If you aren't seeing bad metrics though, then ma
Github user djvulee commented on the issue:
https://github.com/apache/spark/pull/15052
@srowen Yes, the file always seems to be empty before the write, so the original code is OK. Sorry that this PR was not thought through enough; I was misled by the other method in shuffle.py, which used th
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15052
I get that, but if it's always true, then there was no problem to begin with. That's what the code seems to assume right now. I haven't looked at the code much, but that's the question -- are you sure
Github user djvulee commented on the issue:
https://github.com/apache/spark/pull/15052
@srowen No. It does not matter whether the file is empty or not; if the file is empty, `getsize()` just returns 0, and this should be OK.
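To illustrate the point about `getsize()` (a standalone sketch, not the actual shuffle.py code): reading the size before a spill is harmless because an empty file reports 0, and the bytes written by one spill can be measured as a size delta.

```python
import os
import tempfile

# Create an empty file standing in for a spill file (hypothetical path).
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

size_before = os.path.getsize(path)  # 0 for a fresh, empty file

with open(path, "ab") as f:
    f.write(b"x" * 1024)             # simulate spilling 1 KB to disk

# Bytes written by this spill, regardless of the file's prior contents.
spilled = os.path.getsize(path) - size_before

os.remove(path)
```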
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15052
Is the idea that the file may be non-empty when written? There is at least one more instance of this call, but maybe the file is known to be empty before.
Github user djvulee commented on the issue:
https://github.com/apache/spark/pull/15052
@srowen I updated the PR to use an incremental update for the DiskBytesSpilled metrics.
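A minimal sketch of what an incremental update of a module-level metric looks like (names and sizes are illustrative, not the actual shuffle.py code):

```python
import os
import tempfile

DiskBytesSpilled = 0  # module-level running total; only ever incremented


def record_spill(path):
    """Add the size of a newly written spill file to the global metric."""
    global DiskBytesSpilled
    DiskBytesSpilled += os.path.getsize(path)


# Demo: two spill files of 100 and 250 bytes accumulate to a 350-byte total.
for size in (100, 250):
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * size)
        spill_path = f.name
    record_spill(spill_path)
    os.remove(spill_path)
```

Because the accumulator is only ever added to, spills from earlier operations are preserved rather than overwritten.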
Github user djvulee commented on the issue:
https://github.com/apache/spark/pull/15052
@srowen you are right, I will correct it soon.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15052
Given how DiskBytesSpilled is used, and is still used in other parts of the code, this doesn't look correct. It seems to be a global that is always incremented. Here you reset the value in certain cases
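To make the point concrete with hypothetical numbers (not actual Spark values): assigning the current file size to a global running total discards earlier spills, while incrementing preserves them.

```python
# Hypothetical totals, not taken from the Spark code.
previous_total = 500   # bytes recorded by earlier spills
new_spill_size = 100   # size of the file just written

# Resetting the global would discard the earlier 500 bytes:
reset_total = new_spill_size

# Incrementing keeps the running total correct:
incremented_total = previous_total + new_spill_size
```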
Github user djvulee commented on the issue:
https://github.com/apache/spark/pull/15052
@srowen @davies mind taking a look? This PR is very simple.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15052
Can one of the admins verify this patch?