[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15052 OK, well, if you see evidence later that the disk spilled bytes are unreasonably high, it's worth reinvestigating to see if there's a problem like this. If you aren't seeing bad metrics, though, then ma…

2016-09-12 Thread djvulee
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen Yes, the file always seems to be empty before the write, so the original way is OK. Sorry that this PR was not thought through enough; I was misled by another method in shuffle.py, which used th…

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15052 I get that, but if it's always true, then there was no problem to begin with. That's what the code seems to assume right now. I haven't looked at the code much, but that's the question -- are you sure…

2016-09-12 Thread djvulee
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen No. It does not matter whether the file is empty or not; if the file is empty, `getsize()` just returns 0, and this should be OK.
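djvulee's claim above — that reading the size of an empty file is harmless because `os.path.getsize()` returns 0 — can be checked with a minimal standalone snippet (this is an illustration, not code from the PR):

```python
import os
import tempfile

# Minimal check of the behavior discussed above: os.path.getsize()
# returns 0 for an empty file, so reading the size before the first
# write is harmless when the spill file starts out empty.
fd, path = tempfile.mkstemp()
os.close(fd)

empty_size = os.path.getsize(path)    # 0 for a freshly created file

with open(path, "wb") as f:
    f.write(b"spilled data")

written_size = os.path.getsize(path)  # 12 bytes after the write
os.remove(path)
```

So a size-before / size-after subtraction is a no-op for the empty-file case, which is the scenario the thread is debating.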

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15052 Is the idea that the file may be non-empty when written? There is at least one more instance of this call, but maybe the file is known to be empty beforehand.

2016-09-12 Thread djvulee
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen I updated the PR to use an incremental way to update the DiskBytesSpilled metric.

2016-09-12 Thread djvulee
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen You are right; I will correct it soon.

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15052 Given how DiskBytesSpilled is used, and is still used in other parts of the code, this doesn't look correct. It seems to be a global that is always incremented. Here you reset the value in certain cases…

2016-09-11 Thread djvulee
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen @davies Mind taking a look? This PR is very simple.

2016-09-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15052 Can one of the admins verify this patch?