[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-13 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-43039510 I wrote a simple benchmark to test performance, Iterator#size really sucks... Sorry for my mistake, I'll close this pull request :( --- If your project is set up for it

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-13 Thread cloud-fan
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/736

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-42968273 From a casual look, this is not equivalent performance-wise. Even assuming everything else is the same, it still invokes a function in a loop versus doing a direct addition.

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-13 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-42955215 @rxin I'm sorry, I don't have a link for that, but I didn't find any discussion of a performance issue with Iterator#size either. I just checked the source code of Iterat

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-42913289 Merged build finished. All automated tests passed.

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-42913290 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14931/

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-42911559 Merged build started.

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-42910326 Merged build triggered.

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-12 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-42910274 Jenkins, test this please.

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-12 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-42909874 Thanks for submitting this. Do you have a link to the Scala compiler/collection library ticket that impacted this?

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-42804113 Can one of the admins verify this patch?

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-12 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/736 use Iterator#size in RDD#count. In RDD#count, we used a while loop to get the size of the Iterator because Iterator#size used a for loop, which was slightly slower in that version of Scala. But for
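
For readers following the thread, the two counting styles under discussion can be sketched roughly as below. This is a minimal illustration, not the actual Spark source: the object name CountSketch and the helpers countWithWhile and countWithSize are hypothetical names introduced here. It makes no performance claim either way. One difference that is visible from the types alone is that Iterator#size returns an Int, while the hand-written loop can accumulate into a Long.

```scala
// Hypothetical sketch; names are illustrative and not taken from Spark.
object CountSketch {

  // Hand-written while loop: increments a local counter directly,
  // with no per-element function call.
  def countWithWhile[T](iter: Iterator[T]): Long = {
    var result = 0L
    while (iter.hasNext) {
      result += 1L
      iter.next()
    }
    result
  }

  // Library call: Iterator#size consumes the iterator and returns an Int,
  // which is widened to Long here so both helpers share a return type.
  def countWithSize[T](iter: Iterator[T]): Long = iter.size.toLong

  def main(args: Array[String]): Unit = {
    // Each call gets a fresh iterator, because counting consumes it.
    println(countWithWhile(Iterator.range(0, 1000)))
    println(countWithSize(Iterator.range(0, 1000)))
  }
}
```

Both helpers consume the iterator they are given, so any timing comparison would need a fresh iterator per run and enough warm-up for the JIT to settle.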