[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-68470611 Merged into master. Thanks!! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

2014-12-31 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3702

2014-12-29 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-68286086 @srowen Sorry for the delay! I'm really starting to wonder about this JIRA, though. The collect() should return one BinaryLabelCounter per partition. I'd assume

2014-12-29 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-68295668 @jkbradley You'll end up with one `BinaryLabelCounter` per partition _per distinct key_ though. That's where the problem may occur. I think this would definitely
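
srowen's point can be illustrated with a small plain-Scala sketch. It does not use Spark: partitions are modeled as nested `Seq`s of (score, label) pairs, and `LabelCounts` is a hypothetical stand-in for MLlib's `BinaryLabelCounter`. Aggregating per partition yields one counter per partition, while aggregating per partition per distinct score yields up to (partitions × distinct scores) counters, which is the growth that can exhaust memory on the driver.

```scala
// Hypothetical stand-in for MLlib's BinaryLabelCounter: positive/negative counts.
final case class LabelCounts(numPositives: Long, numNegatives: Long) {
  def +(other: LabelCounts): LabelCounts =
    LabelCounts(numPositives + other.numPositives, numNegatives + other.numNegatives)
}

object CounterGrowth {
  // Assumption of this sketch: labels above 0.5 count as positives.
  private def toCounts(label: Double): LabelCounts =
    if (label > 0.5) LabelCounts(1, 0) else LabelCounts(0, 1)

  // One aggregated counter per partition: result size equals the partition count.
  def perPartition(partitions: Seq[Seq[(Double, Double)]]): Seq[LabelCounts] =
    partitions.map { part =>
      part.map { case (_, label) => toCounts(label) }.foldLeft(LabelCounts(0, 0))(_ + _)
    }

  // One counter per partition per distinct score: grows with the number of distinct scores.
  def perPartitionPerKey(partitions: Seq[Seq[(Double, Double)]]): Seq[(Double, LabelCounts)] =
    partitions.flatMap { part =>
      part.groupBy(_._1).toSeq.map { case (score, pairs) =>
        score -> pairs.map { case (_, label) => toCounts(label) }.foldLeft(LabelCounts(0, 0))(_ + _)
      }
    }
}
```

With two partitions holding three and two distinct scores, `perPartition` returns 2 counters while `perPartitionPerKey` returns 5; with millions of distinct scores the second count grows accordingly.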

2014-12-29 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-68299469 @srowen Right; I was indeed just confused. I think it's fine. LGTM CC: @mengxr

2014-12-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3702#discussion_r22232433 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -28,9 +28,28 @@ import

2014-12-23 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-67993599 @srowen I added one last comment; sorry, I should have thought of it earlier. This took longer to do than I expected, but I think it's a good solution. Thanks for

2014-12-23 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-68004357 @jkbradley No problem, take a look at what I did to the scaladoc.

2014-12-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-68004777 [Test build #24748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24748/consoleFull) for PR 3702 at commit

2014-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-68010995 Test PASSed. Refer to this link for build results (access rights to CI server needed):

2014-12-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-68010993 [Test build #24748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24748/consoleFull) for PR 3702 at commit

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3702#discussion_r22183573 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -28,9 +28,23 @@ import

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3702#discussion_r22183579 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala --- @@ -124,4 +124,36 @@ class

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3702#discussion_r22183575 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -103,7 +117,37 @@ class

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3702#discussion_r22183580 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala --- @@ -124,4 +124,36 @@ class

2014-12-22 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-67876179 @srowen The logic tests look fine; I just added a couple of comments.

2014-12-22 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3702#discussion_r22193166 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -28,9 +28,23 @@ import org.apache.spark.rdd.{RDD,

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3702#discussion_r22194994 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -28,9 +28,23 @@ import

2014-12-22 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-67908060 @jkbradley Changed my mind and went with a 2nd constructor. The reason is that it wouldn't actually make any sense to change the num bins after the curve is computed; it
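
A minimal sketch of the design srowen describes, assuming a simplified, hypothetical `CurveMetrics` class over a local `Seq` rather than an `RDD`: `numBins` is a constructor parameter, an auxiliary constructor defaults it to 0 (taken in this sketch to mean no down-sampling), and the binned points are computed lazily from that fixed value. The chunking rule (about `counts.size / numBins` consecutive points per bin, keeping the first score of each chunk) is an assumption of this sketch, not necessarily the merged code.

```scala
// Hypothetical simplified metrics class: numBins is fixed at construction time.
class CurveMetrics(val scoreAndLabels: Seq[(Double, Double)], val numBins: Int) {
  require(numBins >= 0, s"numBins must be non-negative, got $numBins")

  // Auxiliary constructor; in this sketch numBins = 0 means "no down-sampling".
  def this(scoreAndLabels: Seq[(Double, Double)]) = this(scoreAndLabels, 0)

  // (score, positives, negatives) per distinct score, sorted by descending score.
  // Assumption: labels above 0.5 count as positives.
  private lazy val counts: Seq[(Double, Long, Long)] =
    scoreAndLabels.groupBy(_._1).toSeq.map { case (score, pairs) =>
      val pos = pairs.count(_._2 > 0.5).toLong
      (score, pos, pairs.size.toLong - pos)
    }.sortBy(-_._1)

  // Down-sampled points, computed once from the numBins given at construction.
  lazy val binnedCounts: Seq[(Double, Long, Long)] = {
    val grouping = if (numBins == 0) 1 else counts.size / numBins
    if (grouping < 2) {
      // Too few points per bin for down-sampling to help; keep the original points.
      counts
    } else {
      // Merge each chunk of consecutive points; the chunk keeps its first (highest) score.
      counts.grouped(grouping).map { chunk =>
        (chunk.head._1, chunk.map(_._2).sum, chunk.map(_._3).sum)
      }.toList
    }
  }
}
```

Because `numBins` is a `val` set only through a constructor, there is no setter that could change it after `binnedCounts` has been computed, which matches the reasoning given for choosing a second constructor.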

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-67908132 [Test build #24717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24717/consoleFull) for PR 3702 at commit

2014-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-67913401 Test PASSed. Refer to this link for build results (access rights to CI server needed):

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-67913395 [Test build #24717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24717/consoleFull) for PR 3702 at commit