[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-04-18 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-211552091 Ok - I also think the basic idea is very useful and would like to see it applicable to recommendation dataset splitting too. Hopefully we can come up with a

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-04-18 Thread sethah
Github user sethah closed the pull request at: https://github.com/apache/spark/pull/8112 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-04-18 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-211550151 Looking over this PR, it needs some work and probably needs to be refactored for efficiency. I will close it for now.

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-04-14 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-210234396 @MLnick I left some comments on the JIRA for [SPARK-14489](https://issues.apache.org/jira/browse/SPARK-14489).

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-04-14 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-209806115 @sethah could you take a look at the discussion in [SPARK-14489](https://issues.apache.org/jira/browse/SPARK-14489),

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-04-07 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-207045826 This PR still seems useful to me. I too do not see an easy way to implement it using DataFrames, though I'd like to do that eventually.

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-198545696 **[Test build #53568 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53568/consoleFull)** for PR 8112 at commit

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-197808784 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-19 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-198517789 @MLnick I agree that these PRs are similar, but I'm not sure there is overlap. In the new pipeline component proposed in

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-19 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-197850219 @sethah Could you also take a look at #11102 - it is somewhat related. We should just ensure to sync up between the two ideas.

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-197755583 **[Test build #53406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53406/consoleFull)** for PR 8112 at commit

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-197808444 **[Test build #53406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53406/consoleFull)** for PR 8112 at commit

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-197808782 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-18 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-197754076 ok to test

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-198580182 **[Test build #53568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53568/consoleFull)** for PR 8112 at commit

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-198580329 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-198580327 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-15 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-197011427 Jenkins test this please

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-03-15 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-196695062 @sethah I will take a look. I think this is an important feature to have. Could you update to sort out merge conflicts?

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-01-20 Thread mperice
Github user mperice commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-173363996 Awesome job! This is a much anticipated feature.

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2016-01-20 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-173347402 pinging @mengxr @jkbradley Nothing has happened with this PR in several months. Should I close it?

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-11-22 Thread CharlesSitbon
Github user CharlesSitbon commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-158743624 @sethah hey, what is the status of that feature? I expected it to be in 1.6, looking forward to using it.

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-11-22 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-158795213 @CharlesSitbon This PR is still awaiting review from @mengxr. I think it's too late for this to be in 1.6

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339870 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -267,6 +268,26 @@ object MLUtils { } /** + * ::

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339841 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSamplingUtils.scala --- @@ -216,6 +216,35 @@ private[spark] object

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339926 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339580 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339724 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -80,7 +96,18 @@ class CrossValidator(override val uid: String) extends

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339822 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40339748 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-24 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-142979645 @dusenberrymw Thanks for the feedback. I have addressed each of your comments. Let me know if you see anything else.

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-23 Thread dusenberrymw
Github user dusenberrymw commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40272553 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-23 Thread dusenberrymw
Github user dusenberrymw commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40274245 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-23 Thread dusenberrymw
Github user dusenberrymw commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40274887 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala --- @@ -79,8 +95,22 @@ class TrainValidationSplit(override val uid:

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-23 Thread dusenberrymw
Github user dusenberrymw commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40270977 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -267,6 +268,26 @@ object MLUtils { } /** + * ::

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-23 Thread dusenberrymw
Github user dusenberrymw commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40273307 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSamplingUtils.scala --- @@ -216,6 +216,35 @@ private[spark] object

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-23 Thread dusenberrymw
Github user dusenberrymw commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40273855 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-23 Thread dusenberrymw
Github user dusenberrymw commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r4027 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -80,7 +96,18 @@ class CrossValidator(override val uid: String)

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-23 Thread dusenberrymw
Github user dusenberrymw commented on a diff in the pull request: https://github.com/apache/spark/pull/8112#discussion_r40275181 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-23 Thread dusenberrymw
Github user dusenberrymw commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-142777920 Great job, @sethah! I've left some comments on areas for focus.

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-09-10 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-139362995 @mengxr this has been idle for a while. Will you have a chance to review it?

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-08-24 Thread feynmanliang
Github user feynmanliang commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-134440591 @mengxr is probably the best person for reviewing the use of ScaSRS ;)

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-08-11 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/8112 [SPARK-8971][MLLIB][ML] Support balanced class labels when splitting train/cross validation sets I'm leaving a few comments about some of the design choices made in this PR. - both

[GitHub] spark pull request: [SPARK-8971][MLLIB][ML] Support balanced class...

2015-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8112#issuecomment-130088223 Can one of the admins verify this patch?