Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-211552091
Ok - I also think the basic idea is very useful and would like to see it
applicable to recommendation dataset splitting too. Hopefully we can come
up with a
Github user sethah closed the pull request at:
https://github.com/apache/spark/pull/8112
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-211550151
Looking over this PR, it needs some work and probably needs to be
refactored for efficiency. I will close it for now.
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-210234396
@MLnick I left some comments on the JIRA for
[SPARK-14489](https://issues.apache.org/jira/browse/SPARK-14489).
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-209806115
@sethah could you take a look at the discussion in
[SPARK-14489](https://issues.apache.org/jira/browse/SPARK-14489),
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-207045826
This PR still seems useful to me. I too do not see an easy way to
implement it using DataFrames, though I'd like to do that eventually.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-198545696
**[Test build #53568 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53568/consoleFull)**
for PR 8112 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-197808784
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-198517789
@MLnick I agree that these PRs are similar, but I'm not sure there is
overlap. In the new pipeline component proposed in
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-197850219
@sethah Could you also take a look at #11102 - it is somewhat related. We
should just ensure to sync up between the two ideas.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-197755583
**[Test build #53406 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53406/consoleFull)**
for PR 8112 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-197808444
**[Test build #53406 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53406/consoleFull)**
for PR 8112 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-197808782
Merged build finished. Test PASSed.
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-197754076
ok to test
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-198580182
**[Test build #53568 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53568/consoleFull)**
for PR 8112 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-198580329
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-198580327
Merged build finished. Test PASSed.
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-197011427
Jenkins test this please
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-196695062
@sethah I will take a look. I think this is an important feature to have.
Could you update to sort out merge conflicts?
Github user mperice commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-173363996
Awesome job! This is a much anticipated feature.
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-173347402
pinging @mengxr @jkbradley
Nothing has happened with this PR in several months. Should I close it?
Github user CharlesSitbon commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-158743624
@sethah hey, what is the status of that feature ? I expected it to be on
1.6, looking forward to use it.
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-158795213
@CharlesSitbon This PR is still awaiting review from @mengxr. I think it's
too late for this to be in 1.6
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339870
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -267,6 +268,26 @@ object MLUtils {
}
/**
+ * ::
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339841
--- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSamplingUtils.scala ---
@@ -216,6 +216,35 @@ private[spark] object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339926
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
}
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339580
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
}
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339724
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -80,7 +96,18 @@ class CrossValidator(override val uid: String) extends
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339822
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
}
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40339748
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
}
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-142979645
@dusenberrymw Thanks for the feedback. I have addressed each of your
comments. Let me know if you see anything else.
Github user dusenberrymw commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40272553
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
Github user dusenberrymw commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40274245
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
Github user dusenberrymw commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40274887
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
@@ -79,8 +95,22 @@ class TrainValidationSplit(override val uid:
Github user dusenberrymw commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40270977
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -267,6 +268,26 @@ object MLUtils {
}
/**
+ * ::
Github user dusenberrymw commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40273307
--- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSamplingUtils.scala ---
@@ -216,6 +216,35 @@ private[spark] object
Github user dusenberrymw commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40273855
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
Github user dusenberrymw commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r4027
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -80,7 +96,18 @@ class CrossValidator(override val uid: String)
Github user dusenberrymw commented on a diff in the pull request:
https://github.com/apache/spark/pull/8112#discussion_r40275181
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -263,6 +263,80 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
Github user dusenberrymw commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-142777920
Great job, @sethah! I've left some comments on areas for focus.
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-139362995
@mengxr this has been idle for a while. Will you have a chance to review it?
Github user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-134440591
@mengxr is probably the best person for reviewing the use of ScaSRS ;)
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/8112
[SPARK-8971][MLLIB][ML] Support balanced class labels when splitting
train/cross validation sets
I'm leaving a few comments about some of the design choices made in this PR.
- both
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8112#issuecomment-130088223
Can one of the admins verify this patch?