Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-71585503
Merged into master. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have t
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/4073
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-71584396
@mengxr This can also be viewed as a bugfix which prevents overwriting of
the param `subSamplingRate`, which was hardcoded to 1.0
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-71510669
@MechCoder Thanks! LGTM
CC: @mengxr Note this is sort of an API change: RandomForest can now be
run with subsampled rows. (But this seems fine to me since us
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-71352330
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-71352326
[Test build #26061 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26061/consoleFull)
for PR 4073 at commit
[`8012fb2`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-71349963
[Test build #26061 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26061/consoleFull)
for PR 4073 at commit
[`8012fb2`](https://githu
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-71349920
@jkbradley Fixed. I can haz merge?
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-71338606
@MechCoder This is an addition instead of a correction, but I just
realized that Strategy.assertValid() does not check subsamplingRate. Would you
mind adding that ch
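
A minimal sketch of the kind of check being requested, written in plain Python rather than the MLlib Scala code. The `Strategy` class, the `subsampling_rate` field, and the accepted range `(0, 1]` are assumptions made for illustration; the comment above is truncated and does not show the actual check that was added.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    """Hypothetical stand-in for a tree-training configuration object."""

    subsampling_rate: float = 1.0

    def assert_valid(self) -> None:
        # Assumed rule: the rate is a fraction of the training rows,
        # so it must be greater than 0 and at most 1.
        if not (0.0 < self.subsampling_rate <= 1.0):
            raise ValueError(
                f"subsampling_rate must be in (0, 1], got {self.subsampling_rate}"
            )
```

Validating the rate up front means a bad value fails when the configuration is checked, not partway through training.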
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-71335512
ping @jkbradley Could you please have a final look?
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70998086
[Test build #25961 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25961/consoleFull)
for PR 4073 at commit
[`e0e0d9c`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70998096
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70990119
[Test build #25961 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25961/consoleFull)
for PR 4073 at commit
[`e0e0d9c`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70989958
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70989944
[Test build #25955 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25955/consoleFull)
for PR 4073 at commit
[`d5d68e7`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70983710
[Test build #25955 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25955/consoleFull)
for PR 4073 at commit
[`d5d68e7`](https://githu
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70983304
Repushed after fixing the style checks.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70982966
[Test build #25953 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25953/consoleFull)
for PR 4073 at commit
[`8a0acb5`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70982967
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70982888
[Test build #25953 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25953/consoleFull)
for PR 4073 at commit
[`8a0acb5`](https://githu
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70982719
@jkbradley Thanks for the tip. Fixed. Anything more?
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/4073#discussion_r23330643
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/RandomForestSuite.scala ---
@@ -196,6 +196,24 @@ class RandomForestSuite extends FunSuite with
M
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/4073#discussion_r23329625
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala ---
@@ -132,6 +132,7 @@ private class RandomForest (
timer.start("ini
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/4073#discussion_r23327520
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/RandomForestSuite.scala ---
@@ -196,6 +196,24 @@ class RandomForestSuite extends FunSuite with
M
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4073#discussion_r23250726
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/RandomForestSuite.scala ---
@@ -196,6 +196,24 @@ class RandomForestSuite extends FunSuite with
M
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70538422
@mengxr @jkbradley Any more comments? Sorry for spamming, but I would like
to work on other issues related to GBRT and RandomForests as well.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70378420
[Test build #25705 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25705/consoleFull)
for PR 4073 at commit
[`d1df1b2`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70378423
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70375392
[Test build #25705 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25705/consoleFull)
for PR 4073 at commit
[`d1df1b2`](https://githu
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70375368
@jkbradley I've added a test according to the other tests in the
`RandomForestSuite`. Let me know if there is anything left.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70372791
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70372786
[Test build #25704 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25704/consoleFull)
for PR 4073 at commit
[`a7bfc70`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70370361
[Test build #25704 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25704/consoleFull)
for PR 4073 at commit
[`a7bfc70`](https://githu
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70369703
Could you please tell me what is the preferred way to generate random data
in Spark?
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70307249
I'd vote for not adding it to train since that part of the API is so
unwieldy.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70306545
Thanks. Also, a design decision: is it worth adding this as an
option to `train`, given that it is now within the "style limit"?
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70305493
Also, as far as testing goes, it's hard. One way might be to:
* Run RF with a random seed and subsampling rate 1.0
* Run it the same way, but with rate < 1.0
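
The two runs above can be sketched in plain Python, under stated assumptions: `subsample` is a hypothetical helper that uses simple Bernoulli row sampling, which may not match how MLlib samples rows, and the list above is cut off before it says what to compare between the runs. The sketch only shows that a fixed seed makes both runs reproducible and that a rate below 1.0 uses fewer rows.

```python
import random


def subsample(rows, rate, seed):
    """Keep each row independently with probability `rate` (hypothetical helper)."""
    rng = random.Random(seed)
    # rng.random() is in [0.0, 1.0), so rate == 1.0 keeps every row.
    return [row for row in rows if rng.random() < rate]


rows = list(range(1000))

# Run 1: fixed seed, subsampling rate 1.0 (all rows are used).
full = subsample(rows, rate=1.0, seed=42)

# Run 2: same seed, rate below 1.0 (only a subset of rows is used).
half = subsample(rows, rate=0.5, seed=42)

# Repeating run 2 with the same seed reproduces the same subset.
half_again = subsample(rows, rate=0.5, seed=42)
```

A real test would train a forest on each sample and compare the resulting models; here the samples themselves stand in for that step.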
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70305015
[Test build #25672 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25672/consoleFull)
for PR 4073 at commit
[`6685b44`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70305019
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70304959
Good point, yes, I think it's worth fixing.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70304874
[Test build #25672 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25672/consoleFull)
for PR 4073 at commit
[`6685b44`](https://githu
GitHub user MechCoder reopened a pull request:
https://github.com/apache/spark/pull/4073
[SPARK-3726] [MLlib] Allow sampling_rate not equal to 1.0
I've added support for sampling_rate not equal to 1.0. I have two major
questions.
1. A Scala style test is failing, since the
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70304290
Oh well, but still if I'm not mistaken, the `subSamplingRate` is overridden
by the condition `numTrees > 1`. This should not be the case as having a lower
sampling, mig
Github user MechCoder closed the pull request at:
https://github.com/apache/spark/pull/4073
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70303258
@MechCoder Taking a closer look, I now realize that part of this
functionality is already there...see the JIRA & let me know what you think.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70302451
@jkbradley Oops, the comments got deleted somehow. I meant that this is
because there are 10 arguments in `trainClassifier` and `trainRegressor`
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70300939
@jkbradley, the issue is that the function `train` has more than 10 args.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70300152
You can run dev/scalastyle locally to see what the issues are.
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70297536
I've made changes such that this does not break anything, i.e. everything is
backward compatible.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70283762
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70283759
[Test build #25666 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25666/consoleFull)
for PR 4073 at commit
[`6685b44`](https://gith
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70283608
@jkbradley @mengxr it would be great if you could have a look.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4073#issuecomment-70283599
[Test build #25666 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25666/consoleFull)
for PR 4073 at commit
[`6685b44`](https://githu
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/4073
[SPARK-3726] [MLlib] Allow sampling_rate not equal to 1.0
I've added support for sampling_rate not equal to 1.0. I have two major
questions.
1. A Scala style test is failing, since the n