Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61069736
@jkbradley @codedeft I think I have implemented all the suggestions on the
PR except for 1) public API and 2) warning when using non-SquaredError loss
functions. I wil
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61069532
[Test build #22534 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22534/consoleFull)
for PR 2607 at commit
[`0183cb9`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61069535
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61069431
[Test build #22534 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22534/consoleFull)
for PR 2607 at commit
[`0183cb9`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61068865
[Test build #22533 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22533/consoleFull)
for PR 2607 at commit
[`1c40c33`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61068006
[Test build #22532 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22532/consoleFull)
for PR 2607 at commit
[`e33ab61`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61067173
[Test build #22531 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22531/consoleFull)
for PR 2607 at commit
[`035a2ed`](https://githu
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61026909
Studying the trade-offs sounds great. I think it's OK if checkpointing is
added later as an option. Thanks!
---
If your project is set up for it, you can reply to th
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61024859
@jkbradley I originally used checkpointing instead of simply caching in
memory. There are trade-offs going with one versus the other. I will study what
@codedeft imple
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19572600
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Foun
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19570610
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -26,7 +26,7 @@ import org.apache.spark.mllib.regression.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61000703
It's a good point about the sequential nature of boosting models being
important when doing approximate predictions (using only some of the weak
hypotheses); I could im
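
A minimal sketch of the approximate-prediction idea mentioned in the comment above: predicting with only the first `k` weak hypotheses of a boosted ensemble. Boosting fits each learner after the ones before it, so a prefix of the sequence is itself a usable, coarser model. Every name here (`PrefixPrediction`, `WeakHypothesis`, `predictWithPrefix`, the toy stumps and weights) is a hypothetical illustration and not the API under review in this PR.

```scala
// Hypothetical sketch: these names are illustrative, not MLlib's API.
object PrefixPrediction {
  // A weak hypothesis maps a feature vector to a real-valued score.
  type WeakHypothesis = Array[Double] => Double

  // Sum the weighted outputs of the first `k` weak hypotheses only.
  // With k == learners.length this is the full ensemble prediction.
  def predictWithPrefix(
      learners: Seq[WeakHypothesis],
      weights: Seq[Double],
      features: Array[Double],
      k: Int): Double = {
    require(learners.length == weights.length, "one weight per learner")
    learners.zip(weights).take(k).map { case (h, w) => w * h(features) }.sum
  }

  // Three toy learners on feature 0, in the order they were "trained".
  val learners: Seq[WeakHypothesis] = Seq(
    (x: Array[Double]) => if (x(0) > 0.5) 1.0 else 0.0,
    (x: Array[Double]) => if (x(0) > 0.8) 0.5 else -0.5,
    (x: Array[Double]) => 0.25
  )
  val weights: Seq[Double] = Seq(1.0, 0.5, 0.1)
}
```

Calling `PrefixPrediction.predictWithPrefix(learners, weights, features, k)` with a growing `k` refines the prediction step by step, which only makes sense if the order of the learners is preserved in the model.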
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19570259
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Fou
Github user codedeft commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19570062
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -26,7 +26,7 @@ import org.apache.spark.mllib.regression.La
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19569553
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -26,7 +26,7 @@ import org.apache.spark.mllib.regression.
Github user codedeft commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19569497
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Found
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19569087
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Fou
Github user codedeft commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19567808
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Found
Github user codedeft commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19567364
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -26,7 +26,7 @@ import org.apache.spark.mllib.regression.La
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19564926
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -26,7 +26,7 @@ import org.apache.spark.mllib.regression.L
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19564695
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Foun
Github user codedeft commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19563689
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -26,7 +26,7 @@ import org.apache.spark.mllib.regression.La
Github user codedeft commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19563516
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Found
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60863725
By the way, checkpointing is not quite the right term; currently, the code
persists but does not checkpoint the RDDs. I hope that the logic which
@codedeft implemented
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19510078
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Fou
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19508317
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Foun
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60837931
I think it's OK to leave classification support but make a note in the doc
for SquaredError that it is meant for Regression. What do you think?
---
If your project is
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60824790
@jkbradley I agree. This needs more testing since it's a non-standard
option.
---
If your project is set up for it, you can reply to this email and have your
reply ap
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60824862
@jkbradley Should we even support classification then?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60824501
@manishamde Thinking more about the losses, I'm really not sure if absolute
error and logistic loss will behave reasonably. Could we make those losses
private[tree] an
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19499795
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Fou
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19499682
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/EnsembleCombiningStrategy.scala
---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to t
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19499327
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala
---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19499284
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/EnsembleCombiningStrategy.scala
---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60821510
Great, that sounds reasonable. I believe we could do it eventually: since
the trees won't be too deep in many cases, the sufficient stats to pass around
might be manag
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60821283
@jkbradley Your understanding is correct. Sorry for not mentioning it
explicitly on the JIRA/PR earlier.
Yes, calculating median, etc. for terminal region pre
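
A minimal sketch of why the terminal-region (leaf) value depends on the loss: the constant that minimizes squared error over a leaf's residuals is their mean, while the constant that minimizes absolute error is their median. The object name `LeafValues`, its helper functions, and the sample residuals are assumptions made for illustration; none of this is code from the PR.

```scala
// Hypothetical sketch: compares mean and median as constant leaf predictions.
object LeafValues {
  def mean(xs: Seq[Double]): Double = xs.sum / xs.length

  def median(xs: Seq[Double]): Double = {
    val s = xs.sorted
    val n = s.length
    // Middle element for odd n, average of the two middle elements for even n.
    if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
  }

  // Total loss incurred when every residual in the leaf is predicted as `c`.
  def squaredLoss(xs: Seq[Double], c: Double): Double =
    xs.map(r => (r - c) * (r - c)).sum

  def absoluteLoss(xs: Seq[Double], c: Double): Double =
    xs.map(r => math.abs(r - c)).sum

  // Made-up residuals with one outlier, so that mean and median differ.
  val residuals: Seq[Double] = Seq(1.0, 2.0, 3.0, 10.0)
}
```

On these sample residuals the mean gives the lower squared loss and the median gives the lower absolute loss, so a tree whose leaves hold means is a natural fit for squared error only.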
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19498943
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala
---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19498899
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/EnsembleCombiningStrategy.scala
---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to t
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60820308
@manishamde Thanks in advance for the API simplification!
Also, I'm realizing that this code should be correct for SquaredError but
might not be quite right fo
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19498319
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala
---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19497888
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Foun
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19497886
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Foun
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19497891
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Foun
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19497871
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Foun
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19497878
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Foun
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19496561
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala
---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19496447
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala
---
@@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19496266
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala
---
@@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19496253
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostingSuite.scala ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Softwar
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19496241
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/impl/BaggedPointSuite.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Softwar
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19496224
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/impl/BaggedPointSuite.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Softwar
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19496210
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/EnsembleCombiningStrategy.scala
---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19496113
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala
---
@@ -70,7 +71,8 @@ class Strategy (
val categoricalFea
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19496095
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala ---
@@ -46,20 +47,63 @@ private[tree] object BaggedPoint {
* Conv
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19496069
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/loss/SquaredError.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Fou
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60814290
@jkbradley Your API suggestions sound reasonable. Let me work on
simplifying the API. I had originally started with something similar to what
you suggested so I will r
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495237
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala
---
@@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache S
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60812919
@manishamde Added comments based on a quick pass looking mainly at the
API. My main concern is the same as in my comment above about the verbosity of
(a) the many Gra
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495231
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala
---
@@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache S
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495249
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/impl/BaggedPointSuite.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495252
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/EnsembleCombiningStrategy.scala
---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to t
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495258
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala ---
@@ -46,20 +47,63 @@ private[tree] object BaggedPoint {
* Conve
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495261
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/loss/SquaredError.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foun
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495254
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala
---
@@ -70,7 +71,8 @@ class Strategy (
val categoricalFeat
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495228
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala
---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495234
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala
---
@@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache S
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495244
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/impl/BaggedPointSuite.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495219
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala
---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495241
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostingSuite.scala ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60696746
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60696743
[Test build #22313 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22313/consoleFull)
for PR 2607 at commit
[`49ba107`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60690948
[Test build #22313 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22313/consoleFull)
for PR 2607 at commit
[`49ba107`](https://githu
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60690690
@jkbradley I fixed the merge conflicts. Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60686705
@manishamde I'll make a pass now; thanks for the updates! A patch
(SPARK-4022) was just merged which causes a few small conflicts. Could you
please fix those? Then
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60568462
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60568456
[Test build #22285 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22285/consoleFull)
for PR 2607 at commit
[`eff21fe`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60561667
[Test build #22285 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22285/consoleFull)
for PR 2607 at commit
[`eff21fe`](https://githu