Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/8246#issuecomment-139331864
@NathanHowell @jkbradley We should consider making bins per feature and
sample sizes configurable to avoid the side-effects mentioned above.
Did the
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/8246#discussion_r38249912
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1056,6 +988,70 @@ object DecisionTree extends Serializable with
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/8246#discussion_r38249715
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1056,6 +988,70 @@ object DecisionTree extends Serializable with
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/8246#issuecomment-135866073
Thanks @NathanHowell Sorry for not responding earlier. Will try to review
soon.
---
If your project is set up for it, you can reply to this email and have your
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/7380#issuecomment-131967469
@mengxr Sorry, did not get a chance to review this so far. Will try to do
it today.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/7380#issuecomment-122480109
@mengxr Sure. I can take a look.
How do we handle API modifications? Any change moves the tag to the newest
version?
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/7294#issuecomment-121840575
@jkbradley It looks good to me. It might be a good idea to run the
spark.mllib and spark.ml models on a couple of datasets to ensure there are no
regressions (in
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/5009#issuecomment-87079012
@jkbradley Apologies for not reviewing earlier. I hope to make one pass
over the weekend.
I have one quick question -- what's the rationale for abbrevi
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-74733642
Thanks @MechCoder @jkbradley
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/4231#issuecomment-74722368
@MechCoder Sorry I didn't see the message earlier. I am sure @jkbradley
must have done a thorough review but please let me know if you need me to take
a
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3461#issuecomment-65167028
@jkbradley The GBDT sections look good to me but the subsection on
Comparison with RFs could possibly be moved towards the end. It breaks the flow
in my opinion
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3461#discussion_r21133672
--- Diff: docs/mllib-decision-tree.md ---
@@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are
considered.
### Stopping rule
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3461#discussion_r21068173
--- Diff: docs/mllib-gbt.md ---
@@ -0,0 +1,308 @@
+---
+layout: global
+title: Gradient-Boosted Trees - MLlib
+displayTitle: MLlib
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3461#discussion_r21068117
--- Diff: docs/mllib-gbt.md ---
@@ -0,0 +1,308 @@
+---
+layout: global
+title: Gradient-Boosted Trees - MLlib
+displayTitle: MLlib
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3461#discussion_r21068084
--- Diff: docs/mllib-gbt.md ---
@@ -0,0 +1,308 @@
+---
+layout: global
+title: Gradient-Boosted Trees - MLlib
+displayTitle: MLlib
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3461#discussion_r21067979
--- Diff: docs/mllib-decision-tree.md ---
@@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are
considered.
### Stopping rule
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3461#discussion_r21067860
--- Diff: docs/mllib-decision-tree.md ---
@@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are
considered.
### Stopping rule
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3461#discussion_r21067826
--- Diff: docs/mllib-decision-tree.md ---
@@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are
considered.
### Stopping rule
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3461#discussion_r21067657
--- Diff: docs/mllib-decision-tree.md ---
@@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are
considered.
### Stopping rule
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3461#discussion_r21067802
--- Diff: docs/mllib-decision-tree.md ---
@@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are
considered.
### Stopping rule
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3461#discussion_r21067775
--- Diff: docs/mllib-decision-tree.md ---
@@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are
considered.
### Stopping rule
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3439#issuecomment-64498949
@jkbradley LGTM. Thanks for the documentation too -- it is really helpful.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3439#issuecomment-64497207
@jkbradley I am trying to find my reference for the LogLoss calculations.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3320#issuecomment-63900346
Thanks a lot @davies
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3374#discussion_r20629031
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
@@ -40,151 +39,98 @@ import
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3374#discussion_r20628996
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
@@ -45,146 +43,92 @@ import
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3374#issuecomment-63763093
Completed my pass. LGTM! :+1:
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3374#discussion_r20624623
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostedTreesSuite.scala
---
@@ -23,104 +23,95 @@ import
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3374#discussion_r20623750
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostedTreesSuite.scala
---
@@ -23,104 +23,95 @@ import
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3374#discussion_r20623463
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/TreeEnsembleModel.scala
---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3374#discussion_r20622816
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/TreeEnsembleModel.scala
---
@@ -0,0 +1,182 @@
+/*
--- End diff
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3374#discussion_r20622629
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
@@ -45,146 +43,92 @@ import
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3374#discussion_r20622307
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
@@ -45,146 +43,92 @@ import
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3374#issuecomment-63746922
Should the ```trainClassifier``` and ```trainRegressor``` methods from the
```DecisionTree``` and ```RandomForest``` classes also be deprecated?
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3374#issuecomment-63744889
@mengxr The plan to move to the mllib.ensemble namespace with a new class
sounds good to me.
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3374#discussion_r20621257
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
@@ -45,146 +43,92 @@ import
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3374#issuecomment-63744101
Will we have to rename ```GradientBoostedTrees``` back to
```GradientBoosting``` when we add generic weak learner support? I think we
should not modify the name of
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3320#discussion_r20619737
--- Diff: python/pyspark/mllib/tree.py ---
@@ -181,8 +180,191 @@ def trainRegressor(data, categoricalFeaturesInfo,
>>> model.pr
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3320#discussion_r20618922
--- Diff: python/pyspark/mllib/tree.py ---
@@ -181,8 +180,191 @@ def trainRegressor(data, categoricalFeaturesInfo,
>>> model.pr
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3320#discussion_r20481177
--- Diff: python/pyspark/mllib/tree.py ---
@@ -181,8 +180,191 @@ def trainRegressor(data, categoricalFeaturesInfo,
>>> model.pr
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3320#discussion_r20479768
--- Diff: python/pyspark/mllib/tree.py ---
@@ -181,8 +180,191 @@ def trainRegressor(data, categoricalFeaturesInfo,
>>> model.pr
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/1290#issuecomment-63375573
I found this reference recently about Netflix's distributed implementation
of neural nets that could be relevant for MLlib.
http://techblog.netflix.com/20
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/1290#issuecomment-63375342
@avulanov Thanks for conducting the experiments. Could you plot graphs for
the experiments that you conducted with changing number of features and number
of machines
GitHub user manishamde opened a pull request:
https://github.com/apache/spark/pull/3214
[MLLIB] SPARK-4347: Reducing GradientBoostingSuite run time. Before: [info]
GradientBoostingSuite: [info] - Regression with continuous features:
SquaredError (22 seconds, 115 milliseconds) [info
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/1290#discussion_r19922521
--- Diff: docs/mllib-ann.md ---
@@ -0,0 +1,223 @@
+---
+layout: global
+title: Artificial Neural Networks - MLlib
+displayTitle: MLlib
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3099#issuecomment-61916011
I have a few comments based upon the API:
1. Like @jkbradley, I prefer ```lr.setMaxIter(50)``` over
```lr.set(lr.maxIter, 50)```. Also, prefer to avoid
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3094#issuecomment-61849672
@codedeft Thanks for creating the JIRA and informing us.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3094#issuecomment-61756916
@codedeft Not yet. I was planning to but forgot to do so. Feel free to
create one or I can create it if you prefer.
You are correct. We need to add (possibly
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3094#issuecomment-61721420
@jkbradley Thanks! I will take a look and get back.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3084#issuecomment-61577226
@mengxr Sorry, it's my fault. It looks good to me.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2435#issuecomment-61549273
@0asa Thanks. Looks good. Let's move the conversation to the JIRA.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2435#issuecomment-61494531
@0asa Yes. PRs for these will be great.
Could you check if there are already existing JIRAs for these -- if not, you
could create JIRA tickets. Also, please
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/1290#discussion_r19720701
--- Diff: docs/mllib-ann.md ---
@@ -0,0 +1,223 @@
+---
+layout: global
+title: Artificial Neural Networks - MLlib
+displayTitle: MLlib
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/1290#discussion_r19717985
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala
---
@@ -0,0 +1,528 @@
+/*
+ * Licensed to the Apache
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/1290#discussion_r19717937
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala
---
@@ -0,0 +1,528 @@
+/*
+ * Licensed to the Apache
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/1290#issuecomment-61436212
@bgreeven Another general suggestion: consider adding logging to the code.
It goes a long way in debugging errors and getting statuses on long-running jobs.
Check
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/1290#discussion_r19717692
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala
---
@@ -0,0 +1,528 @@
+/*
+ * Licensed to the Apache
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/1290#discussion_r19717699
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala
---
@@ -0,0 +1,528 @@
+/*
+ * Licensed to the Apache
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/1290#discussion_r19717451
--- Diff: docs/mllib-ann.md ---
@@ -0,0 +1,223 @@
+---
+layout: global
+title: Artificial Neural Networks - MLlib
+displayTitle: MLlib
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/1290#issuecomment-61433892
@bgreeven I haven't studied the implementation details yet but I had a
question about the API. I realize that RDD[(Vector, Vector)] is a more general
structur
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/1290#discussion_r19717255
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala
---
@@ -0,0 +1,528 @@
+/*
+ * Licensed to the Apache
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2868#issuecomment-61375098
@codeleft I am so sorry.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-61361549
@tgaloppo Thanks for the PR and congratulations on the first contribution.
Apologies for the lack of feedback thus far -- I guess everyone is busy with
the 1.2
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2868#issuecomment-61359008
@codeleft I agree that local training should be a high priority. Just
curious -- what's the depth of the tree in the failing case?
I vote for merging th
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3000#issuecomment-61351601
LGTM.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61345885
@mengxr Could we get this merged? :-)
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3000#issuecomment-61335029
Cool. I will make another pass shortly.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61327135
Thanks. Sounds good to me.
I tried to use the builder pattern to help for the Java use case but I
guess we can handle it separately.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61221341
@jkbradley I cleaned up the public API based on our discussion. Going with
a nested structure where we have to specify the weak learner parameters
separately is
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3000#discussion_r19645862
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/DatasetIndexer.scala ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19645493
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,412 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19639714
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,412 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19638835
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/EnsembleTestHelper.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19638832
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala
---
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19638821
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala
---
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61174268
@jkbradley Thanks for the confirmation! I will now proceed to finish the
rest of the tasks -- should be straightforward.
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3000#discussion_r19636313
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/DatasetIndexer.scala ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3000#issuecomment-61168991
Agree.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/3000#issuecomment-61166762
How about the transformation for labels? This will help with
transformations for classification especially from +1/-1 to 0/1 labeling for
binary classification
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3000#discussion_r19635024
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/DatasetIndexer.scala ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3000#discussion_r19634947
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/DatasetIndexer.scala ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3000#discussion_r19634083
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/DatasetIndexer.scala ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/3000#discussion_r19633564
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/DatasetIndexer.scala ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2868#issuecomment-61156969
@codedeft @jkbradley I have not followed the discussion very closely
(apologies!) but at the high level, could we add local training support along
with this PR
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61155798
@jkbradley I agree with protection against driver failure for long
sequential operations. However, in this case we will just be checkpointing
partial models rather
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61069736
@jkbradley @codedeft I think I have implemented all the suggestions on the
PR except for 1) public API and 2) warning when using non SquaredError loss
functions. I
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-61024859
@jkbradley I originally used checkpointing instead of simply caching in
memory. There are trade-offs going with one versus the other. I will study what
@codedeft
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19570610
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -26,7 +26,7 @@ import
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19570259
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19569553
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -26,7 +26,7 @@ import
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19569087
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19510078
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60824790
@jkbradley I agree. This needs more testing since it's a non-standard
option.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60824862
@jkbradley Should we even support classification then?
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19499795
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19499327
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala
---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19499284
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/EnsembleCombiningStrategy.scala
---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-60821283
@jkbradley Your understanding is correct. Sorry for not mentioning it
explicitly on the JIRA/PR earlier.
Yes, calculating median, etc. for terminal region
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19496561
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala
---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19496447
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala
---
@@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache
1 - 100 of 302 matches