Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48994002
@manishamde - can you add `[MLlib]` to the title of this pull request?
Otherwise it doesn't get filtered properly by our filters.
---
If your project is set up for it,
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48998279
QA results for PR 886:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48958683
QA tests have started for PR 886. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16635/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48958795
QA results for PR 886:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48960458
QA tests have started for PR 886. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16636/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48960547
QA results for PR 886:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48961699
QA tests have started for PR 886. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16637/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48961781
QA results for PR 886:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48962289
QA tests have started for PR 886. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16638/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48971082
QA results for PR 886:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48972522
QA tests have started for PR 886. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16645/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48978578
QA results for PR 886:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48992765
QA tests have started for PR 886. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16661/consoleFull
---
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r14865144
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -768,104 +973,157 @@ object DecisionTree extends Serializable with
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48867199
Thanks Evan. I have compared to scikit-learn on the covertype dataset and
the results looked similar.
Github user etrain commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r14836561
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -768,104 +973,157 @@ object DecisionTree extends Serializable with
Github user etrain commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48767401
I've gone through this in some depth, and aside from a couple of minor
style nits - the logic looks good to me. Manish - have you compared output vs.
scikit-learn for
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48674298
QA tests have started for PR 886. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16530/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48674369
QA results for PR 886:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48674374
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16530/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48675447
QA tests have started for PR 886. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16531/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48675496
QA results for PR 886:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48675499
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16531/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48683128
QA results for PR 886:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48412354
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48412365
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48412478
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48412480
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16428/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48413445
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48413437
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48413580
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16430/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48413579
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48415119
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48415107
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48415267
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16432/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48415266
Merged build finished.
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48143660
@etrain Added implicit conversion. :-)
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48143754
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48143761
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48143827
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16362/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-48143826
Merged build finished.
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r13982468
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -49,6 +49,7 @@ object DecisionTreeRunner {
case
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r13982568
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -45,7 +46,7 @@ class DecisionTree (private val strategy: Strategy)
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r13982597
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -233,13 +234,73 @@ object DecisionTree extends Serializable with
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-46593827
Thanks @etrain
1. I will try to use implicits
2. I agree. We originally had separate trees and then merged them for
readability. There is a sweet spot in
Github user etrain commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r13982852
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -49,6 +49,7 @@ object DecisionTreeRunner {
case class
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r13983131
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -49,6 +49,7 @@ object DecisionTreeRunner {
case
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r13996228
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -49,6 +49,7 @@ object DecisionTreeRunner {
case
Github user etrain commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r13926351
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -49,6 +49,7 @@ object DecisionTreeRunner {
case class
Github user etrain commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r13926460
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -45,7 +46,7 @@ class DecisionTree (private val strategy: Strategy)
Github user etrain commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r13926555
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -212,7 +211,9 @@ object DecisionTree extends Serializable with Logging {
Github user etrain commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r13926606
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -233,13 +234,73 @@ object DecisionTree extends Serializable with Logging
Github user etrain commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-46465872
I've taken a first pass at this and at a high level it looks good.
The main two things I'd say are
1) I think an implicit that converts LabeledPoint to
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-45959540
Friendly nudge: could somebody please take a look at this PR. It is
blocking upcoming ensemble tree PRs.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-45127038
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-45127063
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-45127229
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-45139033
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-45139221
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-45139223
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15453/
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-45141375
I added support for sorting categorical feature values using impurity
(gini/entropy) calculated over the corresponding labels in multiclass
classification. This
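The ordering idea mentioned in this comment can be sketched roughly as follows. This is only an illustration in Python, not the code from the pull request: the function names `gini`, `order_categories_by_impurity`, and `candidate_splits` are hypothetical, and the sketch assumes that each category is ranked by the Gini impurity of the labels observed with it and that only splits contiguous in that ranking are then considered.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def order_categories_by_impurity(samples):
    """Sort category values by the Gini impurity of their labels.

    `samples` is an iterable of (category, label) pairs. Ties are broken
    by the category value itself so the result is deterministic.
    """
    labels_by_category = defaultdict(list)
    for category, label in samples:
        labels_by_category[category].append(label)
    return sorted(
        labels_by_category,
        key=lambda c: (gini(labels_by_category[c]), c),
    )


def candidate_splits(ordered_categories):
    """Contiguous left/right splits of an ordered category list (k - 1 of them)."""
    return [
        (ordered_categories[:i], ordered_categories[i:])
        for i in range(1, len(ordered_categories))
    ]


# Hypothetical toy data: (category value, class label) pairs.
samples = [
    ("a", 0), ("a", 0),
    ("b", 0), ("b", 1),
    ("c", 0), ("c", 1), ("c", 2),
]
order = order_categories_by_impurity(samples)
splits = candidate_splits(order)
```

With k category values, this ordering leaves k - 1 candidate splits to evaluate instead of every possible subset. As later comments in the thread note, such an ordering is a heuristic and is not guaranteed to find the best split.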
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44381347
I don't have a reference, and I did look for one. I am sure it is not
optimal, and not even that great as a greedy algorithm. Two low-entropy
distributions over target
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44388751
I fully agree.
I will give others a day or two to raise any concerns if they have any and
then proceed to implement the two-step solution for multiclass
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44243946
@manishamde Yes for categorical features with high cardinality, you don't
want to consider all possible splits. I don't think having a cardinality of 30
or 40 is that
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44359841
@srowen It's good to know about the use-case for cardinality in the order
of tens.
The categorical feature ordering using the average value of the target
Github user etrain commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44360446
I am worried that exponential growth in the number of split possibilities
kills us when we check for all splits when we get to even 20-30
categorical values. That's
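To make the growth concern concrete, here is a small Python sketch (not taken from the pull request; the function names are hypothetical). It relies on the standard counting fact that k unordered category values can be divided into two non-empty groups in 2^(k-1) - 1 ways, whereas an ordered feature with k values has only k - 1 threshold splits.

```python
def unordered_split_count(k: int) -> int:
    """Binary partitions of k unordered category values into two non-empty groups."""
    if k < 2:
        return 0
    return 2 ** (k - 1) - 1


def ordered_split_count(k: int) -> int:
    """Threshold splits available once the k values are placed in a fixed order."""
    if k < 2:
        return 0
    return k - 1


for k in (3, 10, 20, 30):
    print(k, unordered_split_count(k), ordered_split_count(k))
```

At k = 20 the exhaustive count is already 524287, and at k = 30 it is 536870911, compared with 19 and 29 splits for the ordered case.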
GitHub user manishamde opened a pull request:
https://github.com/apache/spark/pull/886
SPARK-1536: multiclass classification support for decision tree
The ability to perform multiclass classification is a big advantage for
using decision trees and was a highly requested feature for
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44228078
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44228087
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44228145
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44228146
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15215/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44228654
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44228663
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44228716
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15216/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-44228715
Merged build finished.