Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/2780
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enab
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59831875
Merged into master. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have t
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59826477
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/404/consoleFull)
for PR 2780 at commit
[`18d0301`](https://github.com/
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59816923
@chouqin Thanks for the update! LGTM once the tests pass.
@manishamde At some point, I hope the histogram functionality can be part
of mllib/statistics/ espe
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59816513
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/404/consoleFull)
for PR 2780 at commit
[`18d0301`](https://github.com/a
Github user chouqin commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59676055
@jkbradley I updated unit test to check splits returned by
`findSplitsForContinuousFeature` are distinct. I have run the unit test for it
and it passed.
@mengxr
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59664847
@jkbradley I read the paper by Sanku et al and other papers but they
required a custom implementation. The sort method has worked OK so far but I
was hoping somebody w
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59664449
@chouqin LGTM. :+1:
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59661579
@chouqin Thanks for the updates! The updates look good.
One more small comment: Could you please add explicit checks in the unit
tests to make sure the returne
Github user chouqin commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59637234
Jekins, test this please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have th
Github user chouqin commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59637054
@manishamde @jkbradley thanks for your comments, I have changed my code
now. Do you have any more suggestions?
---
If your project is set up for it, you can reply to thi
Github user chouqin commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19057399
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Loggin
Github user chouqin commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19057305
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Loggin
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59603338
@chouqin Thanks for this PR! This method should be a real improvement. I
added some small comments inline.
My main concern right now is testing edge cases, l
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19052550
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Logg
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19052553
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Logg
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19052548
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -912,16 +913,18 @@ object DecisionTree extends Serializable with Loggin
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19052546
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -912,16 +913,18 @@ object DecisionTree extends Serializable with Loggin
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19052555
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Logg
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19052556
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Logg
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19052549
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Logg
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19052547
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -912,16 +913,18 @@ object DecisionTree extends Serializable with Loggin
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19052551
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Logg
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19052554
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Logg
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r19052544
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -36,6 +36,7 @@ import org.apache.spark.mllib.tree.model._
import o
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59590801
@chouqin Sorry for the slow response!
About the RandomForestSuite failure: The change to fix the failure
(maxBins) is OK with me. It is a somewhat brittle tes
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59064525
@chouqin Good point. The sort operation is the bottleneck, not the value
count. It might be good to check numbers with a test run to verify whether the
performance hit
Github user chouqin commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-59007084
@manishamde thanks for your comments. I will adjust my code after #2785
gets merged.
As for performance, yes, this is slower than the current implementation,
bu
Github user manishamde commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58991243
@chouqin Thanks for the PR. It looks good to me in general.
If I understand correctly, we are histogramming (value counting) each
feature instead of just simpl
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r18809250
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Log
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r18809178
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala ---
@@ -102,6 +102,37 @@ class DecisionTreeSuite extends FunSuite with
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r18809123
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Log
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r18808821
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Log
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r18808756
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with
Log
Github user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r18808651
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/RandomForestSuite.scala ---
@@ -93,7 +93,8 @@ class RandomForestSuite extends FunSuite with
Loc
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58883501
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21686/consoleFull)
for PR 2780 at commit
[`d353596`](https://github.com/a
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58883509
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58876511
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21686/consoleFull)
for PR 2780 at commit
[`d353596`](https://github.com/ap
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58872495
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21685/consoleFull)
for PR 2780 at commit
[`9e64699`](https://github.com/a
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58872500
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58872046
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21684/consoleFull)
for PR 2780 at commit
[`9e64699`](https://github.com/a
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58872054
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21
Github user chouqin commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58865951
@jkbradley, RandomForestSuite fails because original splits are better fit
for the training data(for example, 899.5 is a split threshold, which is close
to 900.) I think
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58865777
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21685/consoleFull)
for PR 2780 at commit
[`9e64699`](https://github.com/ap
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58865274
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21684/consoleFull)
for PR 2780 at commit
[`9e64699`](https://github.com/ap
Github user chouqin commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58865080
Jekins, retest this please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58857601
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58857589
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21680/consoleFull)
for PR 2780 at commit
[`092efcb`](https://github.com/a
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58853271
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21680/consoleFull)
for PR 2780 at commit
[`092efcb`](https://github.com/ap
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2780#issuecomment-58853083
Can one of the admins verify this patch?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your pro
GitHub user chouqin opened a pull request:
https://github.com/apache/spark/pull/2780
[SPARK-3207][MLLIB]Choose splits for continuous features in DecisionTree
more adaptively
DecisionTree splits on continuous features by choosing an array of values
from a subsample of the data.
51 matches
Mail list logo