[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-20 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10231 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is ena

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-20 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198997927 LGTM Thanks @sethah for the PR and @NathanHowell for reviewing! Merging with master

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-197595825 **[Test build #2645 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2645/consoleFull)** for PR 10231 at commit [`c34075b`](https://g

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198519802 **[Test build #53555 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53555/consoleFull)** for PR 10231 at commit [`d8a4c77`](https://g

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r56441090 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,59 @@ private[ml] object RandomForest extends Logging {

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198504543 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r56707558 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,59 @@ private[ml] object RandomForest extends Logging {

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198502197 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198501179 **[Test build #53552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53552/consoleFull)** for PR 10231 at commit [`a847bc9`](https://gi

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198504516 **[Test build #53553 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53553/consoleFull)** for PR 10231 at commit [`af3559a`](https://g

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r56441084 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,59 @@ private[ml] object RandomForest extends Logging {

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198519918 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r56440922 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,59 @@ private[ml] object RandomForest extends Logging {

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-197620380 No specific dataset size. I was thinking of something in this ballpark:
* 1K-10K rows
* 10-100 columns
* maxDepth 1 - 2 (shallow tree to avoid amortizing c

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198493930 That does not seem that bad. I'd say we should go ahead with your PR. If we want to optimize for small data, we can add a local implementation at some point.

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198503860 **[Test build #53553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53553/consoleFull)** for PR 10231 at commit [`af3559a`](https://gi

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r56441087 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,59 @@ private[ml] object RandomForest extends Logging {

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198519915 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198502199 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-197624835 Could you also make this change: [https://github.com/apache/spark/pull/8246/files#diff-8ad842a043888473bb2b527e818de04bR645] Done with pass. I added a few mi

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-197609041 **[Test build #2645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2645/consoleFull)** for PR 10231 at commit [`c34075b`](https://

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198504535 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-197596033 Would you have time to test this on a small dataset? The original PR confirmed it's faster for a larger dataset, but I'm curious if it affects timing (adversely) on

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198482885 @jkbradley I ran some local timings comparing before/after this change. I used `RandomForestRegressor` with all continuous features. It looks like there is a small perfo

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198506882 **[Test build #53555 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53555/consoleFull)** for PR 10231 at commit [`d8a4c77`](https://gi

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-197597688 I can set something up. Do you have a specific dataset size in mind or even a specific dataset?

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198502178 **[Test build #53552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53552/consoleFull)** for PR 10231 at commit [`a847bc9`](https://g

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198630445 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198630446 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198630414 **[Test build #53595 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53595/consoleFull)** for PR 10231 at commit [`8f5077f`](https://g

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198615653 **[Test build #53591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53591/consoleFull)** for PR 10231 at commit [`c9bec20`](https://gi

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r56739542 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,59 @@ private[ml] object RandomForest extends Logging {

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198626508 **[Test build #53595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53595/consoleFull)** for PR 10231 at commit [`8f5077f`](https://gi

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r56743243 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -956,7 +956,7 @@ private[ml] object RandomForest extends Logging {

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198622658 **[Test build #53591 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53591/consoleFull)** for PR 10231 at commit [`c9bec20`](https://g

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r56742884 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -956,7 +956,7 @@ private[ml] object RandomForest extends Logging

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198622871 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198622874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-03-18 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r56742175 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,59 @@ private[ml] object RandomForest extends Logging {

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-01-06 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-169395890 @NathanHowell Thank you for reviewing!

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-01-06 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-169393380 @sethah looks good to me. :+1:

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2016-01-04 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-168845446 @NathanHowell do you think you'll have any time to take a look at this?

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r47382165 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,63 @@ private[ml] object RandomForest extends Logging {

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163998876 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163998877 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163998699 **[Test build #47583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47583/consoleFull)** for PR 10231 at commit [`c34075b`](https://g

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163987356 **[Test build #47583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47583/consoleFull)** for PR 10231 at commit [`c34075b`](https://gi

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-10 Thread jodersky
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r47274795 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,63 @@ private[ml] object RandomForest extends Logging {

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163437528 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163437525 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163437393 **[Test build #47453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47453/consoleFull)** for PR 10231 at commit [`6c4ba6f`](https://g

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread holdenk
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163433358 Ah great - if we're killing the old code soon then no worries on the temporary duplication.

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163430383 This JIRA was actually created as a blocker JIRA for [SPARK-12183](https://issues.apache.org/jira/browse/SPARK-12183) which is for removing the MLlib code entirely and w

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread holdenk
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163427753 At first glance this seems to share a lot of code with the original implementation in MLlib (they both even work with RDDs of LabeledPoints) - maybe we could move much

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163427555 **[Test build #47453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47453/consoleFull)** for PR 10231 at commit [`6c4ba6f`](https://gi

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r47165377 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,63 @@ private[ml] object RandomForest extends Logging {

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread holdenk
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r47164981 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -842,60 +842,63 @@ private[ml] object RandomForest extends Logging {

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163423629 **[Test build #47451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47451/consoleFull)** for PR 10231 at commit [`8f06b34`](https://g

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163423642 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163423636 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163423162 **[Test build #47451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47451/consoleFull)** for PR 10231 at commit [`8f06b34`](https://gi

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163422959 Yeah I can take a look tonight or tomorrow

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread sethah
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-163419168 @NathanHowell would you be able to review this? cc @jkbradley

[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...

2015-12-09 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/10231 [SPARK-12182][ML] Distributed binning for trees in spark.ml This PR changes the `findSplits` method in spark.ml to perform split calculations on the workers. This PR is meant to copy [PR-8246](http
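For readers unfamiliar with the idea, here is a minimal sketch of what "performing split calculations on the workers" can look like. It is not the code from this PR: it is plain Python with no Spark dependency, the partitions are simulated as nested lists, and the names `find_splits`, `find_splits_for_feature`, and `max_bins` are hypothetical choices made for illustration. The threshold rule (roughly equal-frequency cut points) is likewise an assumption, not the rule the PR uses.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def find_splits_for_feature(values: Sequence[float], max_bins: int) -> List[float]:
    """Return at most max_bins - 1 thresholds for one continuous feature.

    Hypothetical rule: with few distinct values, split between each adjacent
    pair; otherwise take values at roughly equal-frequency positions.
    """
    ordered = sorted(values)
    distinct = sorted(set(ordered))
    num_splits = max_bins - 1
    if len(distinct) - 1 <= num_splits:
        return [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    stride = len(ordered) / (num_splits + 1)
    splits: List[float] = []
    for i in range(1, num_splits + 1):
        candidate = ordered[int(i * stride)]
        # Skip repeated thresholds so the result stays strictly increasing.
        if not splits or candidate > splits[-1]:
            splits.append(candidate)
    return splits


def find_splits(
    partitions: Sequence[Sequence[Sequence[float]]],
    continuous_features: Sequence[int],
    max_bins: int,
) -> Dict[int, List[float]]:
    """Simulate a distributed pass: emit (feature, value) pairs, group by
    feature, then compute each feature's thresholds independently."""
    grouped: Dict[int, List[float]] = defaultdict(list)
    # "Map" side: every partition emits (feature_index, value) pairs.
    for partition in partitions:
        for row in partition:
            for idx in continuous_features:
                grouped[idx].append(row[idx])
    # "Reduce" side: one independent task per feature; in a cluster these
    # tasks could run on different workers instead of on the driver.
    return {idx: find_splits_for_feature(vals, max_bins)
            for idx, vals in grouped.items()}


if __name__ == "__main__":
    parts = [[(1.0, 10.0), (2.0, 20.0)], [(3.0, 30.0), (4.0, 40.0)]]
    print(find_splits(parts, continuous_features=[0, 1], max_bins=3))
```

The point of the grouping step is that each feature's thresholds depend only on that feature's values, so the per-feature work can be spread across workers and only the small map of thresholds needs to come back to the driver.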