[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-09-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 Merging with master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Merged build finished. Test PASSed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65798/ Test PASSed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #65798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65798/consoleFull)** for PR 14359 at commit [`d16c2da`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-09-22 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/14359 LGTM pending tests.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #65798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65798/consoleFull)** for PR 14359 at commit [`d16c2da`](https://github.com/apache/spark/commit/d

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-09-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 Thanks @hhbyyh and @sethah ! I agree that a later PR could be more careful about which trees are completed in which order and test this more thoroughly. But I hope this takes us 80% of t

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-09-20 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/14359 This is a really nice improvement. The communication overhead is reduced, based on some simple local tests. I wonder how we can add a test to verify that the algorithm focuses on completing whole tre
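
The point about completing whole trees can be illustrated with a small simulation. This is a minimal Python sketch, not Spark's implementation: it only compares taking pending nodes from a FIFO queue against taking them from a LIFO stack, and counts how many distinct trees each iteration touches. The function name, the `levels` and `batch_size` parameters, and the fixed batch size (standing in for whatever per-iteration limit the real code applies) are all assumptions made for illustration.

```python
from collections import deque


def trees_touched_per_iteration(num_trees, levels, batch_size, lifo):
    """Return, for each iteration, the number of distinct trees in the batch.

    Every tree is a full binary tree with `levels` levels of nodes to split.
    Nodes are represented as (tree_index, depth) pairs.
    """
    # Start with the root of every tree waiting to be split.
    pending = deque((tree, 0) for tree in range(num_trees))
    touched = []
    while pending:
        batch = []
        while pending and len(batch) < batch_size:
            # LIFO takes the most recently added node, FIFO the oldest one.
            batch.append(pending.pop() if lifo else pending.popleft())
        touched.append(len({tree for tree, _ in batch}))
        # Splitting a node adds its two children unless they are leaves.
        for tree, depth in batch:
            if depth + 1 < levels:
                pending.append((tree, depth + 1))
                pending.append((tree, depth + 1))
    return touched


fifo = trees_touched_per_iteration(num_trees=3, levels=3, batch_size=2, lifo=False)
lifo = trees_touched_per_iteration(num_trees=3, levels=3, batch_size=2, lifo=True)
print("FIFO trees per iteration:", fifo)
print("LIFO trees per iteration:", lifo)
```

In this toy setting both orderings need the same number of iterations, but the stack keeps each batch inside a single tree after the first iteration, while the queue keeps mixing nodes from different trees. A test along these lines (counting distinct trees per batch) is one possible way to check the behaviour asked about above.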

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-09-09 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/14359 Hi Joseph, Sorry for the late response. I was occupied by a customer Spark project for the past month. The idea looks reasonable and I tested with the MNIST dataset and the overall run time decr

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64020/ Test PASSed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Merged build finished. Test PASSed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #64020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64020/consoleFull)** for PR 14359 at commit [`133fdbf`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #64020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64020/consoleFull)** for PR 14359 at commit [`133fdbf`](https://github.com/apache/spark/commit/1

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-18 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 Thanks @jodersky ! Updated.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-18 Thread jodersky
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/14359 Some comments still refer to the use of queue and should be updated. Other than that, the data structure part now looks good to me.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #63886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63886/consoleFull)** for PR 14359 at commit [`41f4297`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63886/ Test PASSed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Merged build finished. Test PASSed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #63886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63886/consoleFull)** for PR 14359 at commit [`41f4297`](https://github.com/apache/spark/commit/4

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 Done!

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 Ahh, you're right; I was looking at immutable. I'll update to use the mutable stack. Thanks!

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread jodersky
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/14359 > I switched to Stack and then realized Stack has been deprecated in Scala 2.11... I think you probably read the *immutable* stack docs; the *mutable* stack is not deprecated AFAIK. I can

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63822/ Test PASSed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Merged build finished. Test PASSed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #63822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63822/consoleFull)** for PR 14359 at commit [`f79f77c`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 Btw, to give back-of-the-envelope estimates, we can look at 2 numbers: (1) How many nodes will be split on each iteration? (2) How big is the forest which is serialized and sent to workers o

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #63822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63822/consoleFull)** for PR 14359 at commit [`f79f77c`](https://github.com/apache/spark/commit/f

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-15 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 Sorry for the long delay; I've been swamped by other things for a while. Re-emerging... I switched to Stack and then realized Stack has been deprecated in Scala 2.11, so I reverted to th

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-26 Thread jodersky
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/14359 Agree, it's not very obvious. In the latter document I think a `push` is akin to `append` and `pop` to `head`

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-26 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 Thanks @jodersky I saw those, but the first does not document computational cost & the latter does not really clarify what I need for stacks (push and pop).

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62900/ Test FAILed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #62900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62900/consoleFull)** for PR 14359 at commit [`3c00d03`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Merged build finished. Test FAILed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-26 Thread jodersky
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/14359 @jkbradley , you can find the scaladoc on stacks here http://www.scala-lang.org/api/current/index.html#scala.collection.mutable.Stack Also this document http://docs.scala-lang.org/overvie

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #62900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62900/consoleFull)** for PR 14359 at commit [`3c00d03`](https://github.com/apache/spark/commit/3

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-26 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 Not urgent, but I'd like it to be in 2.1

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-26 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/14359 If it is not urgent, I'd like to try some large scale training to understand more about the improvements.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Merged build finished. Test PASSed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62858/ Test PASSed.

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #62858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62858/consoleFull)** for PR 14359 at commit [`6fcfb4b`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-25 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/14359 Ack. I'll review it and run tests tonight. Is it targeting 2.0?

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #62858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62858/consoleFull)** for PR 14359 at commit [`6fcfb4b`](https://github.com/apache/spark/commit/6

[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-25 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 @hhbyyh This is an improvement I had implemented a while back, just a little too late for the 2.0 code freeze. Could you please help review it or find others? Thank you!