RE: [MLlib] Performance issues when building GBM models

2015-02-09 Thread Christopher Thom
Xiangrui Meng [mailto:men...@gmail.com] Sent: Tuesday, 10 February 2015 7:07 AM To: Christopher Thom Cc: user@spark.apache.org Subject: Re: [MLlib] Performance issues when building GBM models Could you check the Spark UI and see whether there are RDDs being kicked out during the computation? We cache the

Re: [MLlib] Performance issues when building GBM models

2015-02-09 Thread Xiangrui Meng
Could you check the Spark UI and see whether there are RDDs being kicked out during the computation? We cache the residual RDD after each iteration. If we don't have enough memory/disk, it gets recomputed, which results in something like `t(n) = t(n-1) + const`. We might cache the features multiple times
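To make the recurrence above concrete, here is a minimal sketch in plain Python (not Spark code) of what `t(n) = t(n-1) + const` implies for total training time. The function names and the numbers (10 s for the first tree, 2 s added per tree, 100 trees) are made up for illustration and are not measurements from this thread.

```python
def per_tree_times(first_tree_secs, extra_secs_per_tree, num_trees):
    """Return [t(1), ..., t(N)] for the recurrence t(n) = t(n-1) + const."""
    times = []
    t = first_tree_secs
    for _ in range(num_trees):
        times.append(t)
        t += extra_secs_per_tree  # each tree costs a constant amount more
    return times


def total_time(first_tree_secs, extra_secs_per_tree, num_trees):
    """Closed form of the sum: N * t(1) + const * N * (N - 1) / 2."""
    n = num_trees
    return n * first_tree_secs + extra_secs_per_tree * n * (n - 1) / 2


# Hypothetical values, chosen only to show the shape of the growth.
times = per_tree_times(10.0, 2.0, 100)
print("last tree takes", times[-1], "seconds")
print("all trees take", total_time(10.0, 2.0, 100), "seconds in total")
```

Because the per-tree cost grows linearly, the total grows quadratically in the number of trees, so doubling the number of iterations would roughly quadruple the part of the run time that comes from the constant increment.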

[MLlib] Performance issues when building GBM models

2015-02-08 Thread Christopher Thom
Hi All, I wonder if anyone else has some experience building a Gradient Boosted Trees model using spark/mllib? I have noticed when building decent-size models that the process slows down over time. We observe that the time to build tree n is approximately a constant time longer than the time to