angrui Meng [mailto:men...@gmail.com]
Sent: Tuesday, 10 February 2015 7:07 AM
To: Christopher Thom
Cc: user@spark.apache.org
Subject: Re: [MLlib] Performance issues when building GBM models
Could you check the Spark UI and see whether there are RDDs being
kicked out during the computation? We cache the residual RDD after
each iteration. If we don't have enough memory/disk, it gets
recomputed, which results in something like `t(n) = t(n-1) + const`. We
might cache the features multiple times
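
To illustrate the recomputation effect described above, here is a toy sketch in plain Python. It is not Spark code: `build_trees` and its unit-cost model are assumptions made up for illustration. It only counts how many residual-update steps run per iteration when the previous residual is kept versus when it has to be replayed from the start.

```python
# Toy model (NOT Spark code): count how many "residual update" steps are
# executed to build tree n, with and without a usable cached residual.
# build_trees and the unit-cost model are hypothetical, for illustration only.

def build_trees(num_trees, cache_residual):
    """Return the number of work units spent in each boosting iteration."""
    costs = []
    cached = None  # (iterations already applied, residual value)
    for n in range(1, num_trees + 1):
        work = 0
        if cache_residual and cached is not None:
            # Cache hit: resume from the residual of iteration n-1.
            start, residual = cached
        else:
            # Cache miss (evicted or never stored): replay from the input.
            start, residual = 0, 0.0
        for _ in range(start, n):
            residual += 1.0  # stand-in for one residual update
            work += 1
        if cache_residual:
            cached = (n, residual)
        costs.append(work)
    return costs

print(build_trees(5, cache_residual=True))   # [1, 1, 1, 1, 1]
print(build_trees(5, cache_residual=False))  # [1, 2, 3, 4, 5]
```

In the second case each iteration costs one unit more than the previous one, which is the `t(n) = t(n-1) + const` pattern.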
Hi All,
I wonder if anyone else has some experience building a Gradient Boosted Trees
model using spark/mllib? I have noticed when building decent-size models that
the process slows down over time. We observe that the time to build tree n is
approximately a constant time longer than the time to