Could you check the Spark UI and see whether there are RDDs being
kicked out during the computation? We cache the residual RDD after
each iteration. If we don't have enough memory/disk, it gets
recomputed, and the result is something like `t(n) = t(n-1) + const`. We
might also cache the features multiple times.
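If eviction turns out to be the problem, one thing to try is spilling to disk and checkpointing to truncate the lineage. A minimal sketch against the RDD-based `GradientBoostedTrees` API, assuming `trainingData` (an `RDD[LabeledPoint]`) and `sc` are already in scope, and with a placeholder checkpoint directory:

```scala
import org.apache.spark.mllib.tree.GradientBoostedTrees
import org.apache.spark.mllib.tree.configuration.BoostingStrategy
import org.apache.spark.storage.StorageLevel

// Persist to memory *and* disk so evicted partitions spill to disk
// instead of being recomputed from lineage on every iteration.
trainingData.persist(StorageLevel.MEMORY_AND_DISK)

// Checkpointing truncates the growing lineage; without it, losing a
// cached partition at iteration n can force recomputing earlier
// iterations. The directory below is a placeholder for your setup.
sc.setCheckpointDir("/tmp/spark-checkpoints")

val boostingStrategy = BoostingStrategy.defaultParams("Regression")
boostingStrategy.numIterations = 100
// Checkpoint every 10 iterations (field available in newer releases).
boostingStrategy.treeStrategy.checkpointInterval = 10

val model = GradientBoostedTrees.train(trainingData, boostingStrategy)
```

The Storage tab in the Spark UI should then show whether the persisted RDDs stay resident or keep getting dropped between iterations.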
Hi All,
I wonder whether anyone else has experience building a Gradient Boosted Trees
model using spark/mllib? I have noticed when building decent-size models that
the process slows down over time: the time to build tree n is
approximately a constant amount longer than the time to build tree n-1.