Hi,

Yes, I believe people do that. I also believe that Spark ML is able to
figure out on its own when to cache some internal RDDs; that's definitely
true for the random forest algorithm. It also doesn't hurt to cache the
same RDD twice.
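For example, a common pattern is to cache the training split before fitting, since tree ensembles pass over the data many times. A minimal PySpark sketch (my own illustration, not from this thread; assumes a running SparkContext and the RDD-based `pyspark.mllib` API):

```python
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import RandomForest

sc = SparkContext(appName="caching-sketch")

# Toy labeled data; in practice this would be loaded from storage.
data = sc.parallelize(
    [LabeledPoint(float(i % 2), [float(i), float(i) * 2.0]) for i in range(100)]
)

# Train-test split; caching the training RDD keeps it in memory so the
# algorithm's repeated passes don't recompute the lineage each time.
train, test = data.randomSplit([0.8, 0.2], seed=42)
train.cache()

model = RandomForest.trainClassifier(
    train,
    numClasses=2,
    categoricalFeaturesInfo={},
    numTrees=10,
    seed=42,
)

# Apply the model to the held-out split.
predictions = model.predict(test.map(lambda p: p.features))
```

Even if the algorithm caches an internal RDD itself, the explicit `cache()` on the input is harmless, as noted above.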

But it's not clear what you'd want to know...

--
Be well!
Jean Morozov

On Sun, Apr 3, 2016 at 11:34 AM, Sergey <ser...@gmail.com> wrote:

> Hi Spark ML experts!
>
> Do you use RDD caching somewhere together with MLlib to speed up
> calculations?
> I mean typical machine learning use cases.
> Train-test split, train, evaluate, apply model.
>
> Sergey.
>
