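Do you mean "Gibbs sampling"? Note that foreach is an action, not a
transformation: it runs on the workers only for its side effects and does
not give you an updated RDD. If you instead collect all the data from the
workers to the driver, the JVM will complain with an OOM error.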

I am not sure about your implementation, but if the data does not need to
be joined together, you had better keep it on the workers.
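
As a rough illustration of keeping the data on the workers, here is a
minimal PySpark-style sketch. It assumes the dataset is an RDD of floats.
The names resample_partition and run_sampler are hypothetical, and the
"sampling" step is only a placeholder (a Gaussian perturbation), not a real
Gibbs update. Each iteration builds a new cached RDD on the workers and then
unpersists the previous one, so nothing is pulled back to the driver.

```python
import random


def resample_partition(index, rows, iteration):
    """Placeholder sampling step, run on a worker over one partition.

    Hypothetical: a real Gibbs update would draw from the conditional
    distribution; here each value only gets a small Gaussian perturbation.
    """
    # Seed per partition and per iteration so partitions do not share draws.
    rng = random.Random(iteration * 100003 + index)
    for value in rows:
        yield value + rng.gauss(0.0, 1.0)


def run_sampler(rdd, n_iters):
    """Driver loop; `rdd` is assumed to be a PySpark RDD of floats."""
    current = rdd.cache()
    for it in range(n_iters):
        updated = current.mapPartitionsWithIndex(
            lambda idx, rows, it=it: resample_partition(idx, rows, it)
        ).cache()
        updated.count()      # action: forces the update to run on the workers
        current.unpersist()  # drop the previous iteration's cached copy
        current = updated
    return current
```

RDDs are immutable, so this does create a new RDD per iteration, and for a
short time two copies can be cached at once. Unpersisting the old one keeps
the steady-state memory use close to a single copy.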


2014/1/24 guojc <guoj...@gmail.com>

> Hi,
>    I'm writing a parallel MCMC program that has a very large dataset in
> memory, and it needs to update the dataset in memory without creating an
> additional copy. Should I use a foreach operation on the RDD to express
> the change, or do I have to create a new RDD after each sampling process?
>
> Thanks,
> Jiacheng Guo
>



-- 
Best Regards
-----------------------------------
Xusen Yin    尹绪森
Beijing Key Laboratory of Intelligent Telecommunications Software and
Multimedia
Beijing University of Posts & Telecommunications
Intel Labs China
Homepage: http://yinxusen.github.io/
