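Do you mean Gibbs sampling? Note that foreach is an action, not a transformation: it runs your function on the workers for its side effects and does not give you back an updated RDD. RDDs are immutable, so mutating elements inside foreach is not a reliable way to update the dataset. And if you pull the whole dataset back to the driver (for example with collect), the driver JVM will fail with an OutOfMemoryError.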
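Here is a rough sketch of the alternative, where each sampling sweep is a transformation that yields a new RDD and the data never leaves the workers. It assumes a recent PySpark and an RDD of floats. The names resample_partition and run_chain are made up for illustration, and the Gaussian perturbation is only a placeholder for your real sampling step.

```python
import random


def resample_partition(index, rows, sweep):
    """Hypothetical per-partition update; stands in for one sampling sweep.

    Runs on a worker and streams over the partition's rows one at a time.
    """
    # Seed per (sweep, partition) so a recomputed partition gives the same draws.
    rng = random.Random(sweep * 1_000_003 + index)
    for value in rows:
        # Placeholder update: replace with the real conditional draw.
        yield value + rng.gauss(0.0, 1.0)


def run_chain(rdd, sweeps):
    """Apply `sweeps` updates to `rdd` (assumed to be a PySpark RDD of floats)."""
    current = rdd.cache()
    for sweep in range(sweeps):
        updated = current.mapPartitionsWithIndex(
            lambda i, rows, s=sweep: resample_partition(i, rows, s)
        ).cache()
        updated.count()      # action: materializes the new RDD on the workers
        current.unpersist()  # drop the previous copy (on the first pass, the input RDD)
        current = updated
    return current
```

Only the count comes back to the driver. Each sweep does produce a new RDD, but unpersisting the previous one keeps roughly one cached copy alive at a time. For a long chain the lineage keeps growing, so you may want to checkpoint now and then.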
I'm not sure how your implementation works, but if the data doesn't need to be brought together in one place, you are better off keeping it on the workers.

2014/1/24 guojc <guoj...@gmail.com>

> Hi,
> I'm writing a parallel MCMC program that has a very large dataset in
> memory, and I need to update the dataset in memory and avoid creating an
> additional copy. Should I choose a foreach operation on the RDD to express the
> change, or do I have to create a new RDD after each sampling process?
>
> Thanks,
> Jiacheng Guo

--
Best Regards
-----------------------------------
Xusen Yin 尹绪森
Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia
Beijing University of Posts & Telecommunications
Intel Labs China
Homepage: http://yinxusen.github.io/