Re: Problems with large dataset using collect() and broadcast()

2014-12-24 Thread Patrick Wendell
Hi Will,

When you call collect(), the item you are collecting needs to fit in memory on the driver. Is it possible your driver program does not have enough memory?

- Patrick

On Wed, Dec 24, 2014 at 9:34 PM, Will Yang wrote:
> Hi all,
> In my occasion, I have a huge HashMap[(Int, Long), (Double,
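Patrick's point that the collected result must fit in driver memory can be checked roughly before calling collect(). The Scala sketch below is a back-of-envelope check, not a Spark API. `DriverFitCheck`, `estimatedBytes`, `fitsOnDriver`, the 150-bytes-per-entry figure and the 0.5 headroom are all hypothetical assumptions. Real per-entry JVM overhead (boxed Doubles, tuple headers, hash-table buckets) should be measured. The headroom assumes the driver may briefly hold both serialized task results and the deserialized array. `Runtime.getRuntime.maxMemory` is the standard JVM call that reports roughly the `-Xmx` limit, which `spark-submit --driver-memory` sets for the driver.

```scala
// Back-of-envelope driver-side check before collect().
// ASSUMPTION: bytesPerEntry is a hypothetical, caller-supplied estimate of the
// in-heap size of one (Int, Long) -> (Double, Double, Double) entry.
object DriverFitCheck {
  // Estimated heap bytes needed to hold `entries` map entries on the driver.
  def estimatedBytes(entries: Long, bytesPerEntry: Long): Long =
    entries * bytesPerEntry

  // True if the estimate stays within `headroom` (e.g. 0.5) of the driver's max heap.
  // The headroom is an arbitrary safety margin, not a Spark setting.
  def fitsOnDriver(entries: Long, bytesPerEntry: Long, maxHeapBytes: Long, headroom: Double): Boolean =
    estimatedBytes(entries, bytesPerEntry).toDouble <= maxHeapBytes * headroom

  def main(args: Array[String]): Unit = {
    // Approximately the JVM's -Xmx limit for this (driver) process.
    val maxHeap = Runtime.getRuntime.maxMemory
    val entries = 100000000L // hypothetical: 100 million entries
    val perEntry = 150L      // hypothetical bytes per entry
    val gib = 1L << 30
    println(s"estimate = ${estimatedBytes(entries, perEntry) / gib} GiB, max heap = ${maxHeap / gib} GiB")
    println(s"fits with 50% headroom: ${fitsOnDriver(entries, perEntry, maxHeap, 0.5)}")
  }
}
```

If the check fails, the choices are the ones the thread is circling around. You can raise the driver heap, or you can avoid pulling the whole map back to the driver.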

Problems with large dataset using collect() and broadcast()

2014-12-24 Thread Will Yang
Hi all,

In my case, I have a huge HashMap[(Int, Long), (Double, Double, Double)], ranging from several GB to tens of GB. After each iteration, I need to collect() this HashMap, perform some calculation, and then broadcast() it to every node. Now I have 20GB for each executor and after it performances
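The broadcast side also has a memory cost. A broadcast variable is cached as one read-only copy per executor, shared by that executor's tasks, and the driver keeps its own copy. A map of tens of GB therefore has to fit inside each 20GB executor, not just on the driver. The Scala sketch below does that arithmetic. `BroadcastFootprint`, its methods, the 30 GiB map size, the 10-executor count and the 0.6 usable fraction are hypothetical illustrations, not Spark settings. The sketch also ignores the serialized broadcast blocks Spark keeps alongside the deserialized value, so real usage would be higher.

```scala
// Rough footprint of broadcasting one large map.
// ASSUMPTION: mapBytes is the deserialized in-heap size of the map; serialized
// broadcast blocks are ignored, so actual usage is higher than these figures.
object BroadcastFootprint {
  // Approximate heap used cluster-wide: one copy per executor plus the driver's copy.
  def clusterBytes(mapBytes: Long, executors: Int): Long =
    mapBytes * (executors.toLong + 1L)

  // The whole value must fit in the usable part of a single executor heap.
  // usableFraction is a hypothetical share of the heap left for user data.
  def fitsPerExecutor(mapBytes: Long, executorHeapBytes: Long, usableFraction: Double): Boolean =
    mapBytes.toDouble <= executorHeapBytes * usableFraction

  def main(args: Array[String]): Unit = {
    val gib = 1L << 30
    val mapBytes = 30L * gib // hypothetical: a 30 GiB map ("tens of GB")
    val heap = 20L * gib     // 20GB per executor, as in the post
    println(s"cluster-wide copies for 10 executors: ${clusterBytes(mapBytes, 10) / gib} GiB")
    println(s"fits in one executor (60% usable): ${fitsPerExecutor(mapBytes, heap, 0.6)}")
  }
}
```

Under these assumptions a 30 GiB map cannot be broadcast into a 20GB executor at all. Adding executors only multiplies the total footprint, so more nodes alone would not solve this.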