I'm using PySpark. I have a list of 1 million items (all float values) and 1 million users, and for each user I want to randomly sample some items from the item list. Broadcasting the item list results in an OutOfMemory error on the driver, even after raising driver memory to 10G. I also tried persisting the array to disk, but I can't figure out how to read it back on the workers. Any suggestions would be appreciated.
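One way to sketch this (not from the thread, just an illustration under assumptions): give each user a deterministic, seeded sample of item *indices* so every worker can reproduce its samples without the driver shipping a large Python structure, and broadcast the item values as a compact NumPy array (1M float64 values is only about 8 MB, far smaller than a Python list of floats). The file paths, `k`, and helper names below are hypothetical.

```python
import random

def sample_item_indices(user_id, n_items, k, seed=0):
    """Deterministically sample k distinct item indices for one user.

    Seeding the RNG from (seed, user_id) means any worker can
    reproduce the same sample independently, so the driver never has
    to compute or ship per-user samples.
    """
    rng = random.Random(hash((seed, user_id)))
    return rng.sample(range(n_items), k)

if __name__ == "__main__":
    # Hypothetical Spark driver code; assumes items are stored one
    # float per line in "items.txt" on storage the workers can reach.
    import numpy as np
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    # A NumPy float64 array is ~8 bytes per item, so 1M items is
    # roughly 8 MB -- usually small enough to broadcast safely.
    items = np.loadtxt("items.txt")
    b_items = sc.broadcast(items)

    n_items, k = len(items), 10
    users = sc.range(1_000_000)
    sampled = users.map(
        lambda u: (u, [float(b_items.value[i])
                       for i in sample_item_indices(u, n_items, k)])
    )
    sampled.saveAsTextFile("sampled_items")  # hypothetical output path
```

If even the compact array is too large to broadcast, the same seeded-index trick still works: have each partition load the item array itself (e.g. via `numpy.load` inside `mapPartitions`) so the data never flows through the driver.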
- Broadcasting huge array or persisting on HDFS to read on e... surender kumar
- Re: Broadcasting huge array or persisting on HDFS to read on e... Matteo Cossu
- Re: Broadcasting huge array or persisting on HDFS to read on e... surender kumar
- Re: Broadcasting huge array or persisting on HDFS to read on e... Matteo Cossu
- Re: Broadcasting huge array or persisting on HDFS to read on e... surender kumar
- Re: Broadcasting huge array or persisting on HDFS to read on e... Gourav Sengupta
- Re: Broadcasting huge array or persisting on HDFS to read on e... surender kumar