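Do you have a machine with terabytes of RAM? AFAIK, collect() brings the entire RDD back to the driver and holds it in the driver's memory, so the driver's RAM is your limiting factor. Spark also caps the total serialized size of the results of a single action such as collect() with spark.driver.maxResultSize (1g by default), so you may have to raise that too.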
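A rough way to answer "how huge is huge" for your own job is a back-of-envelope estimate before you call collect(). The sketch below is hypothetical and not a Spark API. It assumes you can estimate the row count and the average serialized size of one record yourself. The jvm_overhead_factor of 2.0 is a placeholder guess for how much larger deserialized objects are on the driver heap, not a measured value.

```python
# Hypothetical back-of-envelope check before calling collect().
# Not part of Spark: every input here is your own estimate.
GiB = 1024 ** 3


def collect_would_fit(num_rows, avg_row_bytes, driver_memory_bytes,
                      max_result_size_bytes=1 * GiB, jvm_overhead_factor=2.0):
    """Return (fits, reason) for a rough collect() sizing estimate.

    max_result_size_bytes mirrors spark.driver.maxResultSize (default 1g;
    0 means unlimited). jvm_overhead_factor is an assumed placeholder for
    how much bigger the collected objects are on the driver heap than
    their serialized form.
    """
    serialized = num_rows * avg_row_bytes
    # Spark aborts the job if the total serialized result exceeds the cap.
    if max_result_size_bytes and serialized > max_result_size_bytes:
        return False, "exceeds spark.driver.maxResultSize"
    # The deserialized result must also fit in the driver's heap.
    in_heap = serialized * jvm_overhead_factor
    if in_heap > driver_memory_bytes:
        return False, "exceeds driver memory"
    return True, "ok"


if __name__ == "__main__":
    # 10 million rows of ~100 bytes with a 4 GiB driver.
    print(collect_would_fit(10_000_000, 100, 4 * GiB))
```

With these made-up numbers, 10 million rows of about 100 bytes stay under both limits, while 500 million such rows already exceed the default 1g result cap long before terabytes come into play.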
2018-04-28 8:41 GMT-07:00 klrmowse <klrmo...@gmail.com>:
> I am currently trying to find a workaround for the Spark application I am
> working on so that it does not have to use .collect()
>
> but, for now, it is going to have to use .collect()
>
> what is the size limit (memory for the driver) of RDD file that .collect()
> can work with?
>
> I've been scouring Google search, S.O., blogs, etc., and everyone is
> cautioning about .collect(), but nobody specifies how huge is huge... are
> we talking about a few gigabytes? terabytes?? petabytes???
>
> thank you