I've allocated about 4g for the driver. For the count stage, the Spark UI shows a Shuffle Write of 13.9 GB.
On Sat, Apr 9, 2016 at 11:43 AM, Ndjido Ardo BAR <ndj...@gmail.com> wrote:

> What's the size of your driver?
>
> On Sat, 9 Apr 2016 at 20:33, Buntu Dev <buntu...@gmail.com> wrote:
>
>> Actually, df.show() works, displaying 20 rows, but df.count() is the one
>> that is causing the driver to run out of memory. There are just 3 INT
>> columns.
>>
>> Any idea what could be the reason?
>>
>> On Sat, Apr 9, 2016 at 10:47 AM, <ndj...@gmail.com> wrote:
>>
>>> You seem to have a lot of columns :-) !
>>> df.count() returns the number of rows in your data frame.
>>> df.columns.size() returns the number of columns.
>>>
>>> Finally, I suggest you check the size of your driver and adjust it
>>> accordingly.
>>>
>>> Cheers,
>>>
>>> Ardo
>>>
>>> Sent from my iPhone
>>>
>>> > On 09 Apr 2016, at 19:37, bdev <buntu...@gmail.com> wrote:
>>> >
>>> > I keep running out of memory on the driver when I attempt to do
>>> > df.show(). Can anyone let me know how to estimate the size of the
>>> > dataframe?
>>> >
>>> > Thanks!
>>> >
>>> > --
>>> > View this message in context:
>>> > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-estimate-the-size-of-dataframe-using-pyspark-tp26729.html
>>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
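Since the thread's question is how to estimate a DataFrame's size, one crude back-of-envelope approach is row count × column width, padded for overhead. A minimal plain-Python sketch follows; the 4-byte width for an INT column and the 2x overhead factor are assumptions for illustration, not Spark internals, and the real in-memory footprint depends on Spark's representation and serialization:

```python
# Back-of-envelope estimate of a DataFrame's in-memory size from its
# row count and number of INT columns. Both constants below are rough
# assumptions, not values taken from Spark.

INT_BYTES = 4          # assumed width of one IntegerType value
OVERHEAD_FACTOR = 2.0  # assumed padding for JVM object/serialization overhead

def estimate_size_bytes(num_rows: int, num_int_cols: int) -> int:
    """Estimate size as rows * columns * bytes-per-int, times an overhead factor."""
    raw = num_rows * num_int_cols * INT_BYTES
    return int(raw * OVERHEAD_FACTOR)

# Example: 1 billion rows with 3 INT columns
est = estimate_size_bytes(1_000_000_000, 3)
print(f"~{est / 1e9:.1f} GB")  # ~24.0 GB
```

Comparing an estimate like this against the configured `spark.driver.memory` (4g here) gives a quick sanity check on whether results being pulled to the driver could plausibly fit.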