Re: How to estimate the size of dataframe using pyspark?

2016-04-12 Thread Buntu Dev
Thanks Davies, I've shared the code snippet and the dataset. Please let me know if you need any other information.

Re: How to estimate the size of dataframe using pyspark?

2016-04-11 Thread Davies Liu
That's weird; DataFrame.count() should not require lots of memory on the driver. Could you provide a way to reproduce it (you could generate a fake dataset)?
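A minimal sketch of a fake dataset with three INT columns, roughly matching the report; the 200M row count is an arbitrary placeholder, and the Spark 1.x SQLContext entry point (current at the time; SparkSession replaces it in 2.x) is assumed:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="count-repro")
    sqlContext = SQLContext(sc)

    # Three INT columns as described; the row count is a placeholder.
    df = (sqlContext.range(0, 200 * 1000 * 1000)
          .selectExpr("cast(id as int) as a",
                      "cast(rand() * 100 as int) as b",
                      "cast(rand() * 100 as int) as c"))
    df.count()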

Re: How to estimate the size of dataframe using pyspark?

2016-04-09 Thread Buntu Dev
I've allocated about 4g for the driver. For the count stage, I notice the Shuffle Write is 13.9 GB.

Re: How to estimate the size of dataframe using pyspark?

2016-04-09 Thread bdev
Thanks Mandar, I couldn't see anything under the 'Storage' section, but under Executors I noticed it to be 3.1 GB: Executors (1), Memory: 0.0 B Used (3.1 GB Total).
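For what it's worth, the Storage tab stays empty until the DataFrame is both persisted and materialized by an action; a minimal sketch, assuming df is the DataFrame in question:

    # Nothing shows under Storage until the data is cached AND an action runs.
    df.persist()   # default level keeps data in memory, spilling if needed
    df.count()     # forces materialization; the Storage tab then lists df
    # ...inspect the web UI, then release the memory:
    df.unpersist()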

Re: How to estimate the size of dataframe using pyspark?

2016-04-09 Thread Ndjido Ardo BAR
What's the size of your driver?
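For reference, the "size of the driver" being asked about is spark.driver.memory, which must be set before the driver JVM starts, so on spark-submit or in spark-defaults.conf rather than inside a running job; the 4g value below is just an example:

    spark-submit --driver-memory 4g your_job.py

or in conf/spark-defaults.conf:

    spark.driver.memory  4g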

Re: How to estimate the size of dataframe using pyspark?

2016-04-09 Thread Buntu Dev
Actually, df.show() works, displaying 20 rows, but df.count() is the one causing the driver to run out of memory. There are just 3 INT columns. Any idea what could be the reason? On Sat, Apr 9, 2016 at 10:47 AM, wrote: > You seem to have a lot of columns :-) !
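For context: df.show() only fetches the first 20 rows to the driver, while df.count() runs a full job over every partition (though the count result itself is tiny, which is why the driver OOM is surprising, as noted above). If the full count is the problem, a cheaper probe is the RDD-level approximate count; a sketch (countApprox is marked experimental):

    # Returns a possibly incomplete count once the timeout (ms) elapses.
    approx = df.rdd.countApprox(timeout=60000, confidence=0.95)
    print(approx)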

How to estimate the size of dataframe using pyspark?

2016-04-09 Thread bdev
I keep running out of memory on the driver when I attempt to do df.show(). Can anyone let me know how to estimate the size of the dataframe? Thanks!
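One rough way to gauge the size without materializing everything: collect a small sample, measure its serialized size on the driver, and extrapolate. A sketch, where the sample fraction is an arbitrary assumption and the pickled size only approximates the true in-memory or on-disk footprint:

    import pickle

    fraction = 0.001  # assumed rate; pick one whose sample fits on the driver
    sample = df.sample(withReplacement=False, fraction=fraction).collect()
    estimated_bytes = len(pickle.dumps(sample)) / fraction
    print("rough estimate: %.1f MB" % (estimated_bytes / 1024.0 / 1024.0))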