Thanks Davies, I've shared the code snippet and the dataset. Please let me
know if you need any other information.

On Mon, Apr 11, 2016 at 10:44 AM, Davies Liu <dav...@databricks.com> wrote:

> That's weird, DataFrame.count() should not require lots of memory on
> driver, could you provide a way to reproduce it (could generate fake
> dataset)?
>
> On Sat, Apr 9, 2016 at 4:33 PM, Buntu Dev <buntu...@gmail.com> wrote:
> > I've allocated about 4g for the driver. For the count stage, I notice the
> > Shuffle Write to be 13.9 GB.
> >
> > On Sat, Apr 9, 2016 at 11:43 AM, Ndjido Ardo BAR <ndj...@gmail.com>
> wrote:
> >>
> >> What's the size of your driver?
> >> On Sat, 9 Apr 2016 at 20:33, Buntu Dev <buntu...@gmail.com> wrote:
> >>>
> >>> Actually, df.show() works displaying 20 rows but df.count() is the one
> >>> which is causing the driver to run out of memory. There are just 3 INT
> >>> columns.
> >>>
> >>> Any idea what could be the reason?
> >>>
> >>> On Sat, Apr 9, 2016 at 10:47 AM, <ndj...@gmail.com> wrote:
> >>>>
> >>>> You seem to have a lot of column :-) !
> >>>> df.count() displays the size of your data frame.
> >>>> df.columns.size() the number of columns.
> >>>>
> >>>> Finally, I suggest you check the size of your drive and customize it
> >>>> accordingly.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Ardo
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>> > On 09 Apr 2016, at 19:37, bdev <buntu...@gmail.com> wrote:
> >>>> >
> >>>> > I keep running out of memory on the driver when I attempt to do
> >>>> > df.show().
> >>>> > Can anyone let me know how to estimate the size of the dataframe?
> >>>> >
> >>>> > Thanks!
> >>>> >
> >>>> >
> >>>> >
> >>>> > --
> >>>> > View this message in context:
> >>>> >
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-estimate-the-size-of-dataframe-using-pyspark-tp26729.html
> >>>> > Sent from the Apache Spark User List mailing list archive at
> >>>> > Nabble.com.
> >>>> >
> >>>> >
> ---------------------------------------------------------------------
> >>>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >>>> > For additional commands, e-mail: user-h...@spark.apache.org
> >>>> >
> >>>
> >>>
> >
>

Reply via email to