Can you paste the code? How much memory does your system have and how big is your dataset? Did you try df.persist(StorageLevel.MEMORY_AND_DISK)?
Thanks
Best Regards

On Fri, Jul 17, 2015 at 5:14 PM, Harit Vishwakarma <harit.vishwaka...@gmail.com> wrote:

> Thanks,
> The code is running on a single machine.
> And it still doesn't answer my question.
>
> On Fri, Jul 17, 2015 at 4:52 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> You can bump up the number of partitions while creating the RDD you are
>> using for the df.
>> On 17 Jul 2015 21:03, "Harit Vishwakarma" <harit.vishwaka...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I used the createDataFrame API of SQLContext in Python and am getting an
>>> OutOfMemoryException. I am wondering whether it is creating the whole
>>> DataFrame in memory?
>>> I did not find any documentation describing the memory usage of Spark APIs.
>>> The documentation given is nice, but a little more detail (especially on
>>> memory usage / data distribution etc.) would really help.
>>>
>>> --
>>> Regards
>>> Harit Vishwakarma
>
> --
> Regards
> Harit Vishwakarma