I suspect it's the numpy arrays filling up memory.

Thanks
Best Regards

On Fri, Jul 17, 2015 at 5:46 PM, Harit Vishwakarma <
harit.vishwaka...@gmail.com> wrote:

> 1. load 3 matrices of size ~ 10000 X 10000 using numpy.
> 2. rdd2 = rdd1.values().flatMap( fun )  # rdd1 has roughly 10^7 tuples
> 3. df = sqlCtx.createDataFrame(rdd2)
> 4. df.save() # in parquet format
>
> It throws an exception in the createDataFrame() call. I don't know what
> exactly it is creating. Is everything held in memory? Or can I make it
> persist while it is being created?
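>
> A minimal sketch of this pipeline, just to make the question concrete
> (rdd1, fun, and the output path here are placeholders, not the real code):
>
>     import numpy as np
>     from pyspark import SparkContext
>     from pyspark.sql import SQLContext
>
>     sc = SparkContext("local[*]", "sketch")
>     sqlCtx = SQLContext(sc)
>
>     # 1. three ~10000 x 10000 numpy matrices (~800 MB each as float64)
>     m1 = np.zeros((10000, 10000))
>     m2 = np.zeros((10000, 10000))
>     m3 = np.zeros((10000, 10000))
>
>     # 2. rdd1 holds key/value tuples (roughly 10^7 in the real job);
>     #    fun is a hypothetical stand-in for the actual flatMap function
>     rdd1 = sc.parallelize([(i, (i, i * 2)) for i in range(10 ** 5)])
>     def fun(v):
>         return [v]
>     rdd2 = rdd1.values().flatMap(fun)
>
>     # 3. build the DataFrame and 4. save it in parquet format
>     df = sqlCtx.createDataFrame(rdd2)
>     df.save("out.parquet", source="parquet")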
>
> Thanks
>
>
> On Fri, Jul 17, 2015 at 5:16 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> Can you paste the code? How much memory does your system have and how big
>> is your dataset? Did you try df.persist(StorageLevel.MEMORY_AND_DISK)?
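>>
>> For example, a rough sketch of that suggestion (assuming df, rdd2, and
>> sqlCtx are the ones from your code; the output path is a placeholder):
>>
>>     from pyspark import StorageLevel
>>
>>     df = sqlCtx.createDataFrame(rdd2)
>>     # spill partitions to disk when they do not fit in memory
>>     df.persist(StorageLevel.MEMORY_AND_DISK)
>>     df.save("out.parquet", source="parquet")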
>>
>> Thanks
>> Best Regards
>>
>> On Fri, Jul 17, 2015 at 5:14 PM, Harit Vishwakarma <
>> harit.vishwaka...@gmail.com> wrote:
>>
>>> Thanks,
>>> The code is running on a single machine,
>>> and it still doesn't answer my question.
>>>
>>> On Fri, Jul 17, 2015 at 4:52 PM, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>>> You can bump up the number of partitions while creating the RDD you are
>>>> using for the df.
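>>>>
>>>> A rough sketch of that idea (sc and sqlCtx as in the original code; the
>>>> data and partition counts are only illustrative):
>>>>
>>>>     data = [(i, (i, i * 2)) for i in range(10 ** 5)]   # placeholder data
>>>>     rdd1 = sc.parallelize(data, numSlices=200)         # more partitions up front
>>>>     rdd2 = rdd1.values().repartition(400)              # or repartition later
>>>>     df = sqlCtx.createDataFrame(rdd2)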
>>>> On 17 Jul 2015 21:03, "Harit Vishwakarma" <harit.vishwaka...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I used the createDataFrame API of SqlContext in Python and am getting an
>>>>> OutOfMemoryException. I am wondering if it creates the whole DataFrame in
>>>>> memory?
>>>>> I did not find any documentation describing the memory usage of Spark APIs.
>>>>> The documentation is nice, but a little more detail (especially on memory
>>>>> usage, data distribution, etc.) would really help.
>>>>>
>>>>> --
>>>>> Regards
>>>>> Harit Vishwakarma
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Regards
>>> Harit Vishwakarma
>>>
>>>
>>
>
>
> --
> Regards
> Harit Vishwakarma
>
>
