Hi Mohit,

The log line about the ExternalAppendOnlyMap is more a symptom of the
slowness than its cause.  The ExternalAppendOnlyMap is used when a shuffle
would otherwise hold too much data in memory.  Rather than
OOM'ing, Spark writes the data out to disk in a sorted order and reads it
back from disk later on when it's needed.  That's the job of the
ExternalAppendOnlyMap.

I wouldn't normally expect a conversion from Date to a Joda DateTime to
take significantly more memory.  But since you're using Kryo, which expects
classes to be registered with it, you may have forgotten to register
DateTime.  If you don't register a class, Kryo writes the fully qualified
class name at the beginning of every serialized instance.  For DateTime
objects, whose payload is roughly one long (8 bytes, next to the 22
characters of "org.joda.time.DateTime"), that's a ton of extra space and
very inefficient.

Can you confirm that DateTime is registered with Kryo?

http://spark.apache.org/docs/latest/tuning.html#data-serialization
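
In case it helps, here is a minimal sketch of what the registration could
look like, following the pattern in the tuning guide above.  The names
MyRegistrator and mypackage are placeholders I made up, so substitute your
own.  I'm also assuming Kryo's default serializer copes with DateTime in
your setup; if it doesn't, you may need to pass a custom serializer to
kryo.register.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator
import org.joda.time.DateTime

// Hypothetical registrator class; the name and package are placeholders.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // A registered class is written as a small numeric ID rather than
    // its fully qualified class name.
    kryo.register(classOf[DateTime])
  }
}

object KryoConfSketch {
  // Build a SparkConf that enables Kryo and points it at the registrator.
  def conf: SparkConf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // Must be the fully qualified name of the registrator class above.
    .set("spark.kryo.registrator", "mypackage.MyRegistrator")
}
```

If the spill warnings go away or shrink after this, the missing
registration was likely the culprit.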


On Wed, May 21, 2014 at 2:35 PM, Mohit Jaggi <mohitja...@gmail.com> wrote:

> Hi,
>
> I changed my application to use Joda time instead of java.util.Date and I
> started getting this:
>
> WARN ExternalAppendOnlyMap: Spilling in-memory map of 484 MB to disk (1
> time so far)
>
> What does this mean? How can I fix this? Due to this a small job takes
> forever.
>
> Mohit.
>
>
> P.S.: I am using Kryo serialization, have played around with several
> values of sparkRddMemFraction
>
