Hi,
I've experienced a similar problem upgrading from Spark 1.4 to Spark 1.6.
The data is not evenly distributed across executors, but in my case the
problem also reproduces with legacy mode.
I also tried 1.6.1 RC1, with the same results.

Still looking for resolution.

Lior

On Fri, Feb 19, 2016 at 2:01 AM, Koert Kuipers <ko...@tresata.com> wrote:

> looking at the cached rdd i see a similar story:
> with useLegacyMode = true the cached rdd is spread out across 10
> executors, but with useLegacyMode = false the data for the cached rdd sits
> on only 3 executors (the rest all show 0s). my cached RDD is a key-value
> RDD that got partitioned (hash partitioner, 50 partitions) before being
> cached.
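> for context, the cache setup is roughly like this (simplified; the input
> path and key extraction are just illustrative):
>
>   import org.apache.spark.HashPartitioner
>   import org.apache.spark.storage.StorageLevel
>
>   // key-value RDD, hash partitioned into 50 partitions, then cached
>   val kv = sc.textFile("hdfs:///data/events")          // illustrative source
>     .map(line => (line.split(",")(0), line))           // (key, value) pairs
>   val cached = kv
>     .partitionBy(new HashPartitioner(50))              // hash partitioner, 50 partitions
>     .persist(StorageLevel.MEMORY_ONLY)                 // equivalent to .cache()
>   cached.count()                                       // materialize the cached partitions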
>
> On Thu, Feb 18, 2016 at 6:51 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> hello all,
>> we are just testing a semi-realtime application (it should return results
>> in less than 20 seconds from cached RDDs) on spark 1.6.0. before this it
>> used to run on spark 1.5.1.
>>
>> in spark 1.6.0 the performance is similar to 1.5.1 if i set
>> spark.memory.useLegacyMode = true; however, if i switch to
>> spark.memory.useLegacyMode = false the queries take about 50% to 100% more
>> time.
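>> (for reference, the flag is just a spark config setting; roughly how i set
>> it, with the app name being illustrative:)
>>
>>   import org.apache.spark.{SparkConf, SparkContext}
>>
>>   // set before the SparkContext is created
>>   val conf = new SparkConf()
>>     .setAppName("semi-realtime-queries")            // illustrative name
>>     .set("spark.memory.useLegacyMode", "true")      // or "false" for the new unified memory manager
>>   val sc = new SparkContext(conf)
>>
>>   // or equivalently on the command line:
>>   //   spark-submit --conf spark.memory.useLegacyMode=false ...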
>>
>> the issue becomes clear when i focus on a single stage: the individual
>> tasks are not slower at all, but they run on fewer executors.
>> in my test query i have 50 tasks and 10 executors. both with
>> useLegacyMode = true and useLegacyMode = false the tasks finish in about 3
>> seconds and show as running PROCESS_LOCAL. however, when useLegacyMode =
>> false the tasks run on just 3 executors out of 10, while with useLegacyMode
>> = true they spread out across all 10 executors. having all the tasks run on
>> just a few executors leads to the slower results.
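>> a quick way to see the skew from the shell (besides the executors page in
>> the spark UI) is something like:
>>
>>   // per-executor storage memory: blockManagerId -> (max memory, remaining memory)
>>   sc.getExecutorMemoryStatus.foreach { case (exec, (max, remaining)) =>
>>     println(s"$exec  used=${max - remaining}  max=$max")
>>   }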
>>
>> any idea why this would happen?
>> thanks! koert
>>
>>
>>
>
