No reference. I opened a ticket about the missing documentation for it, and was answered by Sean Owen that it is not meant for Spark users. I explained that this is an issue, but no news so far.
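Since the flag is undocumented, here is a minimal sketch of how one might pass it at submit time. This is an illustration only: `my_app.jar` is a placeholder, and the helper function is hypothetical, not part of any Spark API.

```python
# Minimal sketch (no cluster needed): building the spark-submit arguments
# that disable the undocumented reduce-locality flag (SPARK-10567 / PR #8280).
# "my_app.jar" is a placeholder name, not something from this thread.
conf = {
    "spark.shuffle.reduceLocality.enabled": "false",
}

def submit_args(jar: str, conf: dict) -> list:
    """Turn a conf dict into spark-submit command-line arguments."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args + [jar]

print(" ".join(submit_args("my_app.jar", conf)))
# spark-submit --conf spark.shuffle.reduceLocality.enabled=false my_app.jar
```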
As for the memory management, I'm not experienced with it, but I suggest you read:
http://0x0fff.com/spark-memory-management/
http://0x0fff.com/spark-architecture/
It could be that the effective default storage memory in Spark 1.6 is a bit lower than in Spark 1.5, and your application can't borrow from the execution memory.

On Thu, Mar 3, 2016 at 2:35 AM, Koert Kuipers <ko...@tresata.com> wrote:

> with the locality issue resolved, i am still struggling with the new
> memory management.
>
> i am seeing tasks on tiny amounts of data take 15 seconds, of which 14 are
> spent in GC. with the legacy memory management (spark.memory.useLegacyMode
> = true) they complete in 1-2 seconds.
>
> since we are permanently caching a very large number of RDDs, my suspicion
> is that with the new memory management these cached RDDs happily gobble up
> all the memory, and need to be evicted to run my small job, leading to the
> slowness.
>
> i can revert to legacy memory management mode, so this is not an issue,
> but i am worried that at some point the legacy memory management will be
> deprecated and then i am stuck with this performance issue.
>
> On Mon, Feb 29, 2016 at 12:47 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> setting spark.shuffle.reduceLocality.enabled=false worked for me, thanks
>>
>> is there any reference to the benefits of setting reduceLocality to true?
>> i am tempted to disable it across the board.
>>
>> On Mon, Feb 29, 2016 at 9:51 AM, Yin Yang <yy201...@gmail.com> wrote:
>>
>>> The default value for spark.shuffle.reduceLocality.enabled is true.
>>>
>>> To reduce surprise to users of 1.5 and earlier releases, should the
>>> default value be set to false?
>>>
>>> On Mon, Feb 29, 2016 at 5:38 AM, Lior Chaga <lio...@taboola.com> wrote:
>>>
>>>> Hi Koert,
>>>> Try spark.shuffle.reduceLocality.enabled=false
>>>> This is an undocumented configuration.
>>>> See:
>>>> https://github.com/apache/spark/pull/8280
>>>> https://issues.apache.org/jira/browse/SPARK-10567
>>>>
>>>> It solved the problem for me (both with and without legacy memory mode).
>>>>
>>>> On Sun, Feb 28, 2016 at 11:16 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> i find it particularly confusing that a new memory management module
>>>>> would change the locations. it's not like the hash partitioner got
>>>>> replaced. i can switch back and forth between legacy and "new" memory
>>>>> management and see the distribution change... fully reproducible
>>>>>
>>>>> On Sun, Feb 28, 2016 at 11:24 AM, Lior Chaga <lio...@taboola.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I've experienced a similar problem upgrading from Spark 1.4 to Spark 1.6.
>>>>>> The data is not evenly distributed across executors, but in my case
>>>>>> it also reproduced with legacy mode.
>>>>>> Also tried 1.6.1 RC-1, with the same results.
>>>>>>
>>>>>> Still looking for a resolution.
>>>>>>
>>>>>> Lior
>>>>>>
>>>>>> On Fri, Feb 19, 2016 at 2:01 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>
>>>>>>> looking at the cached rdd i see a similar story:
>>>>>>> with useLegacyMode = true the cached rdd is spread out across 10
>>>>>>> executors, but with useLegacyMode = false the data for the cached rdd
>>>>>>> sits on only 3 executors (the rest all show 0s). my cached RDD is a
>>>>>>> key-value RDD that got partitioned (hash partitioner, 50 partitions)
>>>>>>> before being cached.
>>>>>>>
>>>>>>> On Thu, Feb 18, 2016 at 6:51 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>>
>>>>>>>> hello all,
>>>>>>>> we are just testing a semi-realtime application (it should return
>>>>>>>> results in less than 20 seconds from cached RDDs) on spark 1.6.0.
>>>>>>>> before this it used to run on spark 1.5.1.
>>>>>>>>
>>>>>>>> in spark 1.6.0 the performance is similar to 1.5.1 if i set
>>>>>>>> spark.memory.useLegacyMode = true, however if i switch to
>>>>>>>> spark.memory.useLegacyMode = false the queries take about 50% to 100%
>>>>>>>> more time.
>>>>>>>>
>>>>>>>> the issue becomes clear when i focus on a single stage: the
>>>>>>>> individual tasks are not slower at all, but they run on fewer
>>>>>>>> executors. in my test query i have 50 tasks and 10 executors. both
>>>>>>>> with useLegacyMode = true and useLegacyMode = false the tasks finish
>>>>>>>> in about 3 seconds and show as running PROCESS_LOCAL. however when
>>>>>>>> useLegacyMode = false the tasks run on just 3 executors out of 10,
>>>>>>>> while with useLegacyMode = true they spread out across 10 executors.
>>>>>>>> all the tasks running on just a few executors leads to the slower
>>>>>>>> results.
>>>>>>>>
>>>>>>>> any idea why this would happen?
>>>>>>>> thanks! koert
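For context on the suggestion that the effective default storage memory shrank in 1.6, the rough arithmetic can be sketched as follows. This is a back-of-the-envelope sketch for a 1 GiB executor heap; the fractions and the ~300 MiB reserved size are assumed defaults taken from the linked articles, not numbers stated in this thread.

```python
# Back-of-the-envelope comparison of storage memory for a 1 GiB executor heap.
# Assumed defaults (per the linked 0x0fff articles, not from this thread):
#   legacy:  spark.storage.memoryFraction = 0.6, spark.storage.safetyFraction = 0.9
#   unified: spark.memory.fraction = 0.75, spark.memory.storageFraction = 0.5,
#            with roughly 300 MiB reserved by the system.
HEAP_MIB = 1024
RESERVED_MIB = 300

# Legacy model (spark.memory.useLegacyMode = true): a fixed storage region.
legacy_storage = HEAP_MIB * 0.6 * 0.9             # ~553 MiB, never shared

# Unified model (the 1.6 default): one pool shared by storage and execution.
unified_pool = (HEAP_MIB - RESERVED_MIB) * 0.75   # ~543 MiB total
protected_storage = unified_pool * 0.5            # ~272 MiB safe from eviction

print(f"legacy storage region: {legacy_storage:.0f} MiB")
print(f"unified shared pool:   {unified_pool:.0f} MiB")
print(f"eviction-protected:    {protected_storage:.0f} MiB")
```

If this arithmetic applies, cached blocks above the protected half of the pool can be evicted whenever execution needs memory, which would match the suspicion above that permanently cached RDDs get evicted to run small jobs; raising spark.memory.storageFraction (or reverting to useLegacyMode) would be the obvious knobs to try.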