[...] do not have any parent dependencies and always return
an empty iterator.

I believe this should work as desired (at least the previous
ShuffleMapStage will think that the number of partitions in the next
stage, the one it generates shuffle output for, is unchanged).

There are a few issues, though: the existence of empty partitions, which
can be evaluated almost for free, and empty output files from these empty
partitions, which can be avoided by means of LazyOutputFormat in the case of
RDDs.

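Hadoop's LazyOutputFormat wraps another output format and only creates the underlying output file when the first record is written, which is why empty partitions need not leave empty part files behind. Below is a minimal toy model of that idea in plain Python, assuming partitions are just lists of strings; `LazyPartFileWriter` and `save_partitions` are hypothetical names and not part of the Hadoop or Spark API.

```python
import os
import tempfile
from typing import Iterable, List


class LazyPartFileWriter:
    """Opens its part file only when the first record arrives (toy model)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._handle = None  # no file exists on disk until write() is called

    def write(self, record: str) -> None:
        if self._handle is None:
            # First record: only now is the output file actually created.
            self._handle = open(self.path, "w", encoding="utf-8")
        self._handle.write(record + "\n")

    def close(self) -> None:
        if self._handle is not None:
            self._handle.close()


def save_partitions(partitions: List[Iterable[str]], out_dir: str) -> List[str]:
    """Write each partition to part-NNNNN; empty partitions produce no file."""
    for index, records in enumerate(partitions):
        writer = LazyPartFileWriter(os.path.join(out_dir, f"part-{index:05d}"))
        for record in records:
            writer.write(record)
        writer.close()
    return sorted(os.listdir(out_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        # Two non-empty partitions followed by two empty "padding" partitions.
        print(save_partitions([["a", "b"], ["c"], [], []], out_dir))
        # → ['part-00000', 'part-00001']
```

The empty partitions still run as (cheap) tasks; only the empty files disappear.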
On Mon, Oct 8, 2018, 23:57 Koert Kuipers wrote:

> although i personally would describe this as a bug the answer will be
> that this is the intended behavior. the coalesce "infects" the shuffle
> before it, making a coalesce useless for reducing output files after a
> shuffle with many partitions by design.
>
> your only option left is a repartition for which you pay the price in
> that it introduces another expensive shuffle.
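Why the coalesce "infects" the shuffle before it can be sketched with a toy model of how Spark cuts a lineage into stages: narrow dependencies (such as a coalesce without a shuffle) are pipelined into the same stage as the shuffle read that precedes them, and a stage runs one task per partition of its last RDD. The `Rdd` class and `plan_stages` function are hypothetical, the lineage is simplified to one parent per RDD (the join's two inputs are collapsed into a single node), and the partition counts are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rdd:
    """Toy RDD node: a name, a partition count and at most one parent."""
    name: str
    num_partitions: int
    parent: Optional["Rdd"] = None
    shuffle_dep: bool = False  # True if this RDD reads its parent through a shuffle


def plan_stages(final: Rdd) -> List[Tuple[int, List[str]]]:
    """Cut a linear lineage into stages at shuffle boundaries, upstream first.

    Each entry is (task_count, operations). Narrow dependencies are pipelined
    into one stage, whose task count is the partition count of its last RDD.
    """
    stages: List[Tuple[int, List[str]]] = []
    ops: List[str] = []
    tasks = final.num_partitions
    node: Optional[Rdd] = final
    while node is not None:
        ops.insert(0, node.name)
        if node.shuffle_dep and node.parent is not None:
            # The shuffle read starts this stage; the parent ends the previous one.
            stages.insert(0, (tasks, ops))
            ops = []
            tasks = node.parent.num_partitions
        node = node.parent
    if ops:
        stages.insert(0, (tasks, ops))
    return stages


if __name__ == "__main__":
    inputs = Rdd("read inputs", 1000)
    joined = Rdd("join", 2000, inputs, shuffle_dep=True)

    # coalesce(10) is narrow, so the join's shuffle read runs in only 10 tasks.
    print(plan_stages(Rdd("coalesce(10)", 10, joined)))
    # → [(1000, ['read inputs']), (10, ['join', 'coalesce(10)'])]

    # repartition(10) adds a shuffle, so the join keeps its 2000 tasks.
    print(plan_stages(Rdd("repartition(10)", 10, joined, shuffle_dep=True)))
    # → [(1000, ['read inputs']), (2000, ['join']), (10, ['repartition(10)'])]
```

In the first plan the join's shuffle still produces 2000 partitions, but only 10 tasks read and process them; the second plan keeps the join's parallelism at the cost of one more shuffle.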
>
> interestingly if you do a coalesce on a map-only job it knows how to
> reduce the partitions and output files without introducing a shuffle, so
> clearly it is possible, but i don't know how to get this behavior after a
> shuffle in an existing job.
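For the map-only case, the partition merge that coalesce performs without a shuffle can be sketched as follows. This is a toy Python model, not Spark's code: `coalesce_without_shuffle` is a hypothetical name, partitions are plain lists, and the parents are split into contiguous, near-equal index ranges, roughly what Spark's default coalescer does when no locality preferences are available (the real one also takes data locality into account).

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce_without_shuffle(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Merge parent partitions into at most n contiguous groups (toy model).

    Each output partition concatenates whole parent partitions, so no record
    is exchanged between tasks: this is a narrow dependency, not a shuffle.
    """
    if n < 1:
        raise ValueError("n must be positive")
    n = min(n, len(partitions))  # coalesce never increases the partition count
    groups: List[List[T]] = []
    for i in range(n):
        # Parent indices [start, end) form the i-th of n near-equal slices.
        start = i * len(partitions) // n
        end = (i + 1) * len(partitions) // n
        merged: List[T] = []
        for parent in partitions[start:end]:
            merged.extend(parent)
        groups.append(merged)
    return groups


if __name__ == "__main__":
    parents = [[1, 2], [3], [4, 5], [], [6], [7, 8]]
    print(coalesce_without_shuffle(parents, 3))
    # → [[1, 2, 3], [4, 5], [6, 7, 8]]
```

Six parent partitions become three output partitions (and three output files) without any record crossing a task boundary.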
>
> On Fri, Oct 5, 2018 at 6:34 PM Sergey Zhemzhitsky wrote:
>
>> Hello guys,
>>
>> Currently I'm a little bit confused with coalesce behaviour.
>>
>> Consider the following use case: I'd like to join two pretty big RDDs.
>> To make the join more stable and to prevent it from failing with OOM errors,
>> RDDs are usually repartitioned to redistribute data more evenly and to
>> pre[...]