Hi all,

I just ran into an issue that likely resulted from my not-so-clever configuration, but I'd nonetheless like to share it with the community. This is all on Hadoop 2.7.3.
In my setup, each reducer fetched roughly 65K from each mapper's spill file. I disabled transferTo during the shuffle (mapreduce.shuffle.transferTo.allowed=false), because I wanted to have a look at the file system statistics, which miss the mmap calls that transferTo sometimes falls back to. I left the shuffle buffer size at its 128K default (not knowing about the parameter, mapreduce.shuffle.transfer.buffer.size, at the time). The effect was that I observed roughly 100% more data being read during the shuffle, since a full 128K was read for every 65K actually needed.

I added a quick fix to Hadoop that sizes the buffer as the minimum of the partition size and the configured shuffle buffer size:

https://github.com/apache/hadoop/compare/branch-2.7.3...robert-schmidtke:adaptive-shuffle-buffer

Benchmarking this version against mapreduce.shuffle.transferTo.allowed=true yields the same runtime and roughly 10% more reads in YARN during the shuffle phase (down from the previous 100%).

Maybe this is something that should be added to Hadoop? Or do users simply have to be more clever about their job configurations? I'd be happy to open a PR if this is deemed useful. A rough sketch of the idea is below.
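To make the change easier to picture, here is a minimal, self-contained sketch of the buffer-sizing logic. The names are illustrative (the real code lives in the ShuffleHandler's FadvisedFileRegion, if I'm not mistaken), and the 128K default is only assumed here; this is not a verbatim excerpt from my branch:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

/**
 * Sketch: when transferTo is disabled, copy a map output partition
 * through a heap buffer that is never larger than the partition itself,
 * so small partitions no longer trigger full buffer-sized reads.
 */
public class AdaptiveShuffleCopy {

    // Assumed stock default of mapreduce.shuffle.transfer.buffer.size.
    static final int DEFAULT_SHUFFLE_BUFFER_SIZE = 128 * 1024;

    /**
     * Copies 'count' bytes starting at 'position' from 'source' to
     * 'target'. The buffer is capped at the partition size, so a 65K
     * partition causes a 65K read instead of a 128K one.
     */
    static long copyPartition(FileChannel source, WritableByteChannel target,
                              long position, long count) throws IOException {
        int bufferSize = (int) Math.min(count, DEFAULT_SHUFFLE_BUFFER_SIZE);
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);

        long transferred = 0;
        while (transferred < count) {
            buffer.clear();
            // Never read past the end of the partition.
            buffer.limit((int) Math.min(buffer.capacity(), count - transferred));
            int read = source.read(buffer, position + transferred);
            if (read < 0) {
                break; // unexpected end of file
            }
            buffer.flip();
            while (buffer.hasRemaining()) {
                target.write(buffer);
            }
            transferred += read;
        }
        return transferred;
    }
}

Anyway, thanks for the attention!

Cheers,
Robert

--
My GPG Key ID: 336E2680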