> Another optimization which is far more useful is the shared fetch
>optimization. This tries to avoid copying the same data onto the same
>host multiple times.
> We've seen fairly good gains when fetching data to 10K reducers from a
>single source - 28 minutes
> down to 2 minutes. There's an example - BroadcastLoadGen - which can be
>used to try out this feature.


Since in 0.5.x we missed out on whitelisting runtime shared fetch, that I
couldn¹t test heavily in production installs - so it would be amazing if
you can do some broadcast JOIN tests with that enabled on a multi-rack
cluster.

My testing on a single-rack cluster has proved that even with full 10 GigE
node-to-node, it improves broadcast performance by about 10-20%, but on a
large cluster I¹m expecting at least a 2-3x performance improvement for
large map-joins with that feature.

Would be great to get scale validation on that for a user workload, since
I tested it on 350 nodes with broadcastloadgen (which showed the ~15x
gains), but did not test that heavily with a user workload.

Cheers,
Gopal


Reply via email to