> Another optimization which is far more useful is the shared fetch
> optimization. This tries to avoid copying the same data onto the same
> host multiple times.
> We've seen fairly good gains when fetching data to 10K reducers from a
> single source - 28 minutes down to 2 minutes. There's an examp
Hello Amit,
For the most part, local fetch does one thing: when the upstream vertex's
output is on the same host as the downstream vertex task, the fetcher reads
the data directly from disk instead of going through the HTTP-based shuffle
handler. This is an optimiz
Amit,
The local fetch optimization is enabled by default in Tez 0.7. It slightly
reduces the number of connections and reads files generated on the same box
directly.
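
To make the same-host check concrete, here is a minimal sketch in Python. It
is not Tez code: `UpstreamOutput`, `choose_fetch_method` and `plan_fetches`
are hypothetical names made up for illustration, and the real fetcher does
considerably more. It only shows the decision itself: outputs produced on the
local host are read from disk, everything else goes over HTTP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpstreamOutput:
    """Hypothetical descriptor for one upstream task's output."""
    host: str  # host that produced the output
    path: str  # location of the output file on that host's disk


def choose_fetch_method(output, local_host, local_fetch_enabled=True):
    """Return "local-disk" when the output lives on this host, else "http"."""
    if local_fetch_enabled and output.host == local_host:
        return "local-disk"
    return "http"


def plan_fetches(outputs, local_host, local_fetch_enabled=True):
    """Group outputs by how a task running on local_host would fetch them."""
    plan = {"local-disk": [], "http": []}
    for out in outputs:
        method = choose_fetch_method(out, local_host, local_fetch_enabled)
        plan[method].append(out)
    return plan


if __name__ == "__main__":
    outputs = [
        UpstreamOutput("node1", "/tmp/out_0"),
        UpstreamOutput("node2", "/tmp/out_1"),
        UpstreamOutput("node1", "/tmp/out_2"),
    ]
    plan = plan_fetches(outputs, local_host="node1")
    print(len(plan["local-disk"]), "local reads,", len(plan["http"]), "HTTP fetches")
```

Each output that lands in the "local-disk" group is one HTTP connection that
does not have to be opened, which is where the reduction in connections comes
from. With the optimization switched off in this sketch, every output falls
back to the HTTP path.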
Another optimization which is far more useful is the shared fetch
optimization. This tries to avoid copying the same data
Hey guys,
Local fetch optimization seems like an awesome feature. I'd like to
add some tests for our CI/CD pipeline that exercise this feature.
Any thoughts on what kind of setup, data, etc. I may need for this?
thanks
--amit