Re: Enable local fetch optimization by default

2015-05-15 Thread Gopal Vijayaraghavan
> Another optimization which is far more useful is the shared fetch optimization. This tries to avoid copying the same data onto the same host multiple times. We've seen fairly good gains when fetching data to 10K reducers from a single source - 28 minutes down to 2 minutes. There's an examp

Re: Enable local fetch optimization by default

2015-05-15 Thread Hitesh Shah
Hello Amit, For the most part, all that local fetch does is this: when the upstream vertex's output is on the same host where the downstream vertex task is running, the fetcher reads the data directly from disk instead of going through the HTTP-based shuffle handler. This is an optimiz
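
The decision described in this message can be illustrated with a minimal sketch. This is not Tez code: the names `UpstreamOutput`, `choose_fetch_mode`, and `local_fetch_enabled` are hypothetical and only model the rule "same host means read from disk, otherwise use the HTTP shuffle handler".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpstreamOutput:
    """Hypothetical descriptor for one upstream vertex output."""
    host: str  # host on which the upstream task wrote its output
    path: str  # illustrative on-disk location of that output


def choose_fetch_mode(output: UpstreamOutput, local_host: str,
                      local_fetch_enabled: bool = True) -> str:
    """Return "local-disk" when the output lives on this host, else "http"."""
    if local_fetch_enabled and output.host == local_host:
        # Same host: read the file directly and skip the shuffle handler.
        return "local-disk"
    # Different host, or the optimization is switched off: fetch over HTTP.
    return "http"


if __name__ == "__main__":
    same = UpstreamOutput(host="node1", path="/data/out/part-0")
    other = UpstreamOutput(host="node2", path="/data/out/part-0")
    print(choose_fetch_mode(same, "node1"))
    print(choose_fetch_mode(other, "node1"))
```

A test for the feature would presumably need at least one case of each kind: an upstream output on the same host as the downstream task, and one on a different host.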

Re: Enable local fetch optimization by default

2015-05-15 Thread Siddharth Seth
Amit, The local fetch optimization is enabled by default in Tez-0.7. It reduces the number of connections by a bit and ends up reading files generated on the same box directly. Another optimization which is far more useful is the shared fetch optimization. This tries to avoid copying the same data

Enable local fetch optimization by default

2015-05-15 Thread Amit Tiwari
Hey guys, Local fetch optimization seems like an awesome feature. I'd like to add some tests for our CI/CD pipeline that exercise this feature. Any thoughts on what kind of setup, data, etc. I may need for this? thanks --amit