> What are the problems with having tez.runtime.shuffle.keep-alive.enabled and tez.runtime.optimize.local.fetch set to true always by default?
Nothing has failed due to these so far - we've gone through one entire release where we tested both heavily and found that they work very well at scale.

local.fetch is already enabled by default in 0.7.x (TEZ-2333). shared.fetch isn't getting flipped right now because last release it didn't get enough coverage on customer setups (for my liking) to bake it in (the broadcast edge didn't whitelist that config).

The keep-alive shuffle was tested on 350 nodes, with 10,000 mappers. And the advantage of these was significant - between those three options, a broadcast JOIN went from about 30 minutes of shuffle time to around 2 1/2 minutes.

You do need a 64-bit OS (not sandbox) with a modern kernel to safely flip these on - system configs on CentOS need to roughly correspond to the ktune settings for RHEL (other than THP & numad/zone_reclaim). These configs help shuffle in general - off the top of my head, tcp_fin_timeout and somaxconn come to mind immediately as the relevant configs to always tune.

There's a certain inflection point we hit in shuffle, where it's worse to be faster - fixes like HADOOP-11226 help there, but they need router/switch configs as well.

Cheers,
Gopal
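
For anyone who wants to see where a node currently stands on the two kernel settings named above, here is a minimal sketch in Python. It assumes a Linux host where sysctls are exposed under /proc/sys; the function name read_shuffle_sysctls and the root parameter are hypothetical, introduced only for illustration. It reports the current values and does not recommend any particular numbers.

```python
from pathlib import Path

# sysctl name -> path relative to the sysctl root (/proc/sys on Linux).
# These are the two settings mentioned in the message above.
SHUFFLE_SYSCTLS = {
    "net.ipv4.tcp_fin_timeout": "net/ipv4/tcp_fin_timeout",
    "net.core.somaxconn": "net/core/somaxconn",
}


def read_shuffle_sysctls(root="/proc/sys"):
    """Return {sysctl name: int value}, with None for anything unreadable."""
    values = {}
    for name, rel_path in SHUFFLE_SYSCTLS.items():
        try:
            values[name] = int((Path(root) / rel_path).read_text().strip())
        except (OSError, ValueError):
            # Missing file, no permission, or non-numeric content.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_shuffle_sysctls().items():
        print(f"{name} = {value}")
```

Run on each worker node, this gives a quick view of whether the values are consistent across the cluster before the Tez options are flipped on.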
