1) reg: "LLAP caching currently does not consider cache locality. The same sql, compute tasks are currently not scheduled as far as possible to nodes that already have data cached. This may result in the same copy of data being repeatedly cached N times by multiple nodes. Is that really an advantage?"
That shouldn't happen. Please see https://issues.apache.org/jira/browse/HIVE-25651. It ensures that LLAP knows which splits can benefit from already cached data, avoids combining unrelated files into splits, and improves split-to-host mapping by correcting how paths are handled.
2) I'm surprised to hear that disabling LLAP IO led to better performance; we observed the opposite effect.
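
On point 1, here is a rough illustration of why a stable split-to-host mapping keeps a given piece of data cached on one node instead of N. This is not LLAP's actual code. It uses rendezvous hashing as a stand-in technique, and `pick_host`, the host names, and the path are all hypothetical.

```python
import hashlib


def pick_host(path, offset, hosts):
    """Pick a host for a split with rendezvous (highest-random-weight) hashing.

    Hypothetical helper for illustration only. The split is identified by
    (path, offset), so the same split always scores the hosts the same way.
    """
    def score(host):
        key = f"{host}|{path}|{offset}".encode("utf-8")
        # Use the first 8 bytes of the digest as an integer score.
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

    return max(hosts, key=score)


# Assumed example values, not taken from a real cluster.
hosts = ["llap-node-1", "llap-node-2", "llap-node-3"]
split_path = "/warehouse/sales/part-00000.orc"

chosen = pick_host(split_path, 0, hosts)
# Repeating the query sends the same split to the same node,
# so its cached data is reused rather than cached again elsewhere.
assert chosen == pick_host(split_path, 0, hosts)
```

The mapping depends only on the split identity and the set of hosts, so it only works if the path used as the key is consistent from one query to the next, which is why path handling matters for cache hits.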
