1) Regarding:
"LLAP caching currently does not consider cache locality. The same sql, compute 
tasks are currently not scheduled as far as possible to nodes that already have 
data cached. This may result in the same copy of data being repeatedly cached N 
times by multiple nodes. Is that really an advantage?"

That shouldn't happen.  Please see 
https://issues.apache.org/jira/browse/HIVE-25651. It ensures LLAP knows which 
splits should benefit from already cached data, avoids combining unrelated 
files into splits, and improves split-to-host mapping by correcting path 
behavior.

2) I’m surprised to hear that disabling LLAP IO led to better performance 
results; we observed the opposite effect.
