I have a question. LLAP caching currently does not take cache locality into 
account: for the same SQL, compute tasks are not preferentially scheduled onto 
nodes that already have the data cached. As a result, the same data may end up 
cached N times across multiple nodes. Is that really an advantage?
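
To make the concern concrete, here is a minimal sketch of the effect I mean. 
It is a toy model, not LLAP's actual scheduler: the function names, the node 
and split labels, and both placement policies are hypothetical and only 
illustrate how placement that ignores locality can multiply cached copies 
when the same query is run repeatedly.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Deterministic hash so the example gives the same result on every run.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def rotating_node(split: str, run: int, nodes: list) -> str:
    # Hypothetical locality-unaware policy: the chosen node shifts with each
    # run, so the same split lands on a different node every time.
    return nodes[(stable_hash(split) + run) % len(nodes)]


def affinity_node(split: str, run: int, nodes: list) -> str:
    # Hypothetical locality-aware policy: a split always maps to the same node.
    return nodes[stable_hash(split) % len(nodes)]


def cached_copies(pick_node, splits: list, nodes: list, runs: int) -> int:
    # Each node caches every split it has ever been asked to read.
    cache = {node: set() for node in nodes}
    for run in range(runs):
        for split in splits:
            cache[pick_node(split, run, nodes)].add(split)
    return sum(len(cached) for cached in cache.values())


nodes = ["node-%d" % i for i in range(4)]
splits = ["split-%d" % i for i in range(8)]

# Same "query" (same 8 splits) executed 4 times on 4 nodes.
print(cached_copies(rotating_node, splits, nodes, runs=4))  # 32: every split on every node
print(cached_copies(affinity_node, splits, nodes, runs=4))  # 8: one copy per split
```

In this toy model the locality-unaware policy ends up holding N copies of each 
split (N = number of nodes), which is the duplication I am asking about.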



---- Replied Message ----
| From | Butao Zhang<[email protected]> |
| Date | 09/08/2025 17:18 |
| To | [email protected] |
| Cc | |
| Subject | Re: Parameter-tuning Hive-LLAP |
Good point! I think we can run the TPC-DS benchmark multiple times and wait 
until the LLAP cache has sufficiently cached the data onto the SSD. Then, we 
can observe whether the test performance improves. If I remember correctly, 
LLAP has a page where you can check the cache hit rate.

Thanks,
Butao Zhang

On 2025/09/08 09:12:59 Denys Kuzmenko wrote:
> hi Sungwoo,
>
> I don’t believe the TPC-DS benchmark is the best way to demonstrate the 
> advantages of Hive LLAP’s distributed cache.
> TPC-DS is primarily designed to measure query optimization and overall system 
> performance across a wide variety of complex workloads, but it doesn’t 
> necessarily highlight scenarios where LLAP’s in-memory caching of frequently 
> accessed data provides clear benefits.
> A more targeted benchmark or workload that emphasizes repeated access to the 
> same datasets would be a better fit to showcase the strengths of LLAP’s 
> distributed caching capabilities.
>
> Regards,
> Denys
>
