Re: [Discuss] Heap pressure with RewriteFiles APIs

2024-05-24 Thread Naveen Kumar
Hi Amogh, Thanks for your feedback. It sounds like a good idea to me. For heavy operations such as compaction and GC, it can reduce heap pressure. However, can't we do something where we don't need to hold all the dataFiles? Especially for rewrite cases, what would be the harm if we flush to

Re: [Discuss] Heap pressure with RewriteFiles APIs

2024-05-22 Thread Amogh Jahagirdar
I'd think chunking the work as much as possible and disabling metrics for columns where they're not helpful probably goes far, but it may be insufficient for extreme cases. I've also been thinking about whether there are more space-efficient data structures for maintaining file paths which exploi
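
A minimal sketch of the chunking idea, in Python for brevity rather than against the actual Iceberg Java API. The names `chunked`, `rewrite_in_chunks`, and the `commit` callback are hypothetical; `commit` stands in for whatever performs one rewrite commit. The point is only that at most `size` file paths are materialized at a time.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(paths: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` paths without materializing the whole input."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(paths)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def rewrite_in_chunks(
    paths: Iterable[str], size: int, commit: Callable[[List[str]], None]
) -> int:
    """Invoke the (hypothetical) `commit` once per chunk; return the number of commits."""
    commits = 0
    for batch in chunked(paths, size):
        commit(batch)  # only this batch needs to be held in memory
        commits += 1
    return commits
```

With five paths and a chunk size of two, `commit` would be called three times, the last time with a single path. More commits mean more snapshots, so chunk size trades heap pressure against metadata churn.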

Re: [Discuss] Heap pressure with RewriteFiles APIs

2024-05-22 Thread Naveen Kumar
Hi Szehon, Thanks for your email. I agree configuring metadata metrics per column will create a smaller manifest file with lower and upper bounds per content entry. Assuming your patch is merged, it will work as follows: 1. A user should identif
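
For reference, per-column metrics are configured through table properties. The sketch below only builds the property map in Python and does not call Iceberg. The keys `write.metadata.metrics.default` and `write.metadata.metrics.column.<name>` are the property names as I understand them from the Iceberg configuration docs. The helper `metrics_overrides` and the column names are hypothetical.

```python
from typing import Dict, Iterable


def metrics_overrides(
    columns_without_metrics: Iterable[str], default_mode: str = "truncate(16)"
) -> Dict[str, str]:
    """Build table properties that keep the default mode but disable metrics per column."""
    props = {"write.metadata.metrics.default": default_mode}
    for col in columns_without_metrics:
        # "none" turns off counts and lower/upper bounds for this column
        props[f"write.metadata.metrics.column.{col}"] = "none"
    return props


# Hypothetical wide columns whose bounds are never used for pruning
props = metrics_overrides(["payload", "raw_json"])
```

The resulting map could then be applied with whatever mechanism sets table properties in a given engine, for example an `ALTER TABLE ... SET TBLPROPERTIES` statement.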

Re: [Discuss] Heap pressure with RewriteFiles APIs

2024-05-21 Thread Szehon Ho
Hi Naveen, Yes, it sounds like it will help to disable metrics for those columns. IIRC, by default manifest entries have metrics at the 'truncate(16)' level for 100 columns, which as you see can be quite memory intensive. A potential later improvement is also to have the ability to remove counts by
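
A rough back-of-envelope estimate of why this adds up, as a sketch only. It assumes each lower or upper bound occupies about `truncate_len` bytes and counts just the raw bound bytes. JVM object overhead, the per-column count maps, and file paths are ignored, so real heap usage would likely be higher. The file count is a made-up example, not a number from this thread.

```python
def bounds_bytes_estimate(
    num_files: int, num_columns: int = 100, truncate_len: int = 16
) -> int:
    """Approximate raw bytes of lower + upper bounds held for `num_files` data files."""
    bounds_per_column = 2  # one lower bound and one upper bound
    return num_files * num_columns * bounds_per_column * truncate_len


# Hypothetical rewrite touching one million data files:
# 1_000_000 * 100 * 2 * 16 bytes = 3_200_000_000 bytes (about 3.2 GB)
estimate = bounds_bytes_estimate(1_000_000)
```

Under the same assumptions, disabling metrics on 90 of the 100 columns would shrink this term by a factor of ten.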

[Discuss] Heap pressure with RewriteFiles APIs

2024-05-21 Thread Naveen Kumar
Hi Everyone, I am looking into the RewriteFiles API and its implementation, BaseRewriteFiles