Re: Number of entries in manifest-list

2022-01-10 Thread g. g. grey
That helps a lot! Thank you Szehon for the detailed response!
ggg

On Fri, Jan 7, 2022 at 1:54 PM Szehon Ho wrote:
> Sure, I guessed you were asking about the number of manifest files rather than entries. There's always a tradeoff, some aspects being:
> - More manifest files => better pr

Re: Number of entries in manifest-list

2022-01-07 Thread Szehon Ho
Sure, I guessed you were asking about the number of manifest files rather than entries. There's always a tradeoff, some aspects being:
- More manifest files => better predicate pushdown (skip more manifest files during query), and less chance for concurrency conflict (which is two transa

Re: Number of entries in manifest-list

2022-01-07 Thread g. g. grey
Hi Szehon, Thanks. My apologies; I was too loose in my wording. I'll try to use the terms from the spec. I was asking about the number of total manifest files, specifically the number of `manifest_file` structs that are found in the manifest-list file. It sounds like the "commit.manifest.target-

Re: Number of entries in manifest-list

2022-01-07 Thread Szehon Ho
Hi, There is one manifest entry per data file or delete file, so it depends on how many data files and delete files your table has. The number of files is controlled mostly by the parallelism of the job that writes the table, though there are Iceberg RewriteDataFiles utilities that can compact as well (as in y

Number of entries in manifest-list

2022-01-07 Thread g. g. grey
Hi folks, I am just getting started with Iceberg and I'm trying to build up some intuition for how large the metadata will become for large, active tables. Specifically, what is the order of magnitude of manifest entries that I should reasonably expect in a manifest-list file? Is there a particula