Re: [PROPOSAL] Add Data Lake operational metrics to Polaris

2025-09-05 Thread Yufei Gu
Thanks, Pierre, for the proposal. I’m excited about the potential of serving these metrics via Polaris. They would be highly valuable for multiple use cases, including UI integration, TMS (deciding when and how to compact a table), monitoring, cost awareness (through table size trending), and query

Re: Re: [DISCUSS] Add JDBC Metastore table index

2025-09-05 Thread Yufei Gu
Hi Artur, thanks for sharing the experiment results. Do we have any data on the write side? The entity table holds almost everything in Polaris. We cannot ignore the write-side impact. Also, if the read perf doesn't improve a lot, we may leave the index as is. We can document how the extra index hel

Proposed Public OpenAPI Changes

2025-09-05 Thread Adnan Hemani
Hi all, While instrumenting event generation for all APIs, I found multiple public APIs which I would like to request a change to in order to better support event instrumentation at little-to-no cost to performance and/or bytes sent in a response. createCatalog This API currently returns (on

Re: [PROPOSAL] Add Data Lake operational metrics to Polaris

2025-09-05 Thread Eric Maynard
> I am not against the idea of collecting telemetry (I think we would require an auxiliary compute for doing this, though), +1, I think collecting such telemetry is actually a great idea and agree that auxiliary compute is probably the right design. This was one of the initial motivations for the

Re: [PROPOSAL] Add Data Lake operational metrics to Polaris

2025-09-05 Thread Prashant Singh
Hey Pierre, Thank you for taking a look at my recommendation. I think there are additional benefits to these Iceberg metrics. For example, with ScanMetrics we literally get the expression that was applied to the query, which essentially can help us get which subset of data is actively queried and hence run

RE: Re: [DISCUSS] Add JDBC Metastore table index

2025-09-05 Thread artur rakhmatulin
Hi, I hope these clarifications regarding the context and experiment details provide a clearer understanding of my proposal. About idx_entities_lookup: It is proposed to add an index on a limited set of fields used in the listEntities query for JdbcBasePersistenceImpl ("id", "catalog_id", "parent

Re: [PROPOSAL] Add Data Lake operational metrics to Polaris

2025-09-05 Thread Pierre Laporte
Thanks for the feedback, Prashant. As far as I can tell, we could use the Iceberg Metrics Reporting for only 3 operational metrics: * Total number of files in a table (using the CommitReport) * Total number of reads (the number of ScanReport) * Total number of writes (the number of CommitReport) I

Re: [VOTE] Release Apache Polaris 1.1.0-incubating (rc0)

2025-09-05 Thread Russell Spitzer
I'm back to reproing the error again. Again with no changes other than only running with no-build cache. I do think something is probably not right here but I don't think it's fundamental to the Polaris project. On Thu, Sep 4, 2025 at 10:37 AM Dmitri Bourlatchkov wrote: > Thanks for tracking the