Re: [PROPOSAL] Add Data Lake operational metrics to Polaris

2025-09-07 Pierre Laporte
I was definitely not aware of that endpoint, so thanks a lot for bringing that up! I am glad there is appetite for even more metrics :-) One thing I was trying to be mindful of is the extra load that the MetaStore will have to handle. Typically, assuming ~10 metrics per table, this coul

Re: [PROPOSAL] Add Data Lake operational metrics to Polaris

2025-09-05 Yufei Gu
Thanks, Pierre, for the proposal. I’m excited about the potential of serving these metrics via Polaris. They would be highly valuable for multiple use cases, including UI integration, TMS (deciding when and how to compact a table), monitoring, cost awareness (through table size trending), and query

Re: [PROPOSAL] Add Data Lake operational metrics to Polaris

2025-09-05 Eric Maynard
> I am not against the idea of collecting telemetry (I think we would require an auxiliary compute for doing this, though),

+1. I think collecting such telemetry is actually a great idea, and I agree that auxiliary compute is probably the right design. This was one of the initial motivations for the

Re: [PROPOSAL] Add Data Lake operational metrics to Polaris

2025-09-05 Prashant Singh
Hey Pierre, thank you for taking a look at my recommendation. I think these Iceberg metrics have additional benefits: for example, with ScanMetrics we literally get the expression that was applied to the query, which can help us determine which subset of the data is actively queried and hence run
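To make the "which subset of data is actively queried" idea concrete, here is a minimal Python sketch that tallies the columns referenced by the filter expressions of incoming scan reports. It assumes each report arrives as parsed JSON shaped like the Iceberg REST spec's scan report, with a `filter` field holding an expression tree built from `type`, `left`/`right`, `child` and `term` keys. That shape is my reading of the spec and should be checked against it, and `referenced_columns` / `hot_columns` are hypothetical helpers, not Polaris or Iceberg APIs.

```python
from collections import Counter


def referenced_columns(expr) -> set:
    """Column names referenced by one filter expression (assumed REST JSON shape)."""
    if not isinstance(expr, dict):
        return set()  # e.g. a bare true/false literal, or no filter at all
    kind = expr.get("type")
    if kind in ("and", "or"):
        return referenced_columns(expr.get("left")) | referenced_columns(expr.get("right"))
    if kind == "not":
        return referenced_columns(expr.get("child"))
    term = expr.get("term")
    if isinstance(term, str):
        return {term}  # plain reference, e.g. {"type": "eq", "term": "region", ...}
    if isinstance(term, dict) and isinstance(term.get("term"), str):
        return {term["term"]}  # transform term wrapping a column, e.g. day(ts)
    return set()


def hot_columns(scan_reports: list) -> Counter:
    """Count, per column, how many scans filtered on it (each scan counts once)."""
    counts = Counter()
    for report in scan_reports:
        counts.update(referenced_columns(report.get("filter")))
    return counts
```

Column frequency is only a first step: keeping the literal values of equality predicates as well would point at the specific partitions that are hot, which is what a compaction scheduler would ultimately want.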

Re: [PROPOSAL] Add Data Lake operational metrics to Polaris

2025-09-05 Pierre Laporte
Thanks for the feedback, Prashant.
As far as I can tell, we could use Iceberg Metrics Reporting for only 3 operational metrics:
* Total number of files in a table (using the CommitReport)
* Total number of reads (the number of ScanReports)
* Total number of writes (the number of CommitReports)
I
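The three derivations listed above can be sketched as a small per-table rollup over incoming reports. This is a hypothetical Python illustration, not Polaris code: it assumes each report is a parsed JSON object carrying `report-type` (`scan-report` or `commit-report`) and `table-name`, and, for commits, a `sequence-number` plus a `metrics` map of counters such as `total-data-files` and `total-delete-files` shaped like `{"unit": "count", "value": N}`. Those field names follow my reading of the Iceberg REST spec's metrics reporting schema; `TableOpMetrics` and `aggregate` are made-up names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableOpMetrics:
    """Hypothetical per-table rollup of the three operational metrics."""
    reads: int = 0                     # number of scan reports received
    writes: int = 0                    # number of commit reports received
    total_files: Optional[int] = None  # data + delete files as of the newest commit seen
    _latest_seq: int = -1              # sequence number that total_files came from


def counter_value(metrics: dict, name: str) -> int:
    # Counters are assumed to look like {"unit": "count", "value": N}; missing means 0.
    return metrics.get(name, {}).get("value", 0)


def aggregate(reports: list) -> dict:
    per_table = defaultdict(TableOpMetrics)
    for report in reports:
        m = per_table[report["table-name"]]
        if report["report-type"] == "scan-report":
            m.reads += 1
        elif report["report-type"] == "commit-report":
            m.writes += 1
            seq = report.get("sequence-number", 0)
            metrics = report.get("metrics", {})
            # Reports may arrive out of order: only a newer commit updates the file count.
            if seq > m._latest_seq and "total-data-files" in metrics:
                m.total_files = (counter_value(metrics, "total-data-files")
                                 + counter_value(metrics, "total-delete-files"))
                m._latest_seq = seq
    return dict(per_table)
```

Taking the file count from the commit with the highest sequence number, rather than from whichever report arrived last, is a deliberate choice here: the reporter is fire-and-forget on the client side, so arrival order is not guaranteed.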

Re: [PROPOSAL] Add Data Lake operational metrics to Polaris

2025-09-04 Prashant Singh
Thank you for the proposal, Pierre! I think having metrics on the entities that Polaris manages is really helpful for telemetry, as well as for deciding when to run compactions and on which partitions. Iceberg already emits these metrics from the client to the REST server via RestMetricsReporter
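On the server side, the reports sent by the client-side reporter land on the REST catalog's per-table metrics endpoint. Below is a minimal sketch of how a server might route them, assuming the path layout `/v1/{prefix}/namespaces/{namespace}/tables/{table}/metrics` from the Iceberg REST spec, where the prefix is optional and multi-level namespaces are joined by the 0x1F unit separator. `parse_metrics_path`, `handle_report_metrics` and the in-memory `sink` list are hypothetical stand-ins, not Polaris code, and the prefix is assumed to be a single path segment.

```python
from urllib.parse import unquote

UNIT_SEPARATOR = "\x1f"  # joins namespace levels in REST paths, e.g. "sales%1Feu"


def parse_metrics_path(path: str):
    """Split a metrics path into (prefix, namespace levels, table name)."""
    parts = path.strip("/").split("/")
    if len(parts) == 6:      # no prefix: v1/namespaces/{ns}/tables/{table}/metrics
        parts.insert(1, "")
    if (len(parts) != 7 or parts[0] != "v1" or parts[2] != "namespaces"
            or parts[4] != "tables" or parts[6] != "metrics"):
        raise ValueError(f"not a table metrics path: {path}")
    prefix = unquote(parts[1])
    namespace = tuple(unquote(parts[3]).split(UNIT_SEPARATOR))
    table = unquote(parts[5])
    return prefix, namespace, table


def handle_report_metrics(path: str, body: dict, sink: list) -> int:
    """Validate one report request and record it; returns an HTTP status code."""
    prefix, namespace, table = parse_metrics_path(path)
    report_type = body.get("report-type")
    if report_type not in ("scan-report", "commit-report"):
        return 400  # unknown or missing report type
    sink.append((prefix, namespace, table, report_type))
    return 204  # success with no response body
```

Swapping the in-memory `sink` for durable storage is exactly where the MetaStore load concern raised in this thread comes in, since every scan and commit from every client turns into a write.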

[PROPOSAL] Add Data Lake operational metrics to Polaris

2025-09-04 Pierre Laporte
Hi folks, I would like to propose the addition of a component to Polaris that would build and maintain operational metrics for the Data Lake tables and views. The main idea is that, if those metrics can be shared across multiple Table Management Services and/or other external services, then it wou