Hi developers,
I have an initial idea: collect metrics during table maintenance processes
(compaction, snapshot expiration, and data expiration) and store them in a
database via JDBC. This serves two purposes. First, persisting the
maintenance records makes it easier to track optimization effects and
estimate costs later. Second, we can compute snapshot and related
information whenever tables are refreshed and store it as well, so that
historical snapshot information and the corresponding data change trends
can be shown to users. Because this information would be served by querying
the database rather than by reading metadata files from the data lake via IO
requests, service response time and IO pressure should both drop.
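
To make the idea more concrete, here is a rough sketch of what persisting one
maintenance record via JDBC could look like. Everything in it is an assumption
for discussion only: the table name maintenance_record, its columns, and the
MaintenanceMetricsStore and MaintenanceRecord classes are hypothetical and not
part of any existing code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

public class MaintenanceMetricsStore {

    // Hypothetical schema: one row per finished maintenance process.
    static final String CREATE_TABLE_SQL =
        "CREATE TABLE maintenance_record ("
            + "table_name VARCHAR(256) NOT NULL, "
            + "process_type VARCHAR(64) NOT NULL, "
            + "started_at TIMESTAMP NOT NULL, "
            + "duration_ms BIGINT NOT NULL, "
            + "files_removed BIGINT NOT NULL, "
            + "bytes_removed BIGINT NOT NULL)";

    static final String INSERT_SQL =
        "INSERT INTO maintenance_record "
            + "(table_name, process_type, started_at, duration_ms, files_removed, bytes_removed) "
            + "VALUES (?, ?, ?, ?, ?, ?)";

    // Read path: history for one table comes from the database,
    // not from metadata files in the data lake.
    static final String HISTORY_SQL =
        "SELECT process_type, started_at, duration_ms, files_removed, bytes_removed "
            + "FROM maintenance_record WHERE table_name = ? ORDER BY started_at DESC";

    // Hypothetical value object holding the metrics of one maintenance run.
    static final class MaintenanceRecord {
        final String tableName;
        final String processType; // e.g. "COMPACTION", "SNAPSHOT_EXPIRATION"
        final Instant startedAt;
        final long durationMs;
        final long filesRemoved;
        final long bytesRemoved;

        MaintenanceRecord(String tableName, String processType, Instant startedAt,
                          long durationMs, long filesRemoved, long bytesRemoved) {
            this.tableName = tableName;
            this.processType = processType;
            this.startedAt = startedAt;
            this.durationMs = durationMs;
            this.filesRemoved = filesRemoved;
            this.bytesRemoved = bytesRemoved;
        }
    }

    // Writes one record; the caller owns the connection and its transaction.
    static void save(Connection conn, MaintenanceRecord r) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, r.tableName);
            ps.setString(2, r.processType);
            ps.setTimestamp(3, Timestamp.from(r.startedAt));
            ps.setLong(4, r.durationMs);
            ps.setLong(5, r.filesRemoved);
            ps.setLong(6, r.bytesRemoved);
            ps.executeUpdate();
        }
    }
}
```

The maintenance process would call save(...) once when it finishes, and the
service would answer history requests with HISTORY_SQL. Which metrics to
record per process type is open for discussion.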
Please let me know your thoughts.

Thank you,
Xavier Bai
