Hi devs,
I'm experimenting with Apache Iceberg as a Structured Streaming sink. I plan
to experiment with the source as well, but I see the PR is still in review.
It seems that "fast append" helps keep commit latency reasonable, though
the metadata directory grows too fast. I found the
Hi Jungtaek,
That setting controls whether Iceberg cleans up old copies of the table
metadata file. The metadata file holds references to all of the table's
snapshots (that have not expired) and is self-contained. No operations need
to access previous metadata files.
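
To make the "self-contained" point concrete, here is a minimal Python sketch. It is a toy model, not Iceberg code: `MetadataFile`, `commit`, and the `keep_previous` parameter are hypothetical names invented for illustration. Each new metadata version carries the full list of unexpired snapshots, so dropping older versions loses no snapshot references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataFile:
    """Toy stand-in for one version of a table metadata file (hypothetical)."""
    version: int
    snapshot_ids: tuple  # every unexpired snapshot, not just the newest one


def commit(history, new_snapshot_id, keep_previous):
    """Append a new metadata version and drop versions beyond keep_previous."""
    current = history[-1] if history else MetadataFile(0, ())
    new = MetadataFile(current.version + 1,
                       current.snapshot_ids + (new_snapshot_id,))
    history = history + [new]
    # Keep the current file plus at most keep_previous older ones.
    return history[-(keep_previous + 1):]


history = []
for snapshot_id in range(1, 6):
    history = commit(history, snapshot_id, keep_previous=1)

latest = history[-1]
print([m.version for m in history])  # only the two newest versions remain
print(latest.snapshot_ids)           # all five snapshots are still referenced
```

In this model, old metadata versions are redundant copies, which is why cleaning them up does not affect readers of the current table state.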
Those aren't typically that la
Thanks for the quick response!
And yes, I also experimented with expireSnapshots() and it looked
good. I can imagine some alternative conditions on expiring snapshots (like
adjusting "granularity" between snapshots instead of removing all snapshots
before the specific timestamp), but for n
> seems to fail on high rate writing streaming query being run on the other
side
This kind of situation is where you'd want to tune the number of retries
for a table. That's a likely source of the problem. We can also check to
make sure we're being smart about conflict detection. A rewrite needs t
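
As a rough illustration of why the retry count matters when a fast streaming writer is committing on the other side, here is a minimal Python sketch of optimistic commits with a retry budget. It is a toy model under stated assumptions: `ToyTable`, `commit_with_retries`, and `streaming_writer` are hypothetical names, not Iceberg classes, and real retries would also wait between attempts. In Iceberg itself the budget is, as far as I know, set through the `commit.retry.num-retries` table property.

```python
class CommitConflict(Exception):
    """Raised when the table changed between reading and committing."""


class ToyTable:
    """Hypothetical in-memory table with a version counter."""

    def __init__(self):
        self.version = 0

    def commit(self, base_version):
        # Optimistic concurrency: succeed only if nobody committed since we read.
        if base_version != self.version:
            raise CommitConflict(f"base {base_version} != current {self.version}")
        self.version += 1


def commit_with_retries(table, num_retries, before_attempt=None):
    """Try to commit, re-reading the table version after each conflict.

    Returns the number of attempts used; re-raises CommitConflict once the
    retry budget is exhausted.
    """
    for attempt in range(1, num_retries + 2):  # first try + num_retries retries
        base = table.version                   # refresh our view of the table
        if before_attempt is not None:
            before_attempt(attempt)            # hook to simulate a concurrent writer
        try:
            table.commit(base)
            return attempt
        except CommitConflict:
            if attempt == num_retries + 1:
                raise


def streaming_writer(table, conflicts):
    """Simulate a fast writer that wins the race on the first `conflicts` attempts."""
    def hook(attempt):
        if attempt <= conflicts:
            table.version += 1
    return hook


table = ToyTable()
attempts = commit_with_retries(table, num_retries=4,
                               before_attempt=streaming_writer(table, 3))
print(attempts)  # number of attempts the slower operation needed
```

With a budget of 2 retries instead of 4, the same scenario exhausts its attempts and raises `CommitConflict`, which is the failure mode described in the quoted report.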
I'd love to contribute documentation about the actions - just need some
time to understand the needs for some actions (like RewriteManifestAction).
I just submitted a PR for the Structured Streaming sink [1]. I mentioned
expireSnapshots() there with a link to the javadoc page, but it'd be nice if
there's als
Thanks everybody for taking a look at the doc. FYI, I’ve updated it.
I would like to share some intermediate thoughts.
1. It seems beneficial to follow the stored procedures approach to call small
actions like rollback or expire snapshots. Presto already allows connectors to
define stored proce