xxx

2020-07-27 Thread Danny Chan

Effect of enabling 'write.metadata.delete-after-commit.enabled'

2020-07-27 Thread Jungtaek Lim
Hi devs, I'm experimenting with Apache Iceberg as a Structured Streaming sink, and I plan to experiment with it as a source as well, but I see the PR is still in review. It seems that "fast append" helps a lot in keeping commit latency reasonable, though the metadata directory grows too fast. I found the

Re: Effect of enabling 'write.metadata.delete-after-commit.enabled'

2020-07-27 Thread Ryan Blue
Hi Jungtaek, That setting controls whether Iceberg cleans up old copies of the table metadata file. The metadata file holds references to all of the table's snapshots (that have not expired) and is self-contained. No operations need to access previous metadata files. Those aren't typically that la
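
To make the effect of the setting concrete, here is a minimal sketch, not Iceberg's implementation. It is a toy model that assumes the setting works together with the companion table property 'write.metadata.previous-versions-max', keeping the current metadata file plus at most that many older copies. The class name, method name, and the "vN.metadata.json" file names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class MetadataRetentionSketch {

    /**
     * Toy model of the metadata directory: returns the metadata file names
     * that remain after the given number of commits. File names are hypothetical.
     */
    public static List<String> filesAfterCommits(
            int commits, boolean deleteAfterCommit, int previousVersionsMax) {
        Deque<String> files = new ArrayDeque<>();
        for (int version = 1; version <= commits; version++) {
            // Every commit writes a new, self-contained metadata file.
            files.addLast("v" + version + ".metadata.json");
            if (deleteAfterCommit) {
                // Keep the current file plus at most previousVersionsMax older ones.
                while (files.size() > previousVersionsMax + 1) {
                    files.removeFirst();
                }
            }
        }
        return new ArrayList<>(files);
    }

    public static void main(String[] args) {
        // Cleanup disabled: one file per commit accumulates.
        System.out.println(filesAfterCommits(5, false, 2));
        // Cleanup enabled: only the newest file and two older copies remain.
        System.out.println(filesAfterCommits(5, true, 2));
    }
}
```

With cleanup disabled the model keeps one file per commit, which matches the directory growth described in the first message; with it enabled the count stays bounded regardless of how many commits the streaming query makes.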

Re: Effect of enabling 'write.metadata.delete-after-commit.enabled'

2020-07-27 Thread Jungtaek Lim
Thanks for the quick response! And yes, I also experimented with expireSnapshots() and it looked good. I can imagine some alternative conditions on expiring snapshots (like adjusting "granularity" between snapshots instead of removing all snapshots before the specific timestamp), but for n
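
The two expiration policies mentioned above can be contrasted with a small sketch. This is a toy model over a sorted list of snapshot timestamps, not Iceberg code: expireOlderThan imitates the timestamp cutoff (and assumes the latest snapshot is always retained), while thinByGranularity is a purely hypothetical illustration of the "granularity" idea that keeps only the newest snapshot in each time bucket. All names here are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SnapshotExpirySketch {

    /** Keeps snapshots at or after the cutoff; the latest snapshot is always kept. */
    public static List<Long> expireOlderThan(List<Long> sortedTimestamps, long cutoff) {
        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < sortedTimestamps.size(); i++) {
            boolean isLatest = i == sortedTimestamps.size() - 1;
            if (isLatest || sortedTimestamps.get(i) >= cutoff) {
                kept.add(sortedTimestamps.get(i));
            }
        }
        return kept;
    }

    /** Hypothetical policy: keeps only the newest snapshot in each bucket of the given width. */
    public static List<Long> thinByGranularity(List<Long> sortedTimestamps, long granularity) {
        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < sortedTimestamps.size(); i++) {
            long bucket = sortedTimestamps.get(i) / granularity;
            boolean lastInBucket = i == sortedTimestamps.size() - 1
                    || sortedTimestamps.get(i + 1) / granularity != bucket;
            if (lastInBucket) {
                kept.add(sortedTimestamps.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> snapshots = Arrays.asList(0L, 10L, 20L, 30L, 40L, 50L);
        System.out.println(expireOlderThan(snapshots, 25L));   // cutoff policy
        System.out.println(thinByGranularity(snapshots, 20L)); // thinning policy
    }
}
```

The cutoff policy drops the whole history before a point in time, whereas the thinning policy keeps a sparser history across the full range.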

Re: Effect of enabling 'write.metadata.delete-after-commit.enabled'

2020-07-27 Thread Ryan Blue
> seems to fail on high rate writing streaming query being run on the other side This kind of situation is where you'd want to tune the number of retries for a table. That's a likely source of the problem. We can also check to make sure we're being smart about conflict detection. A rewrite needs t
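
Why the number of retries matters under a high-rate concurrent writer can be sketched with a toy optimistic-commit loop. This is an illustrative model, not Iceberg's commit path: the table version is a plain AtomicLong, a commit succeeds only if no one else committed since the version was read, and the loop omits any wait between attempts. In Iceberg the retry count is a table property ('commit.retry.num-retries'); the class and method names below are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class CommitRetrySketch {

    /**
     * Tries to commit on top of the current version, retrying on conflict.
     * Returns the number of attempts used, or -1 if all retries were exhausted.
     */
    public static int commitWithRetries(
            AtomicLong tableVersion, int numRetries, Runnable concurrentWriter) {
        for (int attempt = 0; attempt <= numRetries; attempt++) {
            long base = tableVersion.get();  // refresh: read the current version
            concurrentWriter.run();          // another writer may commit in between
            if (tableVersion.compareAndSet(base, base + 1)) {
                return attempt + 1;          // nobody else committed; we win
            }
            // Conflict: the version moved, so loop and retry from the new state.
        }
        return -1;
    }

    /** Simulated streaming writer that wins the race a fixed number of times. */
    public static Runnable busyWriter(AtomicLong tableVersion, int conflictingCommits) {
        AtomicInteger remaining = new AtomicInteger(conflictingCommits);
        return () -> {
            if (remaining.getAndDecrement() > 0) {
                tableVersion.incrementAndGet();
            }
        };
    }

    public static void main(String[] args) {
        AtomicLong few = new AtomicLong();
        System.out.println(commitWithRetries(few, 2, busyWriter(few, 3)));
        AtomicLong more = new AtomicLong();
        System.out.println(commitWithRetries(more, 4, busyWriter(more, 3)));
    }
}
```

In this model an operation that loses the race three times fails outright with two retries but succeeds on the fourth attempt when four retries are allowed.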

Re: Effect of enabling 'write.metadata.delete-after-commit.enabled'

2020-07-27 Thread Jungtaek Lim
I'd love to contribute documentation about the actions - I just need some time to understand the needs for some actions (like RewriteManifestAction). I just submitted a PR for the structured streaming sink [1]. I mentioned expireSnapshots() there with a link to the javadoc page, but it'd be nice if there's als

Re: [DISCUSS] SQL syntax extensions

2020-07-27 Thread Anton Okolnychyi
Thanks everybody for taking a look at the doc. FYI, I’ve updated it. I would like to share some intermediate thoughts. 1. It seems beneficial to follow the stored procedures approach to call small actions like rollback or expire snapshots. Presto already allows connectors to define stored proce