Hi Yufei,
There was a proposed PR for this:
https://github.com/apache/iceberg/pull/4522
On Thu, Apr 21, 2022 at 5:42 AM Yufei Gu wrote:
Hi team,
Do we have a PR for this type of delete compaction?
> Merge: the changes specified in delete files are applied to data files
> and then overwrite the original data file, e.g. merging delete files to
> data files.
Yufei
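For readers unfamiliar with the Merge behaviour quoted above, here is a toy sketch of the idea: position deletes are applied to a data file's rows and the file is rewritten without them. Everything here is hypothetical and in-memory (`merge_position_deletes`, the row dictionaries, the file name); it is not the Iceberg API and not the approach taken in the linked PR.

```python
from typing import Dict, List, Set


def merge_position_deletes(
    data_files: Dict[str, List[dict]],
    position_deletes: Dict[str, Set[int]],
) -> Dict[str, List[dict]]:
    """Return rewritten data files with the deleted row positions dropped."""
    merged = {}
    for path, rows in data_files.items():
        # Positions are 0-based row ordinals within the data file.
        deleted = position_deletes.get(path, set())
        merged[path] = [row for pos, row in enumerate(rows) if pos not in deleted]
    return merged


# Hypothetical data file with three rows; the row at position 1 is deleted.
data_files = {"data-001.parquet": [{"id": 1}, {"id": 2}, {"id": 3}]}
position_deletes = {"data-001.parquet": {1}}
merged = merge_position_deletes(data_files, position_deletes)
```

After such a merge the delete entries for that file no longer carry information, which is why the thread treats this compaction as the one aimed at retiring delete files.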
On Wed, Nov 3, 2021 at 8:29 AM Puneet Zaroo wrote:
Sounds great. I will look at the PRs.
thanks,
On Tue, Nov 2, 2021 at 11:35 PM Jack Ye wrote:
Yes I am actually arriving at exactly the same conclusion as you just now.
I was focusing on the immediate removal of delete files too much when
writing the doc and lost this aspect that we don't need to remove the
deletes after having the functionality to preserve sequence number.
I just publishe
Thanks for the further clarifications, and for outlining the detailed steps for the
delete or 'MERGE' compaction. It seems this compaction is explicitly geared
towards removing delete files. While that may be useful, I feel that for CDC
tables doing the Bin-pack and Sorting compactions and *removing the NEED
fo
> I think even with the custom sequence file numbers on output data files,
the position delete files have to be deleted, *since position deletes also
apply on data files with the same sequence number*. Also, unless I am
missing something, I think the equality delete files cannot be deleted at
the e
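The sequence-number point above can be made concrete with a small sketch of the applicability rule as I understand it from the v2 spec: a position delete applies to data files whose data sequence number is less than or equal to its own, while an equality delete applies only to data files with a strictly smaller one. The `DeleteFile` class and `applies_to` function are hypothetical names, and partition and file-path matching are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeleteFile:
    kind: str  # "position" or "equality"
    sequence_number: int


def applies_to(delete: DeleteFile, data_sequence_number: int) -> bool:
    """Sequence-number check only; partition and path matching are omitted."""
    if delete.kind == "position":
        # Position deletes also cover data files written at the same sequence number.
        return data_sequence_number <= delete.sequence_number
    if delete.kind == "equality":
        # Equality deletes cover only strictly older data files.
        return data_sequence_number < delete.sequence_number
    raise ValueError(f"unknown delete kind: {delete.kind}")


pos_delete = DeleteFile("position", sequence_number=5)
eq_delete = DeleteFile("equality", sequence_number=5)
same_seq_results = (applies_to(pos_delete, 5), applies_to(eq_delete, 5))
```

Under this rule a rewritten data file that keeps sequence number 5 would still have the position delete re-applied on read, whereas the equality delete at sequence number 5 would not touch it.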
Thanks for the clarifications, and thanks for pulling together the
documentation for the row-level delete functionality separately, as that
will be very helpful.
I think we are in agreement on most points. I just want to reiterate my
understanding of the merge compaction behavior to make sure we ar
> why can't this strategy do bin-packing or sorting as well, if that is
required, as long as the sequence number is not updated.
> wouldn't subsequent reads re-apply the delete files which were used in
the merge as well?
I think you are right, we can use the sequence number of the snapshot we
star
Another follow-up regarding this: *"Merge strategy that does not do any
bin-packing, and only merges the delete files for each data file and writes
it back. The new data file will have the same sequence number as the old
file before Merge"*; shouldn't the sequence number be set to the highest
seq
I had a few follow-up points.
1. *"(1) for hot partitions, users can try to only perform Convert and
Rewrite to keep delete file sizes and count manageable, until the partition
becomes cold and a Merge can be performed safely."*: I believe for the
CDC use case it is hard to guarantee that that p
Had some offline discussions on Slack and WeChat.
For Russell's point, we are reconfirming with related people on Slack, and
will post updates once we have an agreement.
Regarding point 6, for Flink CDC the data file flushed to disk might be
associated with position deletes, but after the flush a
I think I understood the Rewrite strategy discussion a little differently.
Binpack Strategy and SortStrategy each get a new flag which lets you pick
files based on their number of delete files. So basically you can set a
variety of parameters: small files, large files, files with deletes, etc.
A
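A rough sketch of the selection logic described in this message, where a rewrite picks up files that are too small, too large, or carry too many delete files. The `FileStats` class, `select_for_rewrite`, and the `delete_file_threshold` parameter are hypothetical names chosen for illustration, not the flag added to the actual strategies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    path: str
    size_bytes: int
    delete_file_count: int


def select_for_rewrite(
    files: List[FileStats],
    min_size_bytes: int,
    max_size_bytes: int,
    delete_file_threshold: int,
) -> List[FileStats]:
    """Pick files outside the size bounds or with enough associated delete files."""
    return [
        f
        for f in files
        if f.size_bytes < min_size_bytes
        or f.size_bytes > max_size_bytes
        or f.delete_file_count >= delete_file_threshold
    ]


MB = 1024 * 1024
files = [
    FileStats("small.parquet", 8 * MB, 0),
    FileStats("ok.parquet", 400 * MB, 0),
    FileStats("ok-with-deletes.parquet", 400 * MB, 3),
    FileStats("huge.parquet", 2048 * MB, 0),
]
selected = [
    f.path
    for f in select_for_rewrite(
        files,
        min_size_bytes=128 * MB,
        max_size_bytes=1024 * MB,
        delete_file_threshold=2,
    )
]
```

Here a well-sized file is still selected once its delete-file count reaches the threshold, which is how a bin-pack or sort rewrite can also clean up deletes as a side effect.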
Thanks to everyone who came to the meeting.
Here is the full meeting recording I made:
https://drive.google.com/file/d/1yuBFlNn9nkMlH9TIut2H8CXmJGLd18Sa/view?usp=sharing
Here are some key takeaways:
1. We generally agreed upon the division of compactions into Rewrite,
Convert and Merge.
2. Merg
Hi everyone,
We are planning to have a meeting to discuss the design of Iceberg delete
compaction on Thursday 5-6pm PDT. The meeting link is
https://meet.google.com/nxx-nnvj-omx.
We have also created the channel #compaction on Slack; please join the
channel for daily discussions if you are intere
Hi everyone,
As there are more and more people adopting the v2 spec, we are seeing an
increasing number of requests for delete compaction support.
Here is a document discussing the use cases and basic interface design for
it to get the community aligned around what compactions we would offer and