Re: Improve Change Data Capture Use Case for Iceberg

2024-02-28 Thread Péter Váry
I have been thinking about this quite a bit. Moving the temporary manifest files could work, but the prepared and not yet committed data files are also present in their final place. These data files are also not part of the table yet, and could be removed by the orphan files removal process.

Re: Materialized view integration with REST spec

2024-02-28 Thread Walaa Eldin Moustafa
Thanks Ryan for the insights. I agree that reusing existing metadata definitions and minimizing spec changes are very important. This also minimizes spec drift (between materialized views and views spec, and between materialized views and tables spec), and simplifies the implementation. In an

Re: Materialized view integration with REST spec

2024-02-28 Thread Ryan Blue
I mean separate table and view metadata that is somehow combined through a commit process. For instance, keeping a pointer to a table metadata file in a view metadata file or combining commits to reference both. I don't see the value in either option. On Wed, Feb 28, 2024 at 5:05 PM Jack Ye

Re: Iceberg 1.4/spark3.5 seem to have some breaking issue with spark-connect

2024-02-28 Thread Nirav Patel
Thanks for sharing those issues. It does seem related to me, based on similar test case failures they had internally. I could try dropping iceberg-runtime into the jars dir of Spark to see if that helps avoid this, as it seems the classloading issue comes up when loading using --jars args with spark-connect

Re: Materialized view integration with REST spec

2024-02-28 Thread Jack Ye
Sorry I guess another longer question: *What do we even mean here when we use the terms table "metadata", view "metadata" and new "metadata" type?* This was clear before the REST spec was introduced, but is not so clear now. Maybe this is a good time to clarify it. If we look into the

Re: Materialized view integration with REST spec

2024-02-28 Thread Jack Ye
Thanks Ryan for the help to trace back to the root question! Just a clarification question regarding your reply before I reply further: what exactly does the option "a combination of the two (i.e. commits are combined)" mean? How is that different from "a new metadata type"? -Jack On Wed, Feb

Re: Materialized view integration with REST spec

2024-02-28 Thread Ryan Blue
I’m catching up on this conversation, so hopefully I can bring a fresh perspective. Jack already pointed out that we need to start from the basics and I agree with that. Let’s remove voting at this point. Right now is the time for discussing trade-offs, not lining up and taking sides. I realize

Re: Support permission concepts in REST spec

2024-02-28 Thread Ryan Blue
I think we should keep this separate from views. A view could be one way to implement this in engine integration, but I think the best direction is to pass the metadata directly and with a clear spec instead of trying to translate to a view in the REST catalog. Translation in the catalog would

Re: Proposal for RESTful Data Operations

2024-02-28 Thread Ryan Blue
I’m not sure that there is a single tenet to follow, but I can outline how I think about the REST protocol. The problem that the REST API solves is to standardize catalog interaction for Iceberg. I think that relies on being both a good standard and a good API. A good standard is small,

Re: Deprecate DynamodbCatalog

2024-02-28 Thread Jean-Baptiste Onofré
Hi Ryan, I agree with you: it would be great to have a discussion about Catalogs in 2.0 (and other topics maybe :)). I started a thread a few days ago about Iceberg 2.0 discussion. That's a good point and worth discussing in the community. Thanks ! Regards JB On Tue, Feb 27, 2024 at 9:30 PM

Re: Table Portability Proposal

2024-02-28 Thread Yufei Gu
Indeed, Manu, you're right. However, integrating support for v2 format based on this should be quite simple. Yufei On Wed, Feb 28, 2024 at 1:18 AM Manu Zhang wrote: > Hi Yufei, > > If I'm not mistaken, https://github.com/apache/iceberg/pull/4705 doesn't > support delete files or format v2,

Re: Improve Change Data Capture Use Case for Iceberg

2024-02-28 Thread Ryan Blue
> No removed temporary files on Flink failure. (Spark orphan file removal needs to be configured to prevent removal of Flink temporary files which are needed on recovery) This sounds like it's a larger problem. Shouldn't Flink store its state in a different prefix that won't be cleaned up by

Re: [VOTE] Release Apache Iceberg 1.5.0 RC4

2024-02-28 Thread Eduard Tudenhoefner
+1 (non-binding) * validated checksum and signature * checked license docs & ran RAT checks * ran build and tests with JDK11 * built new docker images and ran through https://iceberg.apache.org/spark-quickstart/ * tested with Trino & Presto * tested view support with Spark 3.5 + JDBC/REST catalog

Re: [VOTE] Release Apache Iceberg 1.5.0 RC4

2024-02-28 Thread Jean-Baptiste Onofré
+1 (non-binding) I checked: - Signature and checksum are OK - Build is OK on the source distribution - ASF headers are present - No binary file found in the source distribution - Tested on iceland (sample project) + Trino and also JDBC Catalog Thanks ! Regards JB On Tue, Feb 27, 2024 at 1:16 PM

Re: Improve Change Data Capture Use Case for Iceberg

2024-02-28 Thread Péter Váry
Sorry to chime in a bit late to the conversation. I am currently working on implementing Flink in-job maintenance. The main target audience: - Users who can't or don't want to use Spark - Users who need frequent checkpointing (low latency in the Iceberg table) and have many small files - CDC

Re: Support permission concepts in REST spec

2024-02-28 Thread Renjie Liu
> > Many of these decisions can be translated together to some sort of view on > top of a table. Consider user A has permission on table1, column c1 c2, > sha1 hash mask on email column, row filter age > 21. This can be translated > into a decision that user A can access a view *SELECT c1, c2,

Re: Table Portability Proposal

2024-02-28 Thread Manu Zhang
Hi Yufei, If I'm not mistaken, https://github.com/apache/iceberg/pull/4705 doesn't support delete files or format v2, does it? Manu On Fri, Feb 23, 2024 at 12:41 AM Yufei Gu wrote: > We took a different approach by modifying the metadata. It is a bit heavy > compared to the relative path and