Re: Gravitino an Iceberg REST catalog service

2024-02-29 Thread Ajantha Bhat
I apologize for the delay in responding. I'm pleased to see the development of an open-source REST catalog implementation, and the potential transition of Gravitino to an ASF project is certainly promising. But the REST catalog server implementation will be a small part of the Gravitino ASF project. Which

Re: Gravitino an Iceberg REST catalog service

2024-02-29 Thread Jean-Baptiste Onofré
Hi Ajantha, Thanks for sharing your thoughts. It makes sense for Gravitino to be a TLP (after the incubation period) because Gravitino is "more" than an Iceberg catalog. It implements the Iceberg REST Catalog API, but it's also a metadata catalog/repo with additional features. That said, I agree

Re: Materialized view integration with REST spec

2024-02-29 Thread Jan Kaul
Hi all, I would like to provide my perspective on the question of what a materialized view is and elaborate on Jack's recent proposal to view a materialized view as a catalog concept. Firstly, let's look at the role of the catalog. Every entity in the catalog has a *unique identifier*, and t

Flink: uncommitted data files and garbage collection

2024-02-29 Thread Steven Wu
We are probably off the topic of the original thread. I am moving the Flink part of the discussion to a new thread/subject. > but the prepared and not yet committed data files are also present in their final place. These data files are also not part of the table yet, and could be removed by the or

Re: Flink: uncommitted data files and garbage collection

2024-02-29 Thread Steven Wu
> Maybe the Iceberg sink jobs should exit terminally if it hasn't been able to commit to Iceberg after a threshold (like 24 hours) e.g. due to catalog service outage. This will prevent Flink jobs from producing more data files that can't be committed. Actually, this doesn't help. Old uncommitted d

Re: Gravitino an Iceberg REST catalog service

2024-02-29 Thread Jean-Baptiste Onofré
By the way, I think it's a good time to think about REST Catalog API v2. Actually, I would name this the Catalog RFC containing: - the RFC description itself (documentation) - the improved OpenAPI 3.0 spec - possible OpenAPI extensions (allowing extra features, vendor specific, etc) https://swagge

Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jack Ye
Hi everyone, Just want to pull this specific topic out of the materialized view discussion thread. I noticed this during the MV discussion, and I think it is important to clarify this not just for the MV topic, but also for the ongoing discussion to consolidate all the different catalogs. *How th

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Daniel Weeks
Hey Jack, I'm not sure I agree with the framing of this argument. The REST Spec defines a protocol, not an implementation. The implementation of the spec can either be compliant or not. So a REST Implementation that adheres to all the requirements (atomic location swap, json representation, etc

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jean-Baptiste Onofré
Hi Dan, I agree with your statement that the REST Spec is not an implementation, but I strongly disagree with your statement "impl of the spec can either be compliant or not". The REST Catalog spec impl should be consistent with the REST Spec. That's why a reference implementation in Iceberg would be a mu

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jack Ye
> The implementation of the spec can either be compliant or not. This is exactly the problem we are talking about right? Just to give an example, we cannot technically say that tables/views in the Tabular catalog are Iceberg tables/views, because a REST spec-compliant catalog does not need to foll

Re: Materialized view integration with REST spec

2024-02-29 Thread Szehon Ho
Hi, Yes, I mostly agree with the assessment. To clarify a few minor points. > is a materialized view a view and a separate table, a combination of the two (i.e. commits are combined), or a new metadata type? For 'new metadata type', I consider mostly Jack's initial proposal of a new Catalog MV o

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Daniel Weeks
> REST spec-compliant catalog does not need to follow the Iceberg spec to commit or store metadata If the REST implementation doesn't follow the Iceberg spec for commit requirements, it's not compliant with the spec. There's no exemption that says if you're using REST you don't need to follow the

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jean-Baptiste Onofré
Hey Dan imho, the REST Spec should provide access to the Iceberg spec layer. I don't say both should be in sync, but REST Spec should expose the resources of the Iceberg Spec. Else, I would consider it incomplete and limited in terms of features. Regards JB On Thu, Feb 29, 2024 at 9:28 PM Danie

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jean-Baptiste Onofré
Hey Jack It's a proposal in another thread (community effort on Catalog RFC). Regards JB On Thu, Feb 29, 2024 at 9:19 PM Jack Ye wrote: > > > The implementation of the spec can either be compliant or not. > > This is exactly the problem we are talking about right? Just to give an > example, we

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Amogh Jahagirdar
I want to echo Dan's point that just because there is a separate spec for a REST Catalog does not mean that implementations can deviate from the spec's definition of the commit protocol or metadata layout, and still be considered "spec compliant". > Secondly, once we do that, we should declare RES

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jack Ye
> There's no exemption that says if you're using REST you don't need to follow the spec. Why do you think that's the case? In that case are tables in a REST-compliant catalog still an Iceberg table? I don't think so, because it is a table that only partially follows the Iceberg table spec. I lik

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Yufei Gu
> We've periodically discussed removing the storage requirement and I think there's a path forward to do that and would agree that standardizing on REST, but I wouldn't say the justification for making this push is that REST is not compliant so we can just ignore the table spec requirements

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Daniel Weeks
> In that case are tables in a REST-compliant catalog still an Iceberg table? I don't think so, because it is a table that only partially follows the Iceberg table spec. If the catalog is REST compliant and complies with the Iceberg spec, they are still Iceberg tables. I can see there is an argum

Re: Materialized view integration with REST spec

2024-02-29 Thread Walaa Eldin Moustafa
One additional advantage of the separate view and table approach is it will save the need to change all engine catalog APIs to expose materialized views as separate objects with their own engine catalog APIs. Hence, Iceberg can add the materialized view support without being blocked on other e

Re: Materialized view integration with REST spec

2024-02-29 Thread Ryan Blue
Looks like it wasn’t clear what I meant for the 3 categories, so I’ll be more specific: - *Separate table and view*: this option is to have the objects that we have today, with extra metadata. Commit processes are separate: committing to the table doesn’t alter the view and committing to

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jack Ye
> For example, I cannot validate the atomic behaviors Glue claims, but I wouldn't assert that it is non-compliant because of that. I think these are not comparable claims because the API scope is completely different, but I don't think it's worth arguing in depth. Let's try to see if we can have s

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Daniel Weeks
1. I agree, this is what the spec currently requires 2. I agree, it's up for consideration 3. I agree, I think if an implementation didn't adhere to the current spec requirements, I would say it's out of spec (not sure I'd go as far as to say it's a different kind of table entirely). Just to exp

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Ryan Blue
Once again, I’m catching up late and might have a helpful perspective. I think there was a mistake in the OpenAPI spec for loading tables and the metadata-location is not listed as required. I don’t recall that being intentional, but maybe it was? Maybe for a different reason? Either way, when we

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Ryan Blue
Oops. In the first paragraph, I meant “when we added the endpoint to load a VIEW, metadata-location was correctly marked as required.” On Thu, Feb 29, 2024 at 4:18 PM Ryan Blue wrote: > Once again, I’m catching up late and might have a helpful perspective. > > I think there was a mistake in the

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jack Ye
> I feel like the goal is to identify those cases and steer them back into compliance with the spec +10 > as opposed to immediately claiming they're something entirely different In case this comment is talking about my last sentence "More extremely, it might be a totally different kind of ta

Re: Gravitino an Iceberg REST catalog service

2024-02-29 Thread Ryan Blue
There is a reference implementation in the project, in the CatalogHandlers class. That implements REST requests using a catalog and returns REST responses. I believe this is what Gravitino relies on and I mentioned it above in the discussion about whether we should have a catalog service. Catalog t

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Ryan Blue
I did not notice the difference between table and view. Should we change that for tables then? It depends on what we consider a breaking change at this point. Plus, we may want it to be optional in the future. My main point, though, is that I wouldn’t read too much into it being optional. I think

Re: Materialized view integration with REST spec

2024-02-29 Thread Walaa Eldin Moustafa
Ryan, in the option "Separate table and view", will there be a reference (or pointer) to the table from the view metadata? Since the option of "embedding a table metadata location in view metadata" is not preferred, it is not clear how to associate the table with the view in the "Separate table and

Re: Materialized view integration with REST spec

2024-02-29 Thread Ryan Blue
> Ryan, in the option "Separate table and view", will there be a reference (or pointer) to the table from the view metadata? Yes. And this is a problem we need to solve generally because a materialized table needs to be able to track the upstream state of tables that were used. I think it would be

Re: Gravitino an Iceberg REST catalog service

2024-02-29 Thread Ajantha Bhat
Ryan, thank you for the clarifications, they're truly appreciated. I'd like to expand on the last point I raised. > From the perspective of open-source users, the absence of an open-source implementation for the REST catalog within Iceberg may be inconvenient or frustrating. REST catalog wa

Re: Materialized view integration with REST spec

2024-02-29 Thread Walaa Eldin Moustafa
Ok since the option "A combination of a view and a table" also has some sort of a pointer (from the view to the table in the view metadata), and it is rejected, I think the key distinction with the first option "Separate table and view", is that in the second option, a specific version of the table

Re: [VOTE] Release Apache Iceberg 1.5.0 RC4

2024-02-29 Thread Ajantha Bhat
Gentle reminder. On Wed, Feb 28, 2024 at 8:34 PM Eduard Tudenhoefner wrote: > +1 (non-binding) > > * validated checksum and signature > * checked license docs & ran RAT checks > * ran build and tests with JDK11 > * built new docker images and ran through > https://iceberg.apache.org/spark-quicks

Re: Gravitino an Iceberg REST catalog service

2024-02-29 Thread Jean-Baptiste Onofré
Hi Ryan If we plan to reduce the number of catalogs (and I think it makes sense and I'm with you on that), we will need an impl/service in Iceberg for the REST Catalog API, else the users won't be able to use Iceberg "out of the box". We already maintain "service" considering JDBC, Hive, etc catalo

Re: Materialized view integration with REST spec

2024-02-29 Thread Jan Kaul
Hi Ryan, we actually discussed your categories in this question. Your categories correspond to the following designs: * Separate table and view => Design 1 * Combination o

Re: Materialized view integration with REST spec

2024-02-29 Thread Jack Ye
> Jack, it sounds like you’re the proponent of a combined table and view (rather than a new metadata spec for a materialized view). What is the main motivation? It seems like you’re convinced of that approach, but I don’t understand the advantage it brings. Sorry I have to make a Google Sheet to c

Re: Gravitino an Iceberg REST catalog service

2024-02-29 Thread Fokko Driesprong
Hey everyone, Thanks for raising this. I think a test-jar would be a great first step. > We already maintain "service" considering JDBC, Hive, etc catalogs. REST Catalog ref impl in Iceberg would be the same. What I think Ryan means by a service is having to maintain Postgres (JDBC backend), Hive

Re: Gravitino an Iceberg REST catalog service

2024-02-29 Thread Jean-Baptiste Onofré
Hi Fokko If service means the actual runtime service, I partially agree. I would love to see REST Catalog API the "central cornerstone" used in iceberg-java, pyiceberg, etc. So I think we should provide the resources for a user to bootstrap a REST Catalog ref impl. A lot of Apache projects provi