I have been thinking about this quite a bit.
Moving the temporary manifest files could work, but the prepared but
not-yet-committed data files are already present in their final location.
These data files are not part of the table yet either, and could be removed
by the orphan file removal process.
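The risk described above (orphan cleanup deleting data files that are written but not yet committed) is typically mitigated with a grace period, so that only files older than some threshold are treated as orphans. A minimal sketch of that idea; all names here are invented for illustration and this is not the actual Iceberg API:

```python
from datetime import datetime, timedelta

def find_safe_orphans(all_files, referenced_files, mtimes,
                      grace=timedelta(days=3)):
    """Return files that are not referenced by table metadata AND are
    older than the grace period, so that in-flight, not-yet-committed
    files are never deleted.

    all_files: set of file paths found under the table location
    referenced_files: set of paths reachable from table metadata
    mtimes: dict mapping path -> last-modified datetime
    """
    cutoff = datetime.now() - grace
    return sorted(
        f for f in all_files
        if f not in referenced_files and mtimes[f] < cutoff
    )
```

The key point is that recency, not just reachability, decides deletion: an unreferenced file written minutes ago may simply be part of a commit that has not landed yet.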
Thanks Ryan for the insights. I agree that reusing existing metadata
definitions and minimizing spec changes are very important. This also
minimizes spec drift (between materialized views and views spec, and
between materialized views and tables spec), and simplifies the
implementation.
I mean separate table and view metadata that is somehow combined through a
commit process. For instance, keeping a pointer to a table metadata file in
a view metadata file or combining commits to reference both. I don't see
the value in either option.
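For concreteness, the "pointer" option mentioned above could look like view metadata carrying the location of a storage table's metadata file, with a commit atomically swapping that pointer. This is a hypothetical sketch only; the field names are invented and are not part of any Iceberg spec:

```python
# Invented illustration of the "pointer" option: view metadata that
# embeds a reference to a separate table metadata file. A combined
# commit would have to update this pointer and the table metadata
# location in one atomic step.
view_metadata = {
    "view-uuid": "example-view-uuid",
    "current-version-id": 1,
    # Hypothetical field, not in the views spec:
    "storage-table-metadata-location":
        "s3://bucket/tbl/metadata/v42.metadata.json",
}

def commit(view_meta, new_table_metadata_location):
    """Sketch of a combined commit: produce new view metadata whose
    pointer references the freshly written table metadata file."""
    updated = dict(view_meta)
    updated["storage-table-metadata-location"] = new_table_metadata_location
    return updated
```

The sketch also shows why the option is unattractive: every table commit now forces a view metadata rewrite, coupling two otherwise independent commit paths.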
On Wed, Feb 28, 2024 at 5:05 PM Jack Ye
Thanks for sharing those issues. It does seem related to me, based on the
similar test case failures they had internally. I could try dropping
iceberg-runtime from Spark's jars dir to see if that helps avoid this,
since the classloading issue seems to come up when loading via the --jars
argument with Spark Connect.
Sorry I guess another longer question:
*What do we even mean here when we use the terms of table "metadata", view
"metadata" and new "metadata" type?*
This was clear before the REST spec was introduced, but is not so clear
now. Maybe this is a good time to clarify it.
If we look into the
Thanks Ryan for the help to trace back to the root question! Just a
clarification question regarding your reply before I reply further: what
exactly does the option "a combination of the two (i.e. commits are
combined)" mean? How is that different from "a new metadata type"?
-Jack
On Wed, Feb
I’m catching up on this conversation, so hopefully I can bring a fresh
perspective.
Jack already pointed out that we need to start from the basics and I agree
with that. Let’s remove voting at this point. Right now is the time for
discussing trade-offs, not lining up and taking sides. I realize
I think we should keep this separate from views. A view could be one way to
implement this in engine integration, but I think the best direction is to
pass the metadata directly and with a clear spec instead of trying to
translate to a view in the REST catalog. Translation in the catalog would
I’m not sure that there is a single tenant to follow, but I can outline how
I think about the REST protocol.
The problem that the REST API solves is to standardize catalog interaction
for Iceberg. I think that relies on being both a good standard and a good
API. A good standard is small,
Hi Ryan,
I agree with you: it would be great to have a discussion about
Catalogs in 2.0 (and other topics maybe :)). I started a thread a few
days ago about Iceberg 2.0 discussion.
That's a good point and worth discussing in the community.
Thanks !
Regards
JB
On Tue, Feb 27, 2024 at 9:30 PM
Indeed, Manu, you're right. However, integrating support for the v2 format
on top of this should be quite simple.
Yufei
On Wed, Feb 28, 2024 at 1:18 AM Manu Zhang wrote:
> Hi Yufei,
>
> If I'm not mistaken, https://github.com/apache/iceberg/pull/4705 doesn't
> support delete files or format v2,
> No removed temporary files on Flink failure. (Spark orphan file removal
needs to be configured to prevent removal of Flink temporary files which
are needed on recovery)
This sounds like it's a larger problem. Shouldn't Flink store its state in
a different prefix that won't be cleaned up by
+1 (non-binding)
* validated checksum and signature
* checked license docs & ran RAT checks
* ran build and tests with JDK11
* built new docker images and ran through
https://iceberg.apache.org/spark-quickstart/
* tested with Trino & Presto
* tested view support with Spark 3.5 + JDBC/REST catalog
+1 (non-binding)
I checked:
- Signature and checksum are OK
- Build is OK on the source distribution
- ASF headers are present
- No binary file found in the source distribution
- Tested on iceland (sample project) + trino and also JDBC Catalog
Thanks !
Regards
JB
On Tue, Feb 27, 2024 at 1:16 PM
Sorry to chime in a bit late to the conversation.
I am currently working on implementing Flink in-job maintenance.
The main target audience:
- Users who can't or don't want to use Spark
- Users who need frequent checkpointing (low latency in the Iceberg table)
and have many small files
- CDC
>
> Many of these decisions can be translated together to some sort of view on
> top of a table. Consider user A has permission on table1, column c1 c2,
> sha1 hash mask on email column, row filter age > 21. This can be translated
> into a decision that user A can access a view *SELECT c1, c2,
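The translation described in the quoted example could be sketched as a small helper that turns column grants, column masks, and a row filter into a view definition. The helper and its rule format are assumptions for illustration, not an existing API:

```python
def secured_view_sql(table, columns, masks, row_filter):
    """Build the SQL body of a security view from access decisions.

    columns: columns the user may see
    masks: dict mapping column -> masking expression (e.g. a hash)
    row_filter: SQL predicate limiting visible rows, or None
    """
    select_items = [masks.get(c, c) for c in columns]
    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    if row_filter:
        sql += f" WHERE {row_filter}"
    return sql

# Mirrors the example in the message above: user A on table1 with
# columns c1, c2, a sha1 mask on email, and row filter age > 21.
sql = secured_view_sql(
    "table1",
    ["c1", "c2", "email"],
    {"email": "sha1(email) AS email"},
    "age > 21",
)
# sql == "SELECT c1, c2, sha1(email) AS email FROM table1 WHERE age > 21"
```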
Hi Yufei,
If I'm not mistaken, https://github.com/apache/iceberg/pull/4705 doesn't
support delete files or format v2, does it?
Manu
On Fri, Feb 23, 2024 at 12:41 AM Yufei Gu wrote:
> We took a different approach by modifying the metadata. It is a bit heavy
> compared to the relative path and