Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Saisai Shao
From Dev's point, it has less burden to always support the latest version of Spark (for example). But from user's point, especially for us who maintain Spark internally, it is not easy to upgrade the Spark version for the first time (since we have many customizations internally), and we're still p

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread OpenInx
Thanks for bringing this up, Anton. Everyone has great pros/cons to support their preferences. Before giving my preference, let me raise one question: what's the top priority for the Apache Iceberg project at this point in time? This question will help us to answer the following question:

Spark3 Row Level Delete Support

2021-09-15 Thread Aman Rawat
Hey team, We are trying to implement Spark support for Row Level Deletes for Iceberg. Can you please shed some light on where this work stream stands and how we can help? Regards,

Re: Spark3 Row Level Delete Support

2021-09-15 Thread Anton Okolnychyi
I have a working end-to-end solution locally using some of the upcoming features in Spark 3.2 and a more elaborate version of what is available in this Spark PR: https://github.com/apache/spark/pull/33008 More details on the proposed Spark APIs are i

Re: Spark3 Row Level Delete Support

2021-09-15 Thread Anton Okolnychyi
Here is the action PR I mentioned: https://github.com/apache/iceberg/pull/2841 - Anton > On 15 Sep 2021, at 09:14, Anton Okolnychyi > wrote: > > I have a working end-to-end solution locally using some of the upcoming > features in Spark 3.2 and a

Re: Spark3 Row Level Delete Support

2021-09-15 Thread Aman Rawat
Thanks a lot Anton. This is really helpful. We will read through this and check out the current PRs to get up to speed here. Best Regards, Aman. On Wed, Sep 15, 2021 at 9:53 PM Anton Okolnychyi wrote: > Here is the action PR I mentioned: > https://github.com/apache/iceberg/pull/2841 >

Re: [DISCUSS] UUID type

2021-09-15 Thread Joshua Howard
Just following up on Piotr's message here. Have we converged? I think most people would assume that silence is a vote for the status-quo. On Mon, Sep 13, 2021 at 7:30 AM Piotr Findeisen wrote: > Hi, > > It seems we converged here that UUID should remain included. > I read this as a consensus re

Re: [DISCUSS] UUID type

2021-09-15 Thread Ryan Blue
I don't think we necessarily reached consensus, but I think the general trend toward the end was to keep support for UUID. Should we start a vote to validate consensus? On Wed, Sep 15, 2021 at 1:15 PM Joshua Howard wrote: > Just following up on Piotr's message here. > > Have we converged? I thin

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Ryan Blue
Thanks for bringing this up, Anton. I’m glad that we have the set of potential solutions well defined. Looks like the next step is to decide whether we want to require people to update Spark versions to pick up newer versions of Iceberg. If we choose to make people upgrade, then option 1 is clearl

Re: [DISCUSS] Iceberg roadmap

2021-09-15 Thread Ryan Blue
Gidon, I think that the v3 part of encryption is actually documenting how it works and adding it to the spec. Right now we have hooks for building some encryption around it, but almost no requirements in the spec for how to use it across implementations. This is fine while we're working on defining

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Wing Yew Poon
IIUC, Option 2 is to move the Spark support for Iceberg into a separate repo (subproject of Iceberg). Would we have branches such as 0.13-2.4, 0.13-3.0, 0.13-3.1, and 0.13-3.2? For features that can be supported in all versions or all Spark 3 versions, then we would need to commit the changes to al

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Russell Spitzer
I agree that Option 2 is considerably more difficult for development when core API changes need to be picked up by the external Spark module. I also think a monthly release would probably still be prohibitive to actually implementing new features that appear in the API. I would hope we have a mu

Re: [DISCUSS] Iceberg roadmap

2021-09-15 Thread Jack Ye
For external Trino and PrestoDB tasks, I am thinking about creating one GitHub project for Trino and another one for PrestoDB to manage all tasks under them, adding links of issues and PRs in the other communities to track progress. This is mostly to improve visibility so that people who are intere

Re: [DISCUSS] Iceberg roadmap

2021-09-15 Thread Ryan Blue
That sounds great, thanks for taking that on Jack! On Wed, Sep 15, 2021 at 3:51 PM Jack Ye wrote: > For external Trino and PrestoDB tasks, I am thinking about creating one > Github project for Trino and another one for PrestoDB to manage all tasks > under them, adding links of issues and PRs in

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Ryan Blue
Sorry, I was thinking about CI integration between Iceberg Java and Iceberg Spark; I just didn't mention it, and I see how that's a big thing to leave out! I would definitely want to test the projects together. One thing we could do is have a nightly build like Russell suggests. I'm also wondering

Re: Snapshot tagging, branching and retention

2021-09-15 Thread Eduard Tudenhoefner
Nice work Jack, the proposal looks really good. On Sun, Aug 29, 2021 at 9:20 AM Jack Ye wrote: > Hi everyone, > > Recently I have published PR 2961 - add snapshot tags interface ( > https://github.com/apache/iceberg/pull/2961) and received a lot of great > feedback. I have summarized everything

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Jack Ye
I think in Ryan's proposal we will create a ton of modules anyway; as Wing listed, we are just using git branches as an additional dimension, but my understanding is that you will still have 1 core, 1 extension, 1 runtime artifact published for each Spark version in either approach. In that case, thi