> We should probably add a section to our Flink docs that explains and
> links to Flink’s support policy and has a table of Iceberg versions that
> work with Flink versions. (We should probably have the same table for
> Spark, too!)
Thanks, Ryan, for the suggestion. I created a separate issue to address t
Hi everyone,
I tried to prototype option 3, here is the PR:
https://github.com/apache/iceberg/pull/3237
Sorry I did not see that Anton is planning to do it, but anyway it's just a
draft, so feel free to just use it as reference.
Best,
Jack Ye
On Sun, Oct 3, 2021 at 2:19 PM Ryan Blue wrote:
>
Thanks for the context on the Flink side! I think it sounds reasonable to
keep up to date with the latest supported Flink version. If we want, we
could later go with something similar to what we do for Spark but we’ll see
how it goes and what the Flink community needs. We should probably add a
section to our Flink docs that explains and links to Flink’s support policy
and has a table of Iceberg versions that work with Flink versions. (We
should probably have the same table for Spark, too!)
Wing, sorry, my earlier message probably misled you. I was only giving my
personal opinion on Flink version support.
On Tue, Sep 28, 2021 at 8:03 PM Wing Yew Poon
wrote:
> Hi OpenInx,
> I'm sorry I misunderstood the thinking of the Flink community. Thanks for
> the clarification.
> - Wing Yew
>
>
>
Hi OpenInx,
I'm sorry I misunderstood the thinking of the Flink community. Thanks for
the clarification.
- Wing Yew
On Tue, Sep 28, 2021 at 7:15 PM OpenInx wrote:
> Hi Wing
>
> As we discussed above, the community prefers option 2 or option 3. So in
> fact, when we planned to upgrade t
Hi Wing
As we discussed above, the community prefers option 2 or option 3. So in
fact, when we planned to upgrade the Flink version from 1.12 to 1.13, we did
our best to guarantee that the master Iceberg repo could work with both
Flink 1.12 and Flink 1.13. For more context, please see [1], [2
In the last community sync, we spent a little time on this topic. For Spark
support, there are currently two options under consideration:
Option 2: a separate repo for Spark support, using branches to support
different Spark versions, with the main branch tracking the latest Spark
version (3.2 to begin with
During the sync meeting, people talked about whether and how we can have the
same version support model across engines like Flink and Spark. I can
provide some input from the Flink side.
Flink only supports two minor versions. E.g., right now Flink 1.13 is the
latest released version. That means only Flink 1.12 and 1.13 are supported.
Since you mentioned Hive, let me chime in with what we do there. You might
find it useful:
- metastore module - only small differences - DynConstructors solves it for us
- mr module - some bigger differences, but still manageable for Hive 2-3.
Need some new classes, but most of the code is reused - extra mo
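For anyone unfamiliar with that pattern, here is a rough sketch of how
DynConstructors is typically used. The class name and the two constructor
signatures below are illustrative stand-ins, not the actual Iceberg Hive
shim code; the point is only that both signatures are registered and the one
present on the runtime classpath wins:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.IMetaStoreClient;
    import org.apache.iceberg.common.DynConstructors;

    public class MetastoreClientFactory {

      public static IMetaStoreClient newClient(HiveConf conf) {
        // Register the newer constructor signature first and the older one as a
        // fallback; whichever exists on the runtime classpath is the one bound.
        DynConstructors.Ctor<IMetaStoreClient> ctor =
            DynConstructors.builder(IMetaStoreClient.class)
                .impl("org.apache.hadoop.hive.metastore.HiveMetaStoreClient", Configuration.class)
                .impl("org.apache.hadoop.hive.metastore.HiveMetaStoreClient", HiveConf.class)
                .build();
        // HiveConf extends Configuration, so the same argument works for either constructor.
        return ctor.newInstance(conf);
      }
    }

This lets one compiled metastore module run against multiple Hive versions
without needing a separate build per version.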
Okay, looks like there is consensus around supporting multiple Spark versions
at the same time. Folks mentioned this on this thread, and folks brought it
up during the sync.
Let’s think through Options 2 and 3 in more detail then.
Option 2
In Option 2, there will b
I'd support the option that Jack suggests if we can set a few expectations
for keeping it clean.
First, I'd like to avoid refactoring code to share it across Spark versions
-- that introduces risk because we're relying on compiling against one
version and running in another and both Spark and Scal
I think in Ryan's proposal we will create a ton of modules anyway; as Wing
listed, we are just using the git branch as an additional dimension. But my
understanding is that you will still have 1 core, 1 extension, and 1 runtime
artifact published for each Spark version in either approach.
In that case, thi
Sorry, I was thinking about CI integration between Iceberg Java and Iceberg
Spark; I just didn't mention it, and I see how that's a big thing to leave
out!
I would definitely want to test the projects together. One thing we could
do is have a nightly build like Russell suggests. I'm also wondering
I agree that Option 2 is considerably more difficult for development when core
API changes need to be picked up by the external Spark module. I also think a
monthly release would probably still be prohibitive for actually implementing
new features that appear in the API; I would hope we have a mu
IIUC, Option 2 is to move the Spark support for Iceberg into a separate
repo (subproject of Iceberg). Would we have branches such as 0.13-2.4,
0.13-3.0, 0.13-3.1, and 0.13-3.2? For features that can be supported in all
versions or all Spark 3 versions, we would need to commit the changes
to al
Thanks for bringing this up, Anton. I’m glad that we have the set of
potential solutions well defined.
Looks like the next step is to decide whether we want to require people to
update Spark versions to pick up newer versions of Iceberg. If we choose to
make people upgrade, then option 1 is clearl
Thanks for bringing this up, Anton.
Everyone has given great pros/cons to support their preferences. Before giving
my preference, let me raise one question: what is the top priority
for the Apache Iceberg project at this point in time? This question will help
us answer the following question:
From the dev's point of view, it is less of a burden to always support the
latest version of Spark (for example). But from the user's point of view,
especially for those of us who maintain Spark internally, it is not easy to
upgrade the Spark version for the first time (since we have many
customizations internally), and we're still p
Hi Wing Yew,
I think 2.4 is a different story: we will continue to support Spark 2.4,
but as you can see it will continue to have very limited functionality
compared to Spark 3. I believe we discussed option 3 when we were
doing the Spark 3.0 to 3.1 upgrade. Recently we are seeing the same is
I understand and sympathize with the desire to use new DSv2 features in
Spark 3.2. I agree that Option 1 is the easiest for developers, but I don't
think it considers the interests of users. I do not think that most users
will upgrade to Spark 3.2 as soon as it is released. It is a "minor
version"
Option 1 sounds good to me. Here are my reasons:
1. Both 2 and 3 will slow down development. Considering the limited
resources in the open source community, the upsides of options 2 and 3 are
probably not worth it.
2. Both 2 and 3 assume use cases that may not exist. It's hard to predict
anything,
To sum up what we have so far:
Option 1 (support just the most recent minor Spark 3 version)
The easiest option for us devs; it forces users to upgrade to the most recent
minor Spark version to consume any new Iceberg features.
Option 2 (a separate project under Iceberg)
Can support as many S
I think we should go for option 1. I'm already not a big fan of having runtime
errors for unsupported things based on versions, and I don't think minor version
upgrades are a large issue for users. I'm especially not looking forward to
supporting interfaces that only exist in Spark 3.2 in a mul
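(For context, the version-based runtime checks being referred to look
roughly like the sketch below. This is a hypothetical helper, not Iceberg
code; the only Spark API it assumes is SparkSession.version(), which returns
a string such as "3.2.0".)

    import org.apache.spark.sql.SparkSession;

    public class SparkVersionUtil {

      // True if the running Spark is at least major.minor, e.g. atLeast(spark, 3, 2).
      public static boolean atLeast(SparkSession spark, int major, int minor) {
        String[] parts = spark.version().split("\\.");
        int runtimeMajor = Integer.parseInt(parts[0]);
        int runtimeMinor = Integer.parseInt(parts[1]);
        return runtimeMajor > major || (runtimeMajor == major && runtimeMinor >= minor);
      }
    }

Every 3.2-only code path in a single multi-version module would need a guard
like this, throwing something like UnsupportedOperationException on older
Spark versions -- which is exactly the kind of version-dependent runtime
error being objected to above.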
Hey Imran,
I don’t know why I forgot to mention this option too. It is definitely a
solution to consider. We used this approach to support Spark 2 and Spark 3.
Right now, this would mean having iceberg-spark (common code for all versions),
iceberg-spark2, iceberg-spark3 (common code for all Spa
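As a rough sketch of how that layout can work (hypothetical class names, not
actual Iceberg code): version-specific behavior sits behind an interface in
the shared module, each versioned module ships an implementation, and the
shared code loads whichever implementation is on the classpath.

    // In the shared iceberg-spark module: the version-agnostic interface.
    interface SparkSupport {
      String describe();
    }

    // Also in the shared module: pick up the implementation provided by the
    // versioned module (iceberg-spark2 or iceberg-spark3) on the classpath.
    class SparkSupportLoader {
      private static final String[] CANDIDATE_IMPLS = {
          "org.apache.iceberg.spark3.Spark3Support",  // hypothetical, from iceberg-spark3
          "org.apache.iceberg.spark2.Spark2Support"   // hypothetical, from iceberg-spark2
      };

      static SparkSupport load() {
        for (String impl : CANDIDATE_IMPLS) {
          try {
            return (SparkSupport) Class.forName(impl).getDeclaredConstructor().newInstance();
          } catch (ReflectiveOperationException e) {
            // this implementation is not on the classpath; try the next candidate
          }
        }
        throw new IllegalStateException("No Spark support module found on the classpath");
      }
    }

The shared module never references a concrete Spark version this way, while
each versioned module is free to use version-specific APIs directly.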
> First of all, is option 2 a viable option? We discussed separating the Python
> module out of the project a few weeks ago and decided not to do that,
> because it's beneficial for code cross-reference and more intuitive for new
> developers to see everything in the same repository. I would
Thanks for bringing this up, Anton.
I am not entirely certain whether your option 2 meant "project" in the "Apache
project" sense or the "gradle project" sense -- it sounds like you mean
"Apache project".
If so, I'd propose Option 3:
Create a "spark-common" gradle project, which builds against the lo
First of all, is option 2 a viable option? We discussed separating the
Python module out of the project a few weeks ago and decided not to do
that, because it's beneficial for code cross-reference and more intuitive
for new developers to see everything in the same repository. I would expect
the