> We should probably add a section to our Flink docs that explains and
> links to Flink’s support policy and has a table of Iceberg versions that
> work with Flink versions. (We should probably have the same table for
> Spark, too!)
Thanks, Ryan, for the suggestion. I created a separate issue to address t
Hi everyone,
I tried to prototype option 3, here is the PR:
https://github.com/apache/iceberg/pull/3237
Sorry I did not see that Anton is planning to do it, but anyway it's just a
draft, so feel free to just use it as reference.
Best,
Jack Ye
On Sun, Oct 3, 2021 at 2:19 PM Ryan Blue wrote:
>
Thanks for the context on the Flink side! I think it sounds reasonable to
keep up to date with the latest supported Flink version. If we want, we
could later go with something similar to what we do for Spark but we’ll see
how it goes and what the Flink community needs. We should probably add a
section to our Flink docs that explains and links to Flink’s support policy
and has a table of Iceberg versions that work with Flink versions. (We
should probably have the same table for Spark, too!)
Wing, sorry, my earlier message probably misled you. I was only giving my
personal opinion on Flink version support.
On Tue, Sep 28, 2021 at 8:03 PM Wing Yew Poon
wrote:
> Hi OpenInx,
> I'm sorry I misunderstood the thinking of the Flink community. Thanks for
> the clarification.
> - Wing Yew
>
>
>
Hi OpenInx,
I'm sorry I misunderstood the thinking of the Flink community. Thanks for
the clarification.
- Wing Yew
On Tue, Sep 28, 2021 at 7:15 PM OpenInx wrote:
> Hi Wing
>
> As we discussed above, the community prefers option 2 or option 3. So in
> fact, when we planned to upgrade t
Hi Wing
As we discussed above, the community prefers option 2 or option 3. So in
fact, when we planned to upgrade the Flink version from 1.12 to 1.13, we did
our best to guarantee that the master Iceberg repo could work with both
Flink 1.12 and Flink 1.13. For more context, please see [1], [2
In the last community sync, we spent a little time on this topic. For Spark
support, there are currently two options under consideration:
Option 2: a separate repo for Spark support, using branches to support
different Spark versions, with the main branch tracking the latest Spark
version (3.2 to begin with
During the sync meeting, people talked about whether and how we can have the
same version support model across engines like Flink and Spark. I can
provide some input from the Flink side.
Flink only supports two minor versions. E.g., right now Flink 1.13 is the
latest released version. That means only Flink 1.12 and 1.13 are supported.
Since you mentioned Hive, let me chime in with what we do there. You might
find it useful:
- metastore module - only small differences - DynConstructors solves it for us
- mr module - some bigger differences, but still manageable for Hive 2-3.
Need some new classes, but most of the code is reused - extra mo
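For anyone unfamiliar with that pattern, here is a rough sketch of how
DynConstructors is typically used. The class name and the two constructor
signatures below are illustrative stand-ins, not the actual Iceberg Hive
shim code; the point is only that both signatures are registered and the one
present on the runtime classpath wins:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.IMetaStoreClient;
    import org.apache.iceberg.common.DynConstructors;

    public class MetastoreClientFactory {

      public static IMetaStoreClient newClient(HiveConf conf) {
        // Register the newer constructor signature first and the older one as a
        // fallback; whichever exists on the runtime classpath is the one bound.
        DynConstructors.Ctor<IMetaStoreClient> ctor =
            DynConstructors.builder(IMetaStoreClient.class)
                .impl("org.apache.hadoop.hive.metastore.HiveMetaStoreClient", Configuration.class)
                .impl("org.apache.hadoop.hive.metastore.HiveMetaStoreClient", HiveConf.class)
                .build();
        // HiveConf extends Configuration, so the same argument works for either constructor.
        return ctor.newInstance(conf);
      }
    }

This lets one compiled metastore module run against multiple Hive versions
without needing a separate build per version.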
Okay, looks like there is consensus around supporting multiple Spark versions
at the same time. Folks mentioned this on this thread, and folks brought it
up during the sync.
Let’s think through Options 2 and 3 in more detail then.
Option 2
In Option 2, there will b
I'd support the option that Jack suggests if we can set a few expectations
for keeping it clean.
First, I'd like to avoid refactoring code to share it across Spark versions
-- that introduces risk because we're relying on compiling against one
version and running in another and both Spark and Scal
I think in Ryan's proposal we will create a ton of modules anyway; as Wing
listed, we are just using the git branch as an additional dimension. But my
understanding is that you will still have 1 core, 1 extension, and 1 runtime
artifact published for each Spark version in either approach.
In that case, thi
Sorry, I was thinking about CI integration between Iceberg Java and Iceberg
Spark; I just didn't mention it, and I see how that's a big thing to leave
out!
I would definitely want to test the projects together. One thing we could
do is have a nightly build like Russell suggests. I'm also wondering
I agree that Option 2 is considerably more difficult for development when core
API changes need to be picked up by the external Spark module. I also think a
monthly release would probably still be prohibitive for actually implementing
new features that appear in the API; I would hope we have a mu
IIUC, Option 2 is to move the Spark support for Iceberg into a separate
repo (subproject of Iceberg). Would we have branches such as 0.13-2.4,
0.13-3.0, 0.13-3.1, and 0.13-3.2? For features that can be supported in all
versions or all Spark 3 versions, we would need to commit the changes
to al
Thanks for bringing this up, Anton. I’m glad that we have the set of
potential solutions well defined.
Looks like the next step is to decide whether we want to require people to
update Spark versions to pick up newer versions of Iceberg. If we choose to
make people upgrade, then option 1 is clearl
Thanks for bringing this up, Anton.
Everyone has given great pros/cons to support their preferences. Before giving
my preference, let me raise one question: what is the top priority
for the Apache Iceberg project at this point in time? This question will help
us answer the following question:
From the dev's point of view, it is less of a burden to always support the
latest version of Spark (for example). But from the user's point of view,
especially for those of us who maintain Spark internally, it is not easy to
upgrade the Spark version for the first time (since we have many
customizations internally), and we're still p
Hi Wing Yew,
I think 2.4 is a different story: we will continue to support Spark 2.4,
but as you can see it will continue to have very limited functionality
compared to Spark 3. I believe we discussed option 3 when we were
doing the Spark 3.0 to 3.1 upgrade. Recently we are seeing the same is
I understand and sympathize with the desire to use new DSv2 features in
Spark 3.2. I agree that Option 1 is the easiest for developers, but I don't
think it considers the interests of users. I do not think that most users
will upgrade to Spark 3.2 as soon as it is released. It is a "minor
version"
Option 1 sounds good to me. Here are my reasons:
1. Both 2 and 3 will slow down development. Considering the limited
resources in the open source community, the upsides of options 2 and 3 are
probably not worth it.
2. Both 2 and 3 assume use cases that may not exist. It's hard to predict
anything,
To sum up what we have so far:
Option 1 (support just the most recent minor Spark 3 version)
The easiest option for us devs; it forces users to upgrade to the most recent
minor Spark version to consume any new Iceberg features.
Option 2 (a separate project under Iceberg)
Can support as many S
I think we should go for option 1. I'm already not a big fan of having runtime
errors for unsupported things based on versions, and I don't think minor version
upgrades are a large issue for users. I'm especially not looking forward to
supporting interfaces that only exist in Spark 3.2 in a mul
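(For context, the version-based runtime checks being referred to look
roughly like the sketch below. This is a hypothetical helper, not Iceberg
code; the only Spark API it assumes is SparkSession.version(), which returns
a string such as "3.2.0".)

    import org.apache.spark.sql.SparkSession;

    public class SparkVersionUtil {

      // True if the running Spark is at least major.minor, e.g. atLeast(spark, 3, 2).
      public static boolean atLeast(SparkSession spark, int major, int minor) {
        String[] parts = spark.version().split("\\.");
        int runtimeMajor = Integer.parseInt(parts[0]);
        int runtimeMinor = Integer.parseInt(parts[1]);
        return runtimeMajor > major || (runtimeMajor == major && runtimeMinor >= minor);
      }
    }

Every 3.2-only code path in a single multi-version module would need a guard
like this, throwing something like UnsupportedOperationException on older
Spark versions -- which is exactly the kind of version-dependent runtime
error being objected to above.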
Hey Imran,
I don’t know why I forgot to mention this option too. It is definitely a
solution to consider. We used this approach to support Spark 2 and Spark 3.
Right now, this would mean having iceberg-spark (common code for all versions),
iceberg-spark2, iceberg-spark3 (common code for all Spa
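As a rough sketch of how that layout can work (hypothetical class names, not
actual Iceberg code): version-specific behavior sits behind an interface in
the shared module, each versioned module ships an implementation, and the
shared code loads whichever implementation is on the classpath.

    // In the shared iceberg-spark module: the version-agnostic interface.
    interface SparkSupport {
      String describe();
    }

    // Also in the shared module: pick up the implementation provided by the
    // versioned module (iceberg-spark2 or iceberg-spark3) on the classpath.
    class SparkSupportLoader {
      private static final String[] CANDIDATE_IMPLS = {
          "org.apache.iceberg.spark3.Spark3Support",  // hypothetical, from iceberg-spark3
          "org.apache.iceberg.spark2.Spark2Support"   // hypothetical, from iceberg-spark2
      };

      static SparkSupport load() {
        for (String impl : CANDIDATE_IMPLS) {
          try {
            return (SparkSupport) Class.forName(impl).getDeclaredConstructor().newInstance();
          } catch (ReflectiveOperationException e) {
            // this implementation is not on the classpath; try the next candidate
          }
        }
        throw new IllegalStateException("No Spark support module found on the classpath");
      }
    }

The shared module never references a concrete Spark version this way, while
each versioned module is free to use version-specific APIs directly.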
> First of all, is option 2 a viable option? We discussed separating the Python
> module out of the project a few weeks ago and decided not to do that,
> because it's beneficial for code cross-reference and more intuitive for new
> developers to see everything in the same repository. I would
Thanks for bringing this up, Anton.
I am not entirely certain whether your option 2 meant "project" in the "Apache
project" sense or the "gradle project" sense -- it sounds like you mean
"Apache project".
If so, I'd propose Option 3:
Create a "spark-common" gradle project, which builds against the lo
First of all, is option 2 a viable option? We discussed separating the
Python module out of the project a few weeks ago and decided not to do
that, because it's beneficial for code cross-reference and more intuitive
for new developers to see everything in the same repository. I would expect
the