> How about we do the same thing as Spark, and have the application code in hudi-flink, hudi-flink-1.11, hudi-flink-1.12?
I'm fine with multiple versions if there are some breaking changes. But I would suggest starting from 1.11 or a higher version, because 1.11 is a version with many API changes, including the SQL and runtime APIs. We can do some compatibility work for versions lower than 1.11 in the first version, though.

On Fri, Jan 8, 2021 at 12:04 AM Gary Li <garyli1...@outlook.com> wrote:

> I have been seeing a wide range of Spark users, from Spark 2.2 to Spark 3,
> so I am expecting Flink could be similar once we have more Flink users
> onboard. I also agree with Danny that we should work together to make the
> Flink writer production-ready for large data flows asap.
>
> How about we do the same thing as Spark, and have the application code in
> hudi-flink, hudi-flink-1.11, and hudi-flink-1.12? The table and index
> should still stay in hudi-flink-client. If hudi-flink-client needs to
> include any version-specific feature of Flink, we can discuss the details
> in the RFC. WDYT?
>
> Best Regards,
> Gary Li
>
> On 1/7/21, 8:12 PM, "vino yang" <yanghua1...@gmail.com> wrote:
>
> +1 on Gary's opinion.
>
> Yes, the public APIs that come from AbstractHoodieWriteClient should be
> reusable.
>
> We could try to make the HoodieFlinkWriteClient a common implementation.
>
> IIUC, there is a mapping like this:
>
> SparkRDDWriteClient -> HoodieFlinkWriteClient
> HoodieDeltaStreamer -> HoodieFlinkStreamer (it could be multiple?)
>
> Actually, Danny's and my divergence is whether we need one
> HoodieFlinkStreamer or two HoodieFlinkStreamers.
>
> We can maintain one or two, although we both try to find a good way to
> maintain one app (entry point).
>
> Correct me if I am wrong.
>
> Best,
> Vino
>
> On Thu, Jan 7, 2021 at 4:31 PM Gary Li <garyli1...@outlook.com> wrote:
>
> > Hi all,
> >
> > IIUC the current Flink writer is like an app, just like the delta
> > streamer. If we want to build another Flink writer, we can still share
> > the same Flink client, right?
> > Does the Flink client also have to use the new feature that is only
> > available in Flink 1.12?
> >
> > Thanks,
> > Gary Li
> > ________________________________
> > From: Danny Chan <danny0...@apache.org>
> > Sent: Thursday, January 7, 2021 10:19 AM
> > To: dev@hudi.apache.org <dev@hudi.apache.org>
> > Subject: Re: Re: [DISCUSS] New Flink Writer Proposal
> >
> > Thanks vino yang ~
> >
> > IMO, we should not put too much energy into the current Flink writer; it
> > is not production-ready in the long run. There are so many features that
> > need to be added/supported for Flink write/read (MOR write, COW read, MOR
> > read, the new index) that we should focus on one version first and make
> > it robust.
> >
> > I really hope that we can work together to make the writer
> > production-ready as soon as possible. This space is competitive, with
> > competitors like Apache Iceberg and Delta Lake, so from this perspective
> > there is no benefit in staying compatible with the current writer.
> >
> > My idea is that I propose the new infrastructure first as quickly as
> > possible (the basic pipeline, the test framework), and then we can work
> > together on the new version (MOR write, COW read, MOR read, the new
> > index); we had better not be distracted by promoting the old writer.
> >
> > What do you think?
> >
> > On Wed, Jan 6, 2021 at 2:14 PM vino yang <yanghua1...@gmail.com> wrote:
> >
> > > Hi Danny,
> > >
> > > As we discussed in the doc, we should agree on whether we should be
> > > compatible with versions lower than Flink 1.11/1.12.
> > >
> > > We all know that there are some bottlenecks in the current plan. You
> > > proposed some improvements, and yes, they are great, but they radically
> > > use the newer features provided by Flink. It is a pity that some users
> > > of old versions of Flink have no way to benefit from these features.
> > > The information I can provide is that some users have already used the
> > > current Flink write client, or an improved version of it, in a
> > > production environment. For example, SF Technology, and the Flink
> > > versions they use are 1.8.x and 1.10.x.
> > >
> > > Therefore, I personally suggest two options:
> > >
> > > 1) The new design takes users of lower versions into account as much
> > > as possible and maintains a single client version;
> > > 2) The new design is based on the features of the new version and
> > > evolves separately from the old version (we also have a plan to
> > > optimize the current implementation), but the public abstraction can
> > > be reused. I think it is not impossible to maintain multiple versions.
> > > Flink used to support 4+ versions (0.8.2, 0.9, 0.10, 0.11, universal
> > > connector) of the Kafka connector, but they share the same code base.
> > >
> > > Any thoughts and opinions are welcome and appreciated.
> > >
> > > Best,
> > > Vino
> > >
> > > On Wed, Jan 6, 2021 at 1:37 PM vino yang <yanghua1...@gmail.com> wrote:
> > >
> > > > Hi Danny,
> > > >
> > > > You should have cwiki edit permission now.
> > > > Any problems, let me know.
> > > >
> > > > Best,
> > > > Vino
> > > >
> > > > On Wed, Jan 6, 2021 at 12:05 PM Danny Chan <danny0...@apache.org> wrote:
> > > >
> > > > > Sorry ~
> > > > >
> > > > > I forgot to say that my Confluence ID is danny0405.
> > > > >
> > > > > It would be nice if any of you could help with this.
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > >
> > > > > On Wed, Jan 6, 2021 at 12:00 PM Danny Chan <danny0...@apache.org> wrote:
> > > > >
> > > > > > Hi, can someone give me the CWIKI permission so that I can
> > > > > > update the design details there (maybe as a new RFC, though ~)?
> > > > > >
> > > > > > On Tue, Jan 5, 2021 at 2:43 PM wangxianghu <wxhj...@126.com> wrote:
> > > > > >
> > > > > > > +1, thanks Danny!
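Vino's second option, version-specific modules that evolve separately while reusing one public abstraction, can be pictured with a small sketch. This is a hypothetical illustration of the pattern only; the interface and class names below are invented for the example and are not actual Hudi or Flink classes.

```java
// Hypothetical sketch: version-specific modules implement one shared
// abstraction, echoing how the Flink Kafka connectors for 0.8-0.11 shared a
// code base. All names here are invented for illustration.
import java.util.ArrayList;
import java.util.List;

class SharedAbstractionSketch {
    public static void main(String[] args) {
        InstantCoordination coord = new LegacyInstantOperator();
        String instant = coord.startInstant();
        coord.commitInstant(instant);
        System.out.println("committed instant " + instant);
    }
}

// The piece both module generations would share: how the writer pipeline
// obtains and commits an instant time.
interface InstantCoordination {
    String startInstant();
    void commitInstant(String instant);
}

// A "pre-1.11" module could keep today's parallelism-1 instant generator
// behind the interface, while a "1.11+" module would implement the same
// interface on top of the OperatorCoordinator runtime feature. The rest of
// the pipeline depends only on InstantCoordination, so both modules can
// evolve separately.
class LegacyInstantOperator implements InstantCoordination {
    private final List<String> committed = new ArrayList<>();
    private long counter = 20210101000000L;          // toy instant clock

    public String startInstant() { return Long.toString(++counter); }
    public void commitInstant(String instant) { committed.add(instant); }
    List<String> committed() { return committed; }
}
```

The same trick is what lets the Kafka connector versions mentioned above share one code base: the version-specific surface stays small, and everything else programs against the shared interface.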
> > > > > > > I believe this new feature, OperatorCoordinator, in Flink 1.11
> > > > > > > will help improve the current implementation.
> > > > > > >
> > > > > > > Best,
> > > > > > > Xianghu Wang
> > > > > > >
> > > > > > > At 2021-01-05 14:17:37, "vino yang" <yanghua1...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Sharing more details: the OperatorCoordinator is part of the
> > > > > > > > new Data Source API (Beta) covered in the Flink 1.11 release
> > > > > > > > notes [1].
> > > > > > > >
> > > > > > > > Flink 1.11 was released only about half a year ago. The
> > > > > > > > design of RFC-13 began at the end of 2019, and most of the
> > > > > > > > implementation was completed by the time Flink 1.11 was
> > > > > > > > released.
> > > > > > > >
> > > > > > > > I believe that the production environments of many large
> > > > > > > > companies have not been upgraded that quickly (as far as our
> > > > > > > > company is concerned, we still have some jobs running on
> > > > > > > > Flink release packages below 1.9).
> > > > > > > >
> > > > > > > > So maybe we need to find a mechanism that benefits both new
> > > > > > > > and old users.
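To make the coordination idea concrete, here is a toy model of the pattern that OperatorCoordinator enables: parallel writer subtasks all write under one instant handed out by a single lightweight coordinator, and the commit happens only after every subtask has acked. This is not the real Flink OperatorCoordinator API; the class and method names are invented for the sketch.

```java
// Toy model (not the real Flink API) of coordinator-based instant handling:
// many subtasks share one instant, so record IO stays fully parallel, and
// commit happens only after all subtasks have acked their flushed buffers.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class CoordinationSketch {
    public static void main(String[] args) {
        ToyInstantCoordinator coord = new ToyInstantCoordinator();
        // Two parallel subtasks fetch the instant; both see the same value.
        String a = coord.currentInstant(0);
        String b = coord.currentInstant(1);
        System.out.println(a.equals(b));         // prints true
        coord.ackWriteSuccess(0);
        System.out.println(coord.maybeCommit()); // prints false: subtask 1 pending
        coord.ackWriteSuccess(1);
        System.out.println(coord.maybeCommit()); // prints true: all acked
    }
}

class ToyInstantCoordinator {
    private final AtomicLong instant = new AtomicLong(20210105000000L);
    private final Set<Integer> pendingAcks = ConcurrentHashMap.newKeySet();

    // Each subtask asks the coordinator for the instant to write under.
    String currentInstant(int subtaskId) {
        pendingAcks.add(subtaskId);
        return Long.toString(instant.get());
    }

    // A subtask reports that its data for the instant is durably written.
    void ackWriteSuccess(int subtaskId) { pendingAcks.remove(subtaskId); }

    // Commit the instant once every participant has acked, then roll forward.
    boolean maybeCommit() {
        if (!pendingAcks.isEmpty()) return false;
        instant.incrementAndGet();
        return true;
    }
}
```

The point of the pattern is that only the tiny instant hand-off is centralized, not the data path, which is what a parallelism-1 operator cannot offer.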
> > > > > > > > [1]: https://flink.apache.org/news/2020/07/06/release-1.11.0.html#new-data-source-api-beta
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Vino
> > > > > > > >
> > > > > > > > On Tue, Jan 5, 2021 at 12:30 PM vino yang <yanghua1...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > +1, thank you Danny for introducing this new Flink feature
> > > > > > > > > (OperatorCoordinator) [1] from the latest release. This
> > > > > > > > > feature is very helpful for improving the implementation
> > > > > > > > > mechanism of the Flink write client.
> > > > > > > > >
> > > > > > > > > But this feature is only available from Flink 1.11 onward.
> > > > > > > > > Before that, there was no good way to realize upstream and
> > > > > > > > > downstream task coordination through the public API
> > > > > > > > > provided by Flink. I just have one concern: whether we need
> > > > > > > > > to take into account users of earlier versions (less than
> > > > > > > > > Flink 1.11).
> > > >> >> >> > > > >> >> >> [1]: > > > https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FFLINK-15099&data=04%7C01%7C%7C5ce33f4da3b7421d096108d8b3057bc8%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637456183424591944%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=92Imz%2F%2FI5TP25%2FkxxOsWeHKExyKf9A3JCXIKXE5IHJw%3D&reserved=0 > > > >> >> >> > > > >> >> >> Best, > > > >> >> >> Vino > > > >> >> >> > > > >> >> >> Gary Li <garyli1...@outlook.com> 于2021年1月5日周二 上午10:40写道: > > > >> >> >> > > > >> >> >>> Hi Danny, > > > >> >> >>> > > > >> >> >>> Thanks for the proposal. I'd recommend starting a new > RFC. > > RFC-13 > > > >> was > > > >> >> >>> done and including some work about the refactoring so we > should > > > >> mark > > > >> >> it as > > > >> >> >>> completed. Looking forward to having further discussion > on the > > > RFC. > > > >> >> >>> > > > >> >> >>> Best, > > > >> >> >>> Gary Li > > > >> >> >>> ________________________________ > > > >> >> >>> From: Danny Chan <danny0...@apache.org> > > > >> >> >>> Sent: Tuesday, January 5, 2021 10:22 AM > > > >> >> >>> To: dev@hudi.apache.org <dev@hudi.apache.org> > > > >> >> >>> Subject: Re: [DISCUSS] New Flink Writer Proposal > > > >> >> >>> > > > >> >> >>> Sure, i can update the RFC-13 cwiki if you agree with > that. > > > >> >> >>> > > > >> >> >>> Vinoth Chandar <vin...@apache.org> 于2021年1月5日周二 > 上午2:58写道: > > > >> >> >>> > > > >> >> >>> > Overall +1 on the idea. > > > >> >> >>> > > > > >> >> >>> > Danny, could we move this to the apache cwiki if you > don't > > > mind? > > > >> >> >>> > That's what we have been using for other RFC > discussions. 
> > > > > > > > > > > On Mon, Jan 4, 2021 at 1:22 AM Danny Chan <danny0...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > The RFC-13 Flink writer has some bottlenecks that
> > > > > > > > > > > > make it hard to adapt to production:
> > > > > > > > > > > >
> > > > > > > > > > > > - The InstantGeneratorOperator has parallelism 1,
> > > > > > > > > > > > which is a limit for high-throughput consumption;
> > > > > > > > > > > > because all the split inputs drain into a single
> > > > > > > > > > > > thread, the network IO comes under pressure too.
> > > > > > > > > > > > - The WriteProcessOperator handles inputs by
> > > > > > > > > > > > partition; that means within each partition write
> > > > > > > > > > > > process the buckets are written one by one, so the
> > > > > > > > > > > > file IO is too limited to handle high-throughput
> > > > > > > > > > > > inputs.
> > > > > > > > > > > > - It buffers the data per checkpoint, which is too
> > > > > > > > > > > > hard to make robust for production; the checkpoint
> > > > > > > > > > > > function is blocking and should not perform IO
> > > > > > > > > > > > operations.
> > > > > > > > > > > > - The FlinkHoodieIndex is only valid within a per-job
> > > > > > > > > > > > scope; it does not work for existing bootstrapped
> > > > > > > > > > > > data or across different Flink jobs.
> > > > > > > > > > > >
> > > > > > > > > > > > Thus, I propose a new design for the Flink writer to
> > > > > > > > > > > > solve these problems [1]. Overall, the new design
> > > > > > > > > > > > tries to remove the single-parallelism operators and
> > > > > > > > > > > > make the index more powerful and scalable.
> > > > > > > > > > > >
> > > > > > > > > > > > I plan to solve these bottlenecks incrementally (in 4
> > > > > > > > > > > > steps); there are already some local POCs for these
> > > > > > > > > > > > proposals.
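One way to picture a fix for the second bottleneck is to key the stream by a bucket/file-group id instead of by partition, so that different buckets of one partition land on different writer subtasks and their file IO proceeds in parallel. The hash assignment and names below are illustrative assumptions, not the actual scheme in the linked design document.

```java
// Illustrative sketch: routing records by a stable bucket id spreads bucket
// writes across subtasks, instead of one subtask writing all buckets of a
// partition serially. numBuckets and the hash scheme are assumptions.
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BucketRoutingSketch {
    // Stable, non-negative bucket id derived from the record key.
    static int bucketId(String recordKey, int numBuckets) {
        return Math.floorMod(recordKey.hashCode(), numBuckets);
    }

    // Each bucket maps to exactly one writer subtask, so a given file group
    // is still written by a single task (no write conflicts), but different
    // buckets proceed in parallel.
    static int writerSubtask(int bucketId, int writerParallelism) {
        return bucketId % writerParallelism;
    }

    public static void main(String[] args) {
        int numBuckets = 8, parallelism = 4;
        Map<Integer, Integer> recordsPerSubtask = new TreeMap<>();
        for (String key : List.of("uuid-1", "uuid-2", "uuid-3", "uuid-4",
                                  "uuid-5", "uuid-6", "uuid-7", "uuid-8")) {
            int subtask = writerSubtask(bucketId(key, numBuckets), parallelism);
            recordsPerSubtask.merge(subtask, 1, Integer::sum);
        }
        System.out.println("records per writer subtask: " + recordsPerSubtask);
    }
}
```

The design choice this illustrates is that parallelism becomes bounded by the bucket count rather than by the partition count.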
> > > > > > > > > > > > I'm looking forward to your feedback. Any
> > > > > > > > > > > > suggestions are appreciated ~
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > > https://docs.google.com/document/d/1oOcU0VNwtEtZfTRt3v9z4xNQWY-Hy5beu7a1t5B-75I/edit?usp=sharing