> How about we do the same thing as Spark, keeping the application code
> in hudi-flink, hudi-flink-1.11, and hudi-flink-1.12?

I'm fine with multiple versions if there are some breaking changes. But I
would suggest starting from 1.11 or a higher version, because 1.11 is a
version with many API changes, covering both SQL and the runtime.

We can do some compatibility work for versions lower than 1.11 in the
first version, though.
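For concreteness, the per-version module split could look roughly like the Spark side's. A hypothetical Maven parent-pom fragment (module names simply mirror the proposal above and are illustrative, not final):

```xml
<!-- Hypothetical parent-pom fragment: one version-agnostic module plus a
     thin adapter module per supported Flink version, mirroring how the
     Spark modules are split. Module names are illustrative only. -->
<modules>
  <!-- shared, version-agnostic application code -->
  <module>hudi-flink</module>
  <!-- thin shims compiled against a specific Flink version -->
  <module>hudi-flink-1.11</module>
  <module>hudi-flink-1.12</module>
</modules>
```

Each versioned module would pin its own `flink.version` property and contain only the code that differs between Flink releases.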

Gary Li <garyli1...@outlook.com> wrote on Fri, Jan 8, 2021 at 12:04 AM:

> I have been seeing a wide range of Spark users, from Spark 2.2 to Spark 3,
> so I am expecting Flink could be similar once we have more Flink users
> onboard. I also agree with Danny that we should work together to make the
> Flink writer production-ready for large data flows asap.
>
> How about we do the same thing as Spark, keeping the application code in
> hudi-flink, hudi-flink-1.11, and hudi-flink-1.12? The table and index should
> still stay in hudi-flink-client. If hudi-flink-client needs to include any
> version-specific feature of Flink, we can discuss the details in the RFC.
> WDYT?
>
> Best Regards,
> Gary Li
>
>
> On 1/7/21, 8:12 PM, "vino yang" <yanghua1...@gmail.com> wrote:
>
>     +1 on Gary's opinion,
>
>     Yes, the public APIs that come from AbstractHoodieWriteClient should be
>     reusable.
>
>     We could try to make HoodieFlinkWriteClient a common implementation.
>
>     IIUC, there is a mapping like this:
>
>     SparkRDDWriteClient -> HoodieFlinkWriteClient
>     HoodieDeltaStreamer -> HoodieFlinkStreamer (it could be multiple?)
>
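To make the mapping concrete, here is a toy Java sketch of how one shared abstraction could back both engines. The class names echo the real ones, but every body and signature here is invented purely for illustration:

```java
// Toy sketch of the engine-to-client mapping above. The class names echo
// the real Hudi ones, but the bodies and signatures are invented purely
// to illustrate the shared-abstraction idea.
abstract class AbstractHoodieWriteClientSketch {
    // Engine-specific piece each subclass provides.
    abstract String engineName();

    // Shared, engine-agnostic public API reused by both engines. In the
    // real client this would create a new instant on the timeline; here
    // we just return a fake instant id.
    String startCommit() {
        return engineName() + "-instant-001";
    }
}

class SparkRDDWriteClientSketch extends AbstractHoodieWriteClientSketch {
    @Override
    String engineName() { return "spark"; }
}

class HoodieFlinkWriteClientSketch extends AbstractHoodieWriteClientSketch {
    @Override
    String engineName() { return "flink"; }
}

public class Main {
    public static void main(String[] args) {
        System.out.println(new HoodieFlinkWriteClientSketch().startCommit());
        System.out.println(new SparkRDDWriteClientSketch().startCommit());
    }
}
```

The point of the sketch is only that the public API (e.g. `startCommit`) lives in the shared base, while each engine supplies its own runtime-specific behavior.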
>     Actually, the divergence between Danny and me is whether we need one
>     HoodieFlinkStreamer or two.
>
>     We can maintain one or two, although we are both trying to find a good
>     way to maintain a single app (entry point).
>
>     Correct me if I am wrong.
>
>     Best,
>     Vino
>
>
>     Gary Li <garyli1...@outlook.com> wrote on Thu, Jan 7, 2021 at 4:31 PM:
>
>     > Hi all,
>     >
>     > IIUC the current Flink writer is like an app, just like the delta
>     > streamer. If we want to build another Flink writer, we can still share
>     > the same Flink client, right? Does the Flink client also have to use
>     > the new features only available in Flink 1.12?
>     >
>     > Thanks,
>     > Gary Li
>     > ________________________________
>     > From: Danny Chan <danny0...@apache.org>
>     > Sent: Thursday, January 7, 2021 10:19 AM
>     > To: dev@hudi.apache.org <dev@hudi.apache.org>
>     > Subject: Re: Re: [DISCUSS] New Flink Writer Proposal
>     >
>     > Thanks vino yang ~
>     >
>     > IMO, we should not put too much energy into the current Flink writer;
>     > it is not production-ready in the long run. There are so many features
>     > to add/support for the Flink write/read path (MOR write, COW read, MOR
>     > read, the new index) that we should focus on one version first and
>     > make it robust.
>     >
>     > I really hope that we can work together to make the writer
>     > production-ready as soon as possible. The space is competitive, with
>     > projects like Apache Iceberg and Delta Lake, so from this perspective
>     > there is no benefit in staying compatible with the current writer.
>     >
>     > My idea is that I propose the new infrastructure first as quickly as
>     > possible (the basic pipeline and the test framework), and then we can
>     > work together on the new version (MOR write, COW read, MOR read, the
>     > new index); we had better not be distracted by promoting the old
>     > writer.
>     >
>     > What do you think?
>     >
>     vino yang <yanghua1...@gmail.com> wrote on Wed, Jan 6, 2021 at 2:14 PM:
>     >
>     > > Hi Danny,
>     > >
>     > > As we discussed in the doc, we should agree on whether we should be
>     > > compatible with versions of Flink lower than 1.11/1.12.
>     > >
>     > > We all know that there are some bottlenecks in the current plan. You
>     > > proposed some improvements, which is great, but they rely heavily on
>     > > the newer features provided by Flink. It is a pity that some users of
>     > > old versions of Flink would have no way to benefit from them.
>     > >
>     > > The information I can provide is that some users have already used
>     > > the current Flink write client, or an improved version of it, in a
>     > > production environment; for example, SF Technology, whose Flink
>     > > versions are 1.8.x and 1.10.x.
>     > >
>     > > Therefore, I personally suggest two options:
>     > >
>     > > 1) The new design takes users of lower versions into account as much
>     > > as possible and maintains a single client version;
>     > > 2) The new design is based on the features of the new version and
>     > > evolves separately from the old version (we also have a plan to
>     > > optimize the current implementation), but the public abstractions
>     > > can be reused. I think it is not impossible to maintain multiple
>     > > versions. Flink used to support 4+ versions (0.8.2, 0.9, 0.10, 0.11,
>     > > universal connector) of the Kafka connector, but they shared the
>     > > same code base.
>     > >
>     > > Any thoughts and opinions are welcome and appreciated.
>     > >
>     > > Best,
>     > > Vino
>     > >
>     > > vino yang <yanghua1...@gmail.com> wrote on Wed, Jan 6, 2021 at 1:37 PM:
>     > >
>     > > > Hi Danny,
>     > > >
>     > > > You should have cwiki edit permission now.
>     > > > Any problems let me know.
>     > > >
>     > > > Best,
>     > > > Vino
>     > > >
>     > > > Danny Chan <danny0...@apache.org> wrote on Wed, Jan 6, 2021 at 12:05 PM:
>     > > >
>     > > >> Sorry ~
>     > > >>
>     > > >> Forgot to say that my Confluence ID is danny0405.
>     > > >>
>     > > >> It would be nice if any of you could help with this.
>     > > >>
>     > > >> Best,
>     > > >> Danny Chan
>     > > >>
>     > > >> Danny Chan <danny0...@apache.org> wrote on Wed, Jan 6, 2021 at 12:00 PM:
>     > > >>
>     > > >> > Hi, can someone give me CWIKI permission so that I can update
>     > > >> > the design details there (maybe as a new RFC, though)?
>     > > >> >
>     > > >> > wangxianghu <wxhj...@126.com> wrote on Tue, Jan 5, 2021 at 2:43 PM:
>     > > >> >
>     > > >> >> +1, thanks Danny!
>     > > >> >> I believe this new OperatorCoordinator feature in Flink 1.11
>     > > >> >> will help improve the current implementation.
>     > > >> >>
>     > > >> >> Best,
>     > > >> >>
>     > > >> >> XianghuWang
>     > > >> >>
>     > > >> >> At 2021-01-05 14:17:37, "vino yang" <yanghua1...@gmail.com> wrote:
>     > > >> >> >Hi,
>     > > >> >> >
>     > > >> >> >Sharing more details: the OperatorCoordinator is part of the
>     > > >> >> >new Data Source API (Beta) covered in the Flink 1.11 release
>     > > >> >> >notes [1].
>     > > >> >> >
>     > > >> >> >Flink 1.11 was released only about half a year ago. The design
>     > > >> >> >of RFC-13 began at the end of 2019, and most of the
>     > > >> >> >implementation was completed by the time Flink 1.11 was
>     > > >> >> >released.
>     > > >> >> >
>     > > >> >> >I believe that the production environments of many large
>     > > >> >> >companies have not been upgraded so quickly (as far as our
>     > > >> >> >company is concerned, we still have some jobs running on Flink
>     > > >> >> >release packages below 1.9).
>     > > >> >> >
>     > > >> >> >So, maybe we need to find a mechanism to benefit both new
> and old
>     > > >> users.
>     > > >> >> >
>     > > >> >> >[1]:
>     > > >> >> >https://flink.apache.org/news/2020/07/06/release-1.11.0.html#new-data-source-api-beta
>     > > >> >> >
>     > > >> >> >Best,
>     > > >> >> >Vino
>     > > >> >> >
>     > > >> >> >vino yang <yanghua1...@gmail.com> wrote on Tue, Jan 5, 2021 at 12:30 PM:
>     > > >> >> >
>     > > >> >> >> Hi,
>     > > >> >> >>
>     > > >> >> >> +1, thank you Danny for introducing this new feature
>     > > >> >> >> (OperatorCoordinator) [1] from the latest Flink version.
>     > > >> >> >> This feature is very helpful for improving the
>     > > >> >> >> implementation mechanism of the Flink write client.
>     > > >> >> >>
>     > > >> >> >> But this feature is only available from Flink 1.11 onward.
>     > > >> >> >> Before that, there was no good way to realize upstream and
>     > > >> >> >> downstream task coordination through the public APIs
>     > > >> >> >> provided by Flink.
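For readers unfamiliar with the feature, the coordination pattern it enables can be sketched roughly as below. Note that all types here are simplified stand-ins invented for illustration, not Flink's real OperatorCoordinator API:

```java
// Rough, simplified sketch of the coordinator pattern from FLINK-15099:
// parallel subtasks send events to a single coordinator on the JobManager,
// which can then make a global decision (e.g. commit an instant once all
// subtasks have reported). These types are stand-ins invented for
// illustration, NOT Flink's real interfaces.
import java.util.ArrayList;
import java.util.List;

interface OperatorEventSketch {
    String describe();
}

class WriteMetadataEvent implements OperatorEventSketch {
    final int subtask;
    WriteMetadataEvent(int subtask) { this.subtask = subtask; }
    public String describe() { return "write-metadata from subtask " + subtask; }
}

class CoordinatorSketch {
    private final int parallelism;
    private final List<OperatorEventSketch> buffered = new ArrayList<>();

    CoordinatorSketch(int parallelism) { this.parallelism = parallelism; }

    // Called for each event a subtask sends; once every subtask has
    // reported, the coordinator is ready to commit globally.
    boolean handleEventFromOperator(OperatorEventSketch event) {
        buffered.add(event);
        return buffered.size() == parallelism; // true => ready to commit
    }
}

public class Main {
    public static void main(String[] args) {
        CoordinatorSketch coordinator = new CoordinatorSketch(3);
        boolean committed = false;
        for (int subtask = 0; subtask < 3; subtask++) {
            committed = coordinator.handleEventFromOperator(new WriteMetadataEvent(subtask));
        }
        System.out.println(committed ? "commit" : "wait");
    }
}
```

The value of the pattern is that the global decision point lives on the JobManager rather than in a parallelism-1 operator inside the data path.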
>     > > >> >> >> I just have one concern: whether we need to take into
>     > > >> >> >> account the users of earlier versions (below Flink 1.11).
>     > > >> >> >>
>     > > >> >> >> [1]:
>     > > >> >> >> https://issues.apache.org/jira/browse/FLINK-15099
>     > > >> >> >>
>     > > >> >> >> Best,
>     > > >> >> >> Vino
>     > > >> >> >>
>     > > >> >> >> Gary Li <garyli1...@outlook.com> wrote on Tue, Jan 5, 2021 at 10:40 AM:
>     > > >> >> >>
>     > > >> >> >>> Hi Danny,
>     > > >> >> >>>
>     > > >> >> >>> Thanks for the proposal. I'd recommend starting a new RFC.
>     > > >> >> >>> RFC-13 is done and included some refactoring work, so we
>     > > >> >> >>> should mark it as completed. Looking forward to further
>     > > >> >> >>> discussion on the RFC.
>     > > >> >> >>>
>     > > >> >> >>> Best,
>     > > >> >> >>> Gary Li
>     > > >> >> >>> ________________________________
>     > > >> >> >>> From: Danny Chan <danny0...@apache.org>
>     > > >> >> >>> Sent: Tuesday, January 5, 2021 10:22 AM
>     > > >> >> >>> To: dev@hudi.apache.org <dev@hudi.apache.org>
>     > > >> >> >>> Subject: Re: [DISCUSS] New Flink Writer Proposal
>     > > >> >> >>>
>     > > >> >> >>> Sure, I can update the RFC-13 cwiki if you agree with that.
>     > > >> >> >>>
>     > > >> >> >>> Vinoth Chandar <vin...@apache.org> wrote on Tue, Jan 5, 2021 at 2:58 AM:
>     > > >> >> >>>
>     > > >> >> >>> > Overall +1 on the idea.
>     > > >> >> >>> >
>     > > >> >> >>> > Danny, could we move this to the Apache cwiki if you
>     > > >> >> >>> > don't mind? That's what we have been using for other RFC
>     > > >> >> >>> > discussions.
>     > > >> >> >>> >
>     > > >> >> >>> > On Mon, Jan 4, 2021 at 1:22 AM Danny Chan <danny0...@apache.org> wrote:
>     > > >> >> >>> >
>     > > >> >> >>> > > The RFC-13 Flink writer has some bottlenecks that make
>     > > >> >> >>> > > it hard to adapt for production:
>     > > >> >> >>> > >
>     > > >> >> >>> > > - The InstantGeneratorOperator has parallelism 1, which
>     > > >> >> >>> > > limits high-throughput consumption; because all the
>     > > >> >> >>> > > split inputs drain into a single thread, the network IO
>     > > >> >> >>> > > comes under pressure too.
>     > > >> >> >>> > > - The WriteProcessOperator handles inputs by partition;
>     > > >> >> >>> > > within each partition's write process the BUCKETs are
>     > > >> >> >>> > > written one by one, so the file IO limits
>     > > >> >> >>> > > high-throughput inputs.
>     > > >> >> >>> > > - It buffers the data per checkpoint, which is hard to
>     > > >> >> >>> > > make robust for production; the checkpoint function is
>     > > >> >> >>> > > blocking and should not perform IO operations.
>     > > >> >> >>> > > - The FlinkHoodieIndex is only valid within a per-job
>     > > >> >> >>> > > scope; it does not work for existing bootstrapped data
>     > > >> >> >>> > > or across different Flink jobs.
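As a side note for readers, the first bullet is essentially an Amdahl's-law problem. A toy, non-Flink calculation (all cost numbers are made up) shows why extra writer parallelism stops helping once a parallelism-1 stage sits in the pipeline:

```java
// Tiny back-of-the-envelope illustration (not Flink code) of the
// parallelism-1 bottleneck: the single-threaded stage cannot be
// parallelized, so adding writers stops helping once it dominates.
public class Main {
    public static void main(String[] args) {
        long records = 1_000_000L;
        double singleStageCostPerRecord = 1.0; // the parallelism-1 operator
        double writeCostPerRecord = 4.0;       // parallelizable write work

        for (int writers : new int[]{1, 8, 64}) {
            // Pipeline time is bounded by its slowest stage.
            double pipelineTime = Math.max(
                records * singleStageCostPerRecord,      // cannot be split
                records * writeCostPerRecord / writers); // splits across writers
            System.out.println(writers + " writers -> " + (long) pipelineTime + " cost units");
        }
    }
}
```

With these made-up numbers, throughput plateaus past 8 writers because the single-parallelism stage dominates, which is why the design below tries to remove such operators.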
>     > > >> >> >>> > >
>     > > >> >> >>> > > Thus, here I propose a new design for the Flink writer
>     > > >> >> >>> > > to solve these problems [1]. Overall, the new design
>     > > >> >> >>> > > tries to remove the single-parallelism operators and
>     > > >> >> >>> > > make the index more powerful and scalable.
>     > > >> >> >>> > >
>     > > >> >> >>> > > I plan to solve these bottlenecks incrementally (in 4
>     > > >> >> >>> > > steps); there are already some local POCs for these
>     > > >> >> >>> > > proposals.
>     > > >> >> >>> > >
>     > > >> >> >>> > > I'm looking forward to your feedback. Any suggestions
>     > > >> >> >>> > > are appreciated ~
>     > > >> >> >>> > >
>     > > >> >> >>> > > [1]
>     > > >> >> >>> > > https://docs.google.com/document/d/1oOcU0VNwtEtZfTRt3v9z4xNQWY-Hy5beu7a1t5B-75I/edit?usp=sharing
>     > > >> >> >>> > >
>     > > >> >> >>> >
>     > > >> >> >>>
>     > > >> >> >>
>     > > >> >>
>     > > >> >
>     > > >>
>     > > >
>     > >
>     >
>
>
