Hello,

I'm an OpenLineage committer and, previously, a minor Flink contributor.
The OpenLineage community is very interested in the conversation about Flink
metadata, and we'll be happy to cooperate with the Flink community.

Best,
Maciej Obuchowski



On Thu, 13 Jan 2022 at 18:12, Martijn Visser <mart...@ververica.com> wrote:
>
> Hi all,
>
> @Andrew thanks for sharing that!
>
> @Tero good point, I should have clarified the purpose. I want to understand
> which "metadata platform" tools are used or evaluated by the Flink
> community, what their purpose is for using such a tool (is it as a generic
> catalogue, as a data discovery tool, is lineage the important part, etc.) and
> what problems people are trying to solve with them. This space is
> developing rapidly and there are many open source and commercial tools
> popping up/growing, which is also why I'm trying to keep an open mind about
> how this space is evolving.
>
> If the Flink community wants to integrate with metadata tools, I fully
> agree that ideally we do that via standards. My perception is at this
> moment that no clear standard has yet been established. You mentioned
> open-metadata.org, but I believe https://openlineage.io/ is also an
> alternative standard.
>
> Best regards,
>
> Martijn
>
> On Thu, 13 Jan 2022 at 17:00, Tero Paananen <teropaana...@gmail.com> wrote:
>
> > > I'm currently checking out different metadata platforms, such as
> > Amundsen [1] and Datahub [2]. In short, these types of tools try to address
> > problems related to topics such as data discovery, data lineage and an
> > overall data catalogue.
> > >
> > > I'm reaching out to the Dev and User mailing lists to get some feedback.
> > It would really help if you could spend a couple of minutes to let me know
> > if you already use either one of the two mentioned metadata platforms or
> > another one, or are you evaluating such tools? If so, is that for use
> > as a catalogue, for lineage or anything else? Any type of feedback
> > on these types of tools is appreciated.
> >
> > I hope you don't mind answers off-list.
> >
> > You didn't say what purpose you're evaluating these tools for, but if
> > you're evaluating platforms for integration with Flink, I wouldn't
> > approach it with a particular product in mind. Rather I'd create some
> > sort of facility to propagate metadata and/or lineage information in a
> > generic way and allow Flink users to plug in their favorite metadata
> > tool, using standards like OpenLineage, for example. I believe Egeria
> > is also trying to create an open standard for metadata.
> >
> > If you're evaluating data catalogs for personal use or use in a
> > particular project, Andrew's answer about the Wikimedia evaluation is
> > a good start. It's missing OpenMetadata (https://open-metadata.org/).
> > That one is showing a LOT of promise. Wikimedia's evaluation is also
> > missing industry leading commercial products (understandably, given
> > their mission). Collibra and Alation are probably the ones that pop up
> > most often.
> >
> > I have personally looked into both DataHub and Amundsen. My high-level
> > feedback is that DataHub is overengineered and uses proprietary
> > LinkedIn technology platform(s), which aren't widely used elsewhere.
> > Amundsen is much less flexible than DataHub and quite basic in its
> > functionality. If you need anything beyond what it already offers,
> > good luck.
> >
> > We dumped Amundsen in favor of OpenMetadata a few months back. We
> > don't have enough data points to fully evaluate OpenMetadata yet.
> >
> > -TPP
> >
