Hi all,

@Andrew thanks for sharing that!

@Tero good point, I should have clarified the purpose. I want to understand
which metadata platforms are used or evaluated by the Flink community, what
they are used for (as a generic catalogue, as a data discovery tool, mainly
for lineage, etc.) and what problems people are trying to solve with them.
This space is developing rapidly, with many open source and commercial
tools popping up or growing, which is also why I'm trying to keep an open
mind about how it is evolving.

If the Flink community wants to integrate with metadata tools, I fully
agree that ideally we do that via standards. My perception is that no
clear standard has been established yet. You mentioned
open-metadata.org, but I believe https://openlineage.io/ is an
alternative standard as well.

Best regards,

Martijn

On Thu, 13 Jan 2022 at 17:00, Tero Paananen <teropaana...@gmail.com> wrote:

> > I'm currently checking out different metadata platforms, such as
> Amundsen [1] and DataHub [2]. In short, these types of tools try to address
> problems related to topics such as data discovery, data lineage and an
> overall data catalogue.
> >
> > I'm reaching out to the Dev and User mailing lists to get some feedback.
> It would really help if you could spend a couple of minutes to let me know
> if you already use either one of the two mentioned metadata platforms or
> another one, or are you evaluating such tools? If so, is that for use as
> a catalogue, for lineage or anything else? Any type of feedback
> on these types of tools is appreciated.
>
> I hope you don't mind answers off-list.
>
> You didn't say what purpose you're evaluating these tools for, but if
> you're evaluating platforms for integration with Flink, I wouldn't
> approach it with a particular product in mind. Rather I'd create some
> sort of facility to propagate metadata and/or lineage information in a
> generic way and allow Flink users to plug in their favorite metadata
> tool, using standards like OpenLineage, for example. I believe Egeria
> is also trying to create an open standard for metadata.
>
> If you're evaluating data catalogs for personal use or use in a
> particular project, Andrew's answer about the Wikimedia evaluation is
> a good start. It's missing OpenMetadata (https://open-metadata.org/).
> That one is showing a LOT of promise. Wikimedia's evaluation is also
> missing industry-leading commercial products (understandably, given
> their mission). Collibra and Alation are probably the ones that pop up
> most often.
>
> I have personally looked into both DataHub and Amundsen. My high-level
> feedback is that DataHub is overengineered and uses proprietary
> LinkedIn technology platform(s) that aren't widely used elsewhere.
> Amundsen is much less flexible than DataHub and quite basic in its
> functionality. If you need anything beyond what it already offers,
> good luck.
>
> We dumped Amundsen in favor of OpenMetadata a few months back. We
> don't have enough data points to fully evaluate OpenMetadata yet.
>
> -TPP
>
