Hello,

I'm an OpenLineage committer and, previously, a minor Flink contributor. The OpenLineage community is very interested in the conversation about Flink metadata, and we'll be happy to cooperate with the Flink community.
Best,
Maciej Obuchowski

On Thu, 13 Jan 2022 at 18:12, Martijn Visser <mart...@ververica.com> wrote:
>
> Hi all,
>
> @Andrew thanks for sharing that!
>
> @Tero good point, I should have clarified the purpose. I want to understand
> what "metadata platforms" tools are used or evaluated by the Flink
> community, what's their purpose for using such a tool (is it as a generic
> catalogue, as a data discovery tool, is lineage the important part etc) and
> what problems are people trying to solve with them. This space is
> developing rapidly and there are many open source and commercial tools
> popping up/growing, which is also why I'm trying to keep an open vision on
> how this space is evolving.
>
> If the Flink community wants to integrate with metadata tools, I fully
> agree that ideally we do that via standards. My perception is at this
> moment that no clear standard has yet been established. You mentioned
> open-metadata.org, but I believe https://openlineage.io/ is also an
> alternative standard.
>
> Best regards,
>
> Martijn
>
> On Thu, 13 Jan 2022 at 17:00, Tero Paananen <teropaana...@gmail.com> wrote:
> >
> > I'm currently checking out different metadata platforms, such as
> > Amundsen [1] and Datahub [2]. In short, these types of tools try to address
> > problems related to topics such as data discovery, data lineage and an
> > overall data catalogue.
> >
> > I'm reaching out to the Dev and User mailing lists to get some feedback.
> > It would really help if you could spend a couple of minutes to let me know
> > if you already use either one of the two mentioned metadata platforms or
> > another one, or are you evaluating such tools? If so, is that for the
> > purpose as a catalogue, for lineage or anything else? Any type of feedback
> > on these types of tools is appreciated.
> >
> > I hope you don't mind answers off-list.
> >
> > You didn't say what purpose you're evaluating these tools for, but if
> > you're evaluating platforms for integration with Flink, I wouldn't
> > approach it with a particular product in mind. Rather I'd create some
> > sort of facility to propagate metadata and/or lineage information in a
> > generic way and allow Flink users to plug in their favorite metadata
> > tool. Using standards like OpenLineage, for example. I believe Egeria
> > is also trying to create an open standard for metadata.
> >
> > If you're evaluating data catalogs for personal use or use in a
> > particular project, Andrew's answer about the Wikimedia evaluation is
> > a good start. It's missing OpenMetadata (https://open-metadata.org/).
> > That one is showing a LOT of promise. Wikimedia's evaluation is also
> > missing industry leading commercial products (understandably, given
> > their mission). Collibra and Alation probably the ones that pop up
> > most often.
> >
> > I have personally looked into both DataHub and Amundsen. My high level
> > feedback is that DataHub is overengineered, and using proprietary
> > LinkedIn technology platform(s), which aren't widely used anywhere.
> > Amundsen is much less flexible than DataHub and quite basic in its
> > functionality. If you need anything beyond what it already offers,
> > good luck.
> >
> > We dumped Amundsen in favor of OpenMetadata a few months back. We
> > don't have enough data points to fully evaluate OpenMetadata yet.
> >
> > -TPP