It sounds like there is interest in developing a storage plugin for Drill to query Apache Iceberg. We've actually discussed this internally as well. We could start looking into this.

-- C

> On Jul 5, 2021, at 11:05 AM, luoc <[email protected]> wrote:
>
> Hi z0ltrix,
> There are two links for contributing ideas. [1] is the issue tracker for
> anything (recommended). [2] is the contribution guideline. Enjoy!
>
> [1] Github Issues <https://github.com/apache/drill/issues>
> [2] Guideline Notice <https://github.com/apache/drill/issues/2233>
>
>> On Jul 5, 2021, at 12:12 AM, Christian Pfarr <[email protected]> wrote:
>>
>> Hi luoc,
>>
>> of course. I would be happy to support you with this.
>>
>> How to start?
>>
>> Regards,
>>
>> z0ltrix
>>
>> -------- Original Message --------
>> On July 4, 2021, 17:37, luoc wrote:
>>
>> Hi,
>> Makes perfect sense so far. Obviously, you understand the difference between
>> batch computation and ad hoc querying. At the same time, Drill is a
>> high-performance MPP query layer for self-describing data that is schema-free
>> and speaks ANSI SQL.
>> Would you mind helping me open an issue on GitHub? It is a good way to
>> initiate the technical discussion.
>>
>>> On Jul 4, 2021, at 02:54, Christian Pfarr <[email protected]> wrote:
>>> Hi luoc,
>>>
>>> thanks for the information.
>>>
>>> I think this kind of storage format is used more and more in cloud
>>> architectures because IT departments want to use as few tools as possible
>>> to provide a big data product. With Iceberg they can build consistent and
>>> scalable big data structures for stream and batch processing on the same
>>> storage layer with a single tool, Spark.
>>>
>>> The problem is how to provide the data to customers. In my opinion, Spark
>>> itself is too slow for interactive querying by a lot of people or BI tools.
>>> That's the point where tools like Presto, Drill or Dremio enter the stage.
>>>
>>> I would like to see Drill as a competitor in this area, especially because of
>>> its brilliant, flexible and schemaless design.
>>>
>>> If the Iceberg implementation is already done for the Metastore and you are
>>> already experienced with its internals, it sounds worthwhile to invest the
>>> time and energy in a new format plugin.
>>>
>>> Just the opinion of a consultant who wants to recommend Drill for these
>>> use cases ;)
>>>
>>> Regards
>>>
>>> z0ltrix
>>>
>>> -------- Original Message --------
>>> On July 3, 2021, 16:55, luoc wrote:
>>>
>>> Hello,
>>> Thanks for the interest. Drill's Metastore can use a storage engine
>>> based on Iceberg tables. However, it seems that Drill does not currently
>>> support querying Iceberg data. Drill can definitely support Iceberg, for
>>> both reads and writes. The condition is that we need to develop a format
>>> plugin using the "Easy framework based on EVF".
>>> Please let me know if you are interested in that.
>>>
>>>> On Jul 3, 2021, at 2:41 AM, Christian Pfarr <[email protected]> wrote:
>>>>
>>>> Hello everyone,
>>>>
>>>> it looks like more and more people are using Delta Lake or Iceberg in Spark
>>>> for transactional work with big tables.
>>>>
>>>> Additionally, I saw that Drill is using Iceberg as the storage engine for
>>>> metadata.
>>>>
>>>> So, I wonder if it's possible to query Iceberg tables stored in HDFS or S3
>>>> directly via Drill, so that I can process my data with Spark Iceberg tables
>>>> and present them with Drill to my data scientists.
>>>>
>>>> Regards,
>>>>
>>>> z0ltrix
>>>>
>>>> <publickey - EmailAddress([email protected]) - 0xF0E154C5.asc>
