Re: Iceberg or deltalake table as input for drill queries

Charles Givre Tue, 06 Jul 2021 06:26:16 -0700

It sounds like there is interest in developing a storage plugin for Drill to 
query Apache Iceberg.  We've actually discussed this internally as well.  We 
could start looking into this.
-- C


> On Jul 5, 2021, at 11:05 AM, luoc <[email protected]> wrote:
> 
> Hi z0ltrix,
>  There are two links for contributing ideas. [1] is the issue tracker, which 
> takes anything (recommended). [2] is the contribution guideline. Enjoy.
> 
> [1] Github Issues <https://github.com/apache/drill/issues>
> [2] Guideline Notice <https://github.com/apache/drill/issues/2233>
> 
>> On Jul 5, 2021, at 12:12 AM, Christian Pfarr <[email protected]> wrote:
>> 
>> Hi luoc,
>> 
>> 
>> Of course. I would be happy to support you with this.
>> 
>> How do I start?
>> 
>> 
>> Regards,
>> 
>> z0ltrix
>> 
>> 
>> 
>> 
>> 
>> 
>> -------- Original Message --------
>> On July 4, 2021, 17:37, luoc wrote:
>> 
>> 
>> Hi,
>> Makes perfect sense so far. Obviously, you understand the difference between 
>> batch computation and ad-hoc querying. At the same time, Drill is a 
>> high-performance MPP query layer for self-describing data that is 
>> schema-free and supports ANSI SQL.
>> Would you mind helping me open an issue on GitHub? It is a good way to 
>> initiate the technical discussion.
>> 
>>> On Jul 4, 2021, at 02:54, Christian Pfarr <[email protected]> wrote:
>>> Hi luoc,
>>> 
>>> 
>>> Thanks for the information.
>>> 
>>> 
>>> I think this kind of storage format is used more and more in cloud 
>>> architectures because IT departments want to use as few tools as possible 
>>> to provide a big data product. With Iceberg they can build consistent and 
>>> scalable big data structures for stream and batch processing on the same 
>>> storage layer with a single tool, Spark.
>>> 
>>> 
>>> The problem is how to provide the data to customers. In my opinion, Spark 
>>> itself is too slow for interactive querying by a lot of people or BI tools. 
>>> That's the point where tools like Presto, Drill, or Dremio enter the stage.
>>> 
>>> 
>>> I would like to see Drill as a competitor in this area, especially because 
>>> of its brilliantly flexible and schemaless design.
>>> 
>>> 
>>> If the Iceberg implementation is already done for the Metastore and you are 
>>> already experienced with its internals, it sounds worth investing the time 
>>> and energy in a new format plugin.
>>> 
>>> 
>>> Just the opinion of a consultant who wants to recommend Drill for these 
>>> use cases ;)
>>> 
>>> 
>>> Regards
>>> 
>>> z0ltrix
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -------- Original Message --------
>>> On July 3, 2021, 16:55, luoc wrote:
>>> 
>>> Hello,
>>> Thanks for the interest. Drill’s Metastore can use a storage engine 
>>> based on Iceberg tables. For now, though, it seems that Drill does not 
>>> support querying Iceberg data. I can tell you that Drill can definitely 
>>> support Iceberg, for both reading and writing. The condition is that we 
>>> need to develop a format plugin using the "Easy framework based on EVF". 
>>> Please let me know if you are interested in that.
>>> 
>>>> On Jul 3, 2021, at 2:41 AM, Christian Pfarr <[email protected]> wrote:
>>>> 
>>>> Hello everyone,
>>>> 
>>>> 
>>>> It looks like more and more people are using Delta Lake or Iceberg in 
>>>> Spark for transactional work with big tables.
>>>> 
>>>> 
>>>> Additionally, I saw that Drill is using Iceberg as the storage engine for 
>>>> metadata.
>>>> 
>>>> 
>>>> So I wonder if it's possible to query Iceberg tables stored in HDFS or S3 
>>>> directly via Drill, so that I can process my data with Spark Iceberg 
>>>> tables and present them with Drill to my data scientists.
>>>> 
>>>> 
>>>> Regards,
>>>> 
>>>> z0ltrix
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>
