Re: Iceberg or deltalake table as input for drill queries

Christian Pfarr Tue, 06 Jul 2021 12:09:02 -0700

Hi Charles,

I've opened a GitHub issue.


https://github.com/apache/drill/issues/2269 


I hope this helps. I would like to discuss the details with you.

Regards,
z0ltrix

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

Charles Givre <[email protected]> wrote on Tuesday, 6 July 2021 at 15:25:

> It sounds like there is interest in developing a storage plugin for Drill to 
> query Apache Iceberg. We've actually discussed this internally as well. We 
> could start looking into this.
> 

> -- C
> 

> > On Jul 5, 2021, at 11:05 AM, luoc [email protected] wrote:
> > 

> > Hi z0ltrix,
> > 

> > There are two links for contributing ideas. [1] is the issue tracker 
> > for anything (recommended). [2] is the contribution guideline. Enjoy!
> > 

> > [1] GitHub Issues https://github.com/apache/drill/issues
> > 

> > [2] Guideline Notice https://github.com/apache/drill/issues/2233
> > 

> > > On July 5, 2021, at 12:12 AM, Christian Pfarr [email protected] wrote:
> > > 

> > > Hi luoc,
> > > 

> > > Of course. I would be happy to support you with this.
> > > 

> > > How do I start?
> > > 

> > > Regards,
> > > 

> > > z0ltrix
> > > 

> > > -------- Original Message --------
> > > 

> > > On 4 July 2021, 17:37, luoc wrote:
> > > 

> > > Hi,
> > > 

> > > Makes perfect sense so far. Obviously, you understand the difference 
> > > between batch computation and ad-hoc querying. At the same time, Drill 
> > > is a schema-free, high-performance MPP query layer for self-describing 
> > > data with ANSI SQL support.
> > > 

> > > Would you mind helping me by opening an issue on GitHub? That is a good 
> > > way to start the technical discussion.
> > > 

> > > > On July 4, 2021, at 02:54, Christian Pfarr [email protected] wrote:
> > > > 

> > > > Hi luoc,
> > > > 

> > > > Thanks for the information.
> > > > 

> > > > I think this kind of storage format is used more and more in cloud 
> > > > architectures because IT departments want to use as few tools as 
> > > > possible to provide a big data product. With Iceberg they can build 
> > > > consistent and scalable big data structures for stream and batch 
> > > > processing on the same storage layer with a single tool, Spark.
> > > > 

> > > > The problem is how to provide the data to customers. In my opinion, 
> > > > Spark itself is too slow for interactive querying by many people or 
> > > > BI tools. That's the point where tools like Presto, Drill, or Dremio 
> > > > enter the stage.
> > > > 

> > > > I would like to see Drill as a competitor in this area, especially 
> > > > because of its brilliant, flexible, schemaless design.
> > > > 

> > > > If the Iceberg implementation is already done for the Metastore and 
> > > > you are already experienced with its internals, it sounds worth 
> > > > investing the time and energy in a new format plugin.
> > > > 

> > > > Just the opinion of a consultant who wants to recommend Drill for 
> > > > these use cases ;)
> > > > 

> > > > Regards
> > > > 

> > > > z0ltrix
> > > > 

> > > > -------- Original Message --------
> > > > 

> > > > On 3 July 2021, 16:55, luoc wrote:
> > > > 

> > > > Hello,
> > > > 

> > > > Thanks for the interest. Drill’s Metastore can use a storage engine 
> > > > based on Iceberg tables. At the moment, however, Drill does not 
> > > > appear to support querying Iceberg data. Drill can definitely support 
> > > > Iceberg, for both reading and writing. The condition is that we need 
> > > > to develop a format plugin using the "Easy framework based on EVF". 
> > > > Please let me know if you are interested in that.
> > > > 

> > > > > On July 3, 2021, at 2:41 AM, Christian Pfarr [email protected] wrote:
> > > > > 

> > > > > Hello everyone,
> > > > > 

> > > > > It looks like more and more people are using Delta Lake or Iceberg 
> > > > > in Spark for transactional work with big tables.
> > > > > 

> > > > > Additionally, I saw that Drill uses Iceberg as the storage engine 
> > > > > for its metadata.
> > > > > 

> > > > > So I wonder whether it's possible to query Iceberg tables stored in 
> > > > > HDFS or S3 directly via Drill, so that I can process my data with 
> > > > > Spark Iceberg tables and present it to my data scientists with Drill.
> > > > > 

> > > > > Regards,
> > > > > 

> > > > > z0ltrix
> > > > > 


