Re: Iceberg or deltalake table as input for drill queries

luoc Mon, 05 Jul 2021 08:05:40 -0700

Hi z0ltrix,
  There are two links for contributing ideas. [1] is the issue tracker for 
anything (recommended). [2] is the contribution guideline. Enjoy.


[1] Github Issues <https://github.com/apache/drill/issues>
[2] Guideline Notice <https://github.com/apache/drill/issues/2233>

> On July 5, 2021, at 12:12 AM, Christian Pfarr <[email protected]> wrote:
> 
> Hi luoc,
> 
> 
> Of course. I would be happy to support you with this.
> 
> How do I start?
> 
> 
> Regards,
> 
> z0ltrix
> 
> 
> 
> 
> 
> 
> -------- Original Message --------
> On July 4, 2021, 17:37, luoc wrote:
> 
> 
> Hi,
> Makes perfect sense so far. Obviously, you understand the difference between 
> batch computation and ad-hoc querying. At the same time, Drill is a 
> high-performance, schema-free MPP query layer for self-describing data with 
> ANSI SQL support.
> Would you mind helping me open an issue on GitHub? It is a good way to 
> initiate the technical discussion.
> 
> > On July 4, 2021, at 02:54, Christian Pfarr <[email protected]> wrote:
> > Hi luoc,
> >
> >
> > thanks for the information.
> >
> >
> > I think this kind of storage format is used more and more in cloud 
> > architectures because IT departments want to use as few tools as possible 
> > to provide a big data product. With Iceberg they can build consistent and 
> > scalable big data structures for stream and batch processing on the same 
> > storage layer with a single tool, Spark.
> >
> >
> > The problem is how to provide the data to customers. In my opinion, Spark 
> > itself is too slow for interactive querying by a lot of people or BI tools. 
> > That's the point where tools like Presto, Drill or Dremio enter the stage.
> >
> >
> > I would like to see Drill as a competitor in this area, especially because 
> > of its brilliant, flexible, and schemaless design.
> >
> >
> > If the Iceberg implementation is already done for the Metastore and you are 
> > already experienced with its internals, it sounds worth investing the time 
> > and energy in a new format plugin.
> >
> >
> > Just the opinion of a consultant who wants to recommend Drill for these 
> > use cases ;)
> >
> >
> > Regards
> >
> > z0ltrix
> >
> >
> >
> >
> >
> >
> >
> > -------- Original Message --------
> > On July 3, 2021, 16:55, luoc wrote:
> >
> > Hello,
> > Thanks for the interest. Drill’s Metastore can use a storage engine 
> > based on Iceberg tables. At present, however, Drill does not seem to 
> > support querying Iceberg data. Drill can definitely support Iceberg, for 
> > both reading and writing, provided that we develop a format plugin using 
> > the "Easy framework based on EVF". 
> > Please let me know if you are interested in that.
> >
> > > On July 3, 2021, at 2:41 AM, Christian Pfarr <[email protected]> wrote:
> > >
> > > Hello everyone,
> > >
> > >
> > > It looks like more and more people are using Delta Lake or Iceberg in 
> > > Spark for transactional work on big tables.
> > >
> > >
> > > Additionally, I saw that Drill is using Iceberg as the storage engine 
> > > for metadata.
> > >
> > >
> > > So I wonder if it's possible to query Iceberg tables stored in HDFS or S3 
> > > directly via Drill, so that I can process my data with Spark Iceberg 
> > > tables and present them with Drill to my data scientists.
> > >
> > >
> > > Regards,
> > >
> > > z0ltrix
> > >
> > >
> > >
> > >
> > >
> > >
> >
> 
