Re: Iceberg or deltalake table as input for drill queries

luoc Sun, 04 Jul 2021 08:37:57 -0700


Hi,
  That makes perfect sense so far. You clearly understand the difference 
between batch computation and ad-hoc querying. Drill, for its part, is a 
high-performance MPP query layer for self-describing data: schema-free and 
ANSI SQL compatible.
  Would you mind helping me open an issue on GitHub? That would be a good way 
to initiate the technical discussion.


> On 4 July 2021, at 02:54, Christian Pfarr <[email protected]> wrote:
> Hi luoc,
> 
> 
> thanks for the information.
> 
> 
> I think this kind of storage format is used more and more in cloud 
> architectures because IT departments want to use as few tools as possible 
> to provide a big data product. With Iceberg they can build consistent and 
> scalable big data structures for stream and batch processing on the same 
> storage layer with a single tool, Spark.
> 
> 
> The problem is how to provide the data to customers. In my opinion, Spark 
> itself is too slow for interactive querying by a lot of people or BI tools. 
> That is the point where tools like Presto, Drill or Dremio enter the stage.
> 
> 
> I would like to see Drill as a competitor in this area, especially because 
> of its brilliantly flexible and schemaless design.
> 
> 
> If the Iceberg implementation is already done for the Metastore and you are 
> already experienced with its internals, it sounds worth investing the time 
> and energy in a new format plugin.
> 
> 
> Just the opinion of a consultant who wants to recommend Drill for these 
> use cases ;)
> 
> 
> Regards
> 
> z0ltrix
> 
> 
> 
> 
> 
> 
> 
> -------- Original Message --------
> On 3 July 2021, 16:55, luoc wrote:
> 
> Hello,
> Thanks for your interest. Drill's Metastore can use a storage engine 
> based on Iceberg tables. At present, however, Drill does not appear to 
> support querying Iceberg data. Drill can definitely support Iceberg, for 
> both reading and writing, provided that we develop a format plugin using 
> the "Easy framework based on EVF". Please let me know if you are interested 
> in that.
> 
> > On 3 July 2021, at 2:41 AM, Christian Pfarr <[email protected]> wrote:
> >
> > Hello everyone,
> >
> >
> > it looks like more and more people are using Delta Lake or Iceberg in 
> > Spark for transactional work on big tables.
> >
> >
> > Additionally, I saw that Drill is using Iceberg as the storage engine for 
> > metadata.
> >
> >
> > So, I wonder if it's possible to query Iceberg tables stored in HDFS or S3 
> > directly via Drill, so that I can process my data with Spark Iceberg tables 
> > and present them with Drill to my data scientists.
> >
> >
> > Regards,
> >
> > z0ltrix
> >
> >
> >
> >
> >
> >
