On 2/27/2020 1:04:07 PM, Nicolas PARIS <[email protected]> wrote:
> However, updating parquet files can be a bit troublesome. You might be
> interested in delta-lake, which provides an implementation of the SQL
> MERGE statement on top of parquet files. Implementing a drill connector
> on this should be feasible. This could be used together with the hybrid
> design described by Ted and Paul, and makes parquet more than a static
> archive. https://docs.delta.io/latest/delta-intro.html

I noticed Drill already has some support for Iceberg, but I am not familiar enough with Spark to figure out whether Delta Lake and Iceberg can be run without Hadoop HDFS. I was hoping to avoid a full Hadoop deployment, since Drill itself runs fine without one.
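
For what it's worth, here is a minimal sketch of the kind of smoke test I have in mind: writing a Delta table to a plain local path with no HDFS in the picture. The package version and the /tmp path below are placeholders, not something I have verified against a particular Spark build:

    # Launch with: pyspark --packages io.delta:delta-core_2.11:0.5.0
    # (the version is an assumption; match it to your Spark/Scala build)
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-no-hdfs").getOrCreate()

    # Write a small Delta table to a plain local directory, no HDFS involved.
    spark.range(0, 5).write.format("delta").save("/tmp/delta-table")

    # Read it back to confirm the _delta_log was created on local disk.
    spark.read.format("delta").load("/tmp/delta-table").show()

If something like that works, the same pattern should presumably point at S3 or another non-HDFS store, which is what I would want alongside Drill.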
