Re: [DISCUSS] Write-audit-publish support

2019-07-22 Thread Anton Okolnychyi
I would also support adding this to Iceberg itself. I think we have a use case where we can leverage this. @Ryan, could you also provide more info on the audit process? Thanks, Anton > On 20 Jul 2019, at 04:01, RD wrote: > > I think this could be useful. When we ingest data from Kafka, we do

Re: [DISCUSS] Write-audit-publish support

2019-07-22 Thread Ryan Blue
Audits run on the snapshot by setting the snapshot-id read option to read the WAP snapshot, even though it is not (yet) the current table state. This is documented in the time travel section of the Iceberg site. We added a stageOnly method to Sn

Re: Approaching Vectorized Reading in Iceberg ..

2019-07-22 Thread Daniel Weeks
Hey Gautam, We also have a couple people looking into vectorized reading (into Arrow memory). I think it would be good for us to get together and see if we can collaborate on a common approach for this. I'll reach out directly and see if we can get together. -Dan On Sun, Jul 21, 2019 at 10:35

Re: [DISCUSS] Write-audit-publish support

2019-07-22 Thread Filip
This definitely sounds interesting. Quick question: does this have any impact on the current Upserts spec? Or are we looking to associate this support with append-only commits? On Mon, Jul 22, 2019 at 6:51 PM Ryan Blue wrote: > Audits run on the snapshot by setting the snaps

Re: Approaching Vectorized Reading in Iceberg ..

2019-07-22 Thread Gautam
That would be great! On Mon, Jul 22, 2019 at 9:12 AM Daniel Weeks wrote: > Hey Gautam, > > We also have a couple people looking into vectorized reading (into Arrow > memory). I think it would be good for us to get together and see if we can > collaborate on a common approach for this. > > I'll

Re: [DISCUSS] Write-audit-publish support

2019-07-22 Thread Mouli Mukherjee
This would be super helpful. We have a similar workflow where we do some validation before letting the downstream consume the changes. Best, Mouli On Mon, Jul 22, 2019 at 9:18 AM Filip wrote: > This definitely sounds interesting. Quick question on whether this > presents impact on the current U

Re: [DISCUSS] Write-audit-publish support

2019-07-22 Thread Edgar Rodriguez
I think this use case is pretty helpful in most data environments; we use the same sort of stage-check-publish pattern to run quality checks. One question: if, say, the audit part fails, is there a way to expire the snapshot, or what would the follow-up workflow be? Best, Edgar On Mon, Jul 22,

Re: Approaching Vectorized Reading in Iceberg ..

2019-07-22 Thread Matt Cheah
Would it be possible to put the work-in-progress code in open source? From: Gautam Reply-To: "dev@iceberg.apache.org" Date: Monday, July 22, 2019 at 9:46 AM To: Daniel Weeks Cc: Ryan Blue, Iceberg Dev List Subject: Re: Approaching Vectorized Reading in Iceberg .. That would be great!

Re: Approaching Vectorized Reading in Iceberg ..

2019-07-22 Thread Gautam
Will do. Doing a bit of housekeeping on the code and also adding more primitive type support. On Mon, Jul 22, 2019 at 1:41 PM Matt Cheah wrote: > Would it be possible to put the work in progress code in open source? > > > > *From: *Gautam > *Reply-To: *"dev@iceberg.apache.org" > *Date: *Monday