Re: [DISCUSS] Write-audit-publish support

2019-07-31 Thread Ryan Blue
Hi everyone, I've added PR #342 to the Iceberg repository with our WAP changes. Please have a look if you are interested in this. On Mon, Jul 22, 2019 at 11:05 AM Edgar Rodriguez wrote: > I think this use case is pretty helpful in most data

Re: Approaching Vectorized Reading in Iceberg ..

2019-07-31 Thread Gautam
Ah yes, I didn't send over the filter benchmarks. Num files: 500 Num rows per file: 10,000 *Benchmark Mode Cnt Score Error Units* IcebergSourceFlatParquetDataFilterBenchmark.readWithFilterFileSourceNonVectorized ss 5 3.837 ± 0.424 s/op IcebergSourceFlatParquetDataFilterB

Re: Approaching Vectorized Reading in Iceberg ..

2019-07-31 Thread Anjali Norwood
Hi Gautam, You wrote: ' - The filters are not being applied in columnar fashion they are being applied row by row as in Iceberg each filter visitor is stateless and applied separately on each row's column. ' .. this should not be a problem for this particular benchmark as IcebergSourceFlatParquetD

Re: Approaching Vectorized Reading in Iceberg ..

2019-07-31 Thread Gautam
Also, I think the other thing that's fundamentally different is the way Page iteration and Column iteration are done in Iceberg vs. the way value reading happens in Spark's ValuesReader implementations. On Wed, Jul 31, 2019 at 1:44 PM Gautam wrote: > Hey Samarth, > Sorry about the de

Re: Approaching Vectorized Reading in Iceberg ..

2019-07-31 Thread Gautam
Hey Samarth, Sorry about the delay. I ran into some bottlenecks for which I had to add more code to be able to run benchmarks. I've checked in my latest changes to my fork's *vectorized-read* branch [0]. Here are the early numbers on the initial implementation... *Benchmark Data:* - 10

Re: Orphan manifest file when performing delete in transaction

2019-07-31 Thread Arina Ielchiieva
Ryan, thanks for the detailed answer. I'll try out the suggested approach and post the results in Issue #330. Kind regards, Arina On Wed, Jul 31, 2019 at 12:02 AM Ryan Blue wrote: > Hi Arina, thanks for reporting this issue, and for the thorough write-up > on that issue! > > I suspect that this