Re: Iceberg scans not keeping or using important file/column statistics in manifests ..

2019-03-05 Thread Gautam
Thanks for the response Ryan, comments in line ... > Iceberg doesn't support binding expressions in sub-structs yet. So the fix on the Iceberg side requires a few steps. First, collecting the metrics from Parquet with Anton's PR, and second, updating expression binding to work with structs. I don

Re: Iceberg scans not keeping or using important file/column statistics in manifests ..

2019-03-05 Thread Anton Okolnychyi
Sorry for my late reply and thanks for testing Gautam! I had a local prototype that only collected metrics for nested structs and stored them. I haven’t checked if Iceberg can make use of that right now. As I understand Ryan’s comment and Gautam’s observations, we will need changes to make it w

Re: Iceberg scans not keeping or using important file/column statistics in manifests ..

2019-03-05 Thread Gautam
Hey Anton, I'm curious how you are using the Struct metrics in your company, are you planning to use it for predicate pushdowns or something else entirely? Regarding timeline, that's fine, we can wait a week or two for your changes on collecting metrics. If I can assume that your changes wi

Re: Bloom Filters?

2019-03-05 Thread Ryan Blue
+Iceberg Dev List, the project has moved to Apache. Hi Josh, We have considered bloom filters. There are some cases where they could be useful, but there are generally better ways to accomplish the same task. I typically recommend sorting on an ID field to take advantage of the lower and upper
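
To illustrate the sorting recommendation in the message above: when data is sorted on an ID field before writing, each file covers a narrow, mostly non-overlapping ID range, so per-file lower and upper bounds can rule out most files for a point lookup. The following is a minimal sketch under stated assumptions, not Iceberg's implementation; `FileStats`, `files_for_id`, and the file names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file column statistics (not an Iceberg class)."""
    path: str
    lower: int  # smallest ID stored in the file
    upper: int  # largest ID stored in the file


def files_for_id(files: List[FileStats], target: int) -> List[str]:
    """Return the paths of files whose [lower, upper] range may contain target."""
    return [f.path for f in files if f.lower <= target <= f.upper]


# Data sorted on ID before writing: ranges do not overlap.
sorted_files = [
    FileStats("a.parquet", 1, 100),
    FileStats("b.parquet", 101, 200),
    FileStats("c.parquet", 201, 300),
]

# Unsorted data: every file spans nearly the whole ID range.
unsorted_files = [
    FileStats("a.parquet", 1, 290),
    FileStats("b.parquet", 5, 300),
    FileStats("c.parquet", 2, 295),
]

sorted_hits = files_for_id(sorted_files, 150)
unsorted_hits = files_for_id(unsorted_files, 150)
```

With the sorted layout only one file has to be read for ID 150; with the unsorted layout the bounds cannot exclude any file.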

Option to disable rewrites of IN predicates

2019-03-05 Thread Anton Okolnychyi
Hey, Iceberg Spark data source rewrites IN predicates as a mix of OR/EQ. I am wondering if it makes sense to introduce a threshold when this rewrite happens until [1] is resolved. We can have something similar to “spark.sql.parquet.pushdown.inFilterThreshold” in Spark. We have experienced a pe
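
The threshold idea in the message above can be sketched as follows. This is only an illustration, assuming a small hypothetical expression model (`Eq`, `Or`) and a hypothetical `rewrite_in` helper; it is not the Iceberg Spark data source code, and the threshold parameter merely mirrors the kind of limit the message refers to.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional, Union


@dataclass(frozen=True)
class Eq:
    """Hypothetical equality predicate: column == value."""
    column: str
    value: Any


@dataclass(frozen=True)
class Or:
    """Hypothetical disjunction of two predicates."""
    left: "Expr"
    right: "Expr"


Expr = Union[Eq, Or]


def rewrite_in(column: str, values: Iterable[Any], threshold: int) -> Optional[Expr]:
    """Rewrite `column IN (values)` as a chain of OR/EQ predicates.

    Returns None when the list is empty or has more distinct values than
    `threshold`, meaning the predicate is not rewritten for pushdown.
    """
    distinct = list(dict.fromkeys(values))  # drop duplicates, keep order
    if not distinct or len(distinct) > threshold:
        return None
    expr: Expr = Eq(column, distinct[0])
    for value in distinct[1:]:
        expr = Or(expr, Eq(column, value))
    return expr


small = rewrite_in("id", [1, 2, 3], threshold=10)
large = rewrite_in("id", range(1000), threshold=10)
```

Here `small` becomes a nested OR of three equality predicates, while `large` exceeds the threshold and is left unrewritten, which avoids building a very deep expression tree for long IN lists.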

Re: Option to disable rewrites of IN predicates

2019-03-05 Thread Ryan Blue
Would it make sense to add support for IN expressions instead? I'd rather get that done than build work-arounds. On Tue, Mar 5, 2019 at 10:33 AM Anton Okolnychyi wrote: > Hey, > > Iceberg Spark data source rewrites IN predicates as a mix of OR/EQ. I am > wondering if it makes sense to introduce

Re: Iceberg scans not keeping or using important file/column statistics in manifests ..

2019-03-05 Thread Anton Okolnychyi
Sounds good, Gautam. Our intention was to be able to filter out files using predicates on nested fields. For now, file skipping works only with predicates that involve top level attributes. > On 5 Mar 2019, at 17:47, Gautam wrote: > > Hey Anton, >I'm curious how you are using the S

Re: Podling Report Reminder - March 2019

2019-03-05 Thread Justin Mclean
Hi, Just a reminder that this is due today and you can fill out the report here. [1] If for some reason you can't submit the report on time you will be asked to report next month. Thanks, Justin 1. https://wiki.apache.org/incubator/March2019

Re: [DISCUSS] Python implementation

2019-03-05 Thread Ryan Blue
We may want to do that eventually, but I think it isn't necessary right now. I also don't know how we would determine what must be implemented to form that minimum support threshold. Not having features means not being able to read tables that use them. If there is no Parquet support, then you can'

Re: [DISCUSS] Python implementation

2019-03-05 Thread Ted Gooch
I'll start work on making a table with currently supported operations. There are a lot of gaps right now, but I'd like to start closing out some of the most important ones in the coming weeks/months. On Tue, Mar 5, 2019 at 4:11 PM Ryan Blue wrote: > We may want to do that eventually, but I think

[VOTE] Add the python implementation

2019-03-05 Thread Ryan Blue
Hi everyone, I'd like to propose accepting the current Python PR, #54. It has been through a round of reviews and while there is certainly more to do, it is going to be much easier to improve the implementation with smaller pull requests than t

Re: [VOTE] Add the python implementation

2019-03-05 Thread Ryan Blue
+1 On Tue, Mar 5, 2019 at 4:18 PM Ryan Blue wrote: > Hi everyone, > > I'd like to propose accepting the current Python PR, #54 > . It has been > through a round of reviews and while there is certainly more to do, it is > going to be much easie

Re: [VOTE] Add the python implementation

2019-03-05 Thread Ted Gooch
+1 On Tue, Mar 5, 2019 at 4:19 PM Ryan Blue wrote: > +1 > > On Tue, Mar 5, 2019 at 4:18 PM Ryan Blue wrote: > >> Hi everyone, >> >> I'd like to propose accepting the current Python PR, #54 >> . It has been >> through a round of reviews and wh

Re: [VOTE] Add the python implementation

2019-03-05 Thread Xabriel Collazo Mojica
+1 Xabriel J Collazo Mojica | Senior Software Engineer | Adobe | xcoll...@adobe.com From: Ted Gooch Reply-To: "dev@iceberg.apache.org" Date: Tuesday, March 5, 2019 at 4:21 PM To: "dev@iceberg.apache.org" Cc: Ryan Blue Subject: Re: [VOTE] Add the python implementation +1 On Tue, Mar 5

Re: [VOTE] Add the python implementation

2019-03-05 Thread John Zhuge
+1 On Tue, Mar 5, 2019 at 4:59 PM Xabriel Collazo Mojica wrote: > +1 > > > > *Xabriel J Collazo Mojica* | Senior Software Engineer | Adobe | > xcoll...@adobe.com > > > > *From: *Ted Gooch > *Reply-To: *"dev@iceberg.apache.org" > *Date: *Tuesday, March 5, 2019 at 4:21 PM > *To: *"dev@icebe

Re: [VOTE] Add the python implementation

2019-03-05 Thread RD
+1 On Tue, Mar 5, 2019 at 5:01 PM John Zhuge wrote: > +1 > > On Tue, Mar 5, 2019 at 4:59 PM Xabriel Collazo Mojica > wrote: > >> +1 >> >> >> >> *Xabriel J Collazo Mojica* | Senior Software Engineer | Adobe | >> xcoll...@adobe.com >> >> >> >> *From: *Ted Gooch >> *Reply-To: *"dev@iceberg.a

Re: [VOTE] Add the python implementation

2019-03-05 Thread Romin Parekh
+1 Sent from my iPhone > On Mar 5, 2019, at 5:26 PM, RD wrote: > > +1 > >> On Tue, Mar 5, 2019 at 5:01 PM John Zhuge wrote: >> +1 >> >>> On Tue, Mar 5, 2019 at 4:59 PM Xabriel Collazo Mojica >>> wrote: >>> +1 >>> >>> >>> >>> Xabriel J Collazo Mojica | Senior Software Engineer | Ado

Re: [VOTE] Add the python implementation

2019-03-05 Thread Daniel Weeks
+1 On Tue, Mar 5, 2019, 5:26 PM RD wrote: > +1 > > On Tue, Mar 5, 2019 at 5:01 PM John Zhuge wrote: > >> +1 >> >> On Tue, Mar 5, 2019 at 4:59 PM Xabriel Collazo Mojica >> wrote: >> >>> +1 >>> >>> >>> >>> *Xabriel J Collazo Mojica* | Senior Software Engineer | Adobe | >>> xcoll...@adobe.co

Re: [VOTE] Add the python implementation

2019-03-05 Thread Gautam Kowshik
+1 Sent from my iPhone > On Mar 6, 2019, at 6:56 AM, RD wrote: > > +1 > >> On Tue, Mar 5, 2019 at 5:01 PM John Zhuge wrote: >> +1 >> >>> On Tue, Mar 5, 2019 at 4:59 PM Xabriel Collazo Mojica >>> wrote: >>> +1 >>> >>> >>> >>> Xabriel J Collazo Mojica | Senior Software Engineer | Ado

Re: [VOTE] Add the python implementation

2019-03-05 Thread Tao Feng
+1 On Tue, Mar 5, 2019 at 7:18 PM Gautam Kowshik wrote: > +1 > > Sent from my iPhone > > On Mar 6, 2019, at 6:56 AM, RD wrote: > > +1 > > On Tue, Mar 5, 2019 at 5:01 PM John Zhuge wrote: > >> +1 >> >> On Tue, Mar 5, 2019 at 4:59 PM Xabriel Collazo Mojica < >> xcoll...@adobe.com.invalid> wrote:

Re: [VOTE] Add the python implementation

2019-03-05 Thread Arthur Wiedmer
+1 (non-binding) On Tue, Mar 5, 2019, 16:19 Ryan Blue wrote: > Hi everyone, > > I'd like to propose accepting the current Python PR, #54 > . It has been > through a round of reviews and while there is certainly more to do, it is > going to be