Re: SQL compatibility of Iceberg Expressions

2020-09-18 Thread Owen O'Malley
No, you can translate these expressions, but you have to evaluate the entire expression. For example:
"col1 = 'x' and col2 in (1,2)" becomes col1 = 'x' and col2 in (1,2)
"not(col1 = 'x' and col2 in (1,2))" becomes (col1 != 'x' or col2 not in (1,2)) and col1 is not null and col2 is not null
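The translation idea in this message can be sketched for a simplified single-column case. The Python below is illustrative only and is not Iceberg code: all function names are hypothetical, and it assumes a two-valued notEqual in which a null value counts as "not equal". It does not cover the two-column example from the message.

```python
from typing import Optional


def sql_where_not_equal(value: Optional[str], literal: str) -> bool:
    """SQL filter semantics for: WHERE NOT (col = literal).

    NULL = literal is unknown, NOT unknown is still unknown,
    and a WHERE clause keeps a row only when the predicate is TRUE.
    """
    if value is None:
        return False
    return value != literal


def two_valued_not_equal(value: Optional[str], literal: str) -> bool:
    """Assumed two-valued notEqual: a null value is simply 'not equal'."""
    return value != literal


def translated_not_equal(value: Optional[str], literal: str) -> bool:
    """Translation of the SQL predicate: add an explicit not-null check."""
    return two_valued_not_equal(value, literal) and value is not None


if __name__ == "__main__":
    for v in (None, "x", "y"):
        print(v, sql_where_not_equal(v, "x"), translated_not_equal(v, "x"))
```

Under these assumptions the SQL filter and the translated predicate keep the same rows for null, matching, and non-matching values, while the untranslated two-valued notEqual would also keep the null row.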

Re: Iceberg V2 Spec

2020-09-18 Thread Ryan Blue
I'm working on an update to the spec. We've completed the Java library implementation end-to-end, so now we have working code that will be released in 0.10.0. Next step is the spec update to document everything now that we're confident that it works as expected. Look for a PR in the next few

Re: Use Iceberg for a time-series data lake

2020-09-18 Thread Ryan Blue
Hi Yi, I think Iceberg could work for you without too much trouble. You might want to look more into the partitioning that Iceberg provides. I agree that most users want the storage layer to handle partitioning for them. That's exactly what Iceberg does: it makes data partitioning part of table

Re: Impact on Spark-Iceberg usage on missing to enforce clustering/sort requirement (SPARK-23889)

2020-09-18 Thread Ryan Blue
Jungtaek, I agree with you that we'd ideally get that Spark issue in upstream as soon as we can. I'm currently porting it to our 2.4 build so we can test it out with the new sort orders that were added to Iceberg. Once that's done and I understand the patch a bit better, I'll work on the review

Re: SQL compatibility of Iceberg Expressions

2020-09-18 Thread Ryan Blue
Are you saying that we can't fix this by rewriting expressions to translate from SQL to more natural semantics?
On Fri, Sep 18, 2020 at 3:28 PM Owen O'Malley wrote:
> In the SQL world, the second point isn't right. It is still the case that
> not(equal("col", "x")) is notEqual("col", "x").

Re: SQL compatibility of Iceberg Expressions

2020-09-18 Thread Owen O'Malley
In the SQL world, the second point isn't right. It is still the case that not(equal("col", "x")) is notEqual("col", "x"). Boolean logic (well, three valued logic) in SQL is just strange relative to programming languages:
- null *=* "x" -> null
- null *is distinct from* "x" -> true
-
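The three-valued behaviour described in this message can be sketched in Python, using None to stand for SQL's null/unknown. The helper names are hypothetical and exist only for illustration; they are not part of Iceberg or any SQL engine.

```python
from typing import Any, Optional

# True, False, or None, where None stands for SQL's "unknown".
Tri = Optional[bool]


def sql_eq(a: Any, b: Any) -> Tri:
    """SQL '=': unknown if either operand is null."""
    if a is None or b is None:
        return None
    return a == b


def sql_not(p: Tri) -> Tri:
    """SQL NOT: unknown stays unknown."""
    return None if p is None else not p


def sql_and(p: Tri, q: Tri) -> Tri:
    """SQL AND: false dominates, otherwise unknown is contagious."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def sql_is_distinct_from(a: Any, b: Any) -> bool:
    """SQL IS DISTINCT FROM: null-safe inequality, never unknown."""
    if a is None or b is None:
        return not (a is None and b is None)
    return a != b


if __name__ == "__main__":
    print(sql_eq(None, "x"))                # unknown
    print(sql_is_distinct_from(None, "x"))  # null-safe comparison
    print(sql_not(sql_eq(None, "x")))       # negating unknown
```

In this sketch, negating an unknown comparison still yields unknown, which is why not(equal(...)) and a plain two-valued notEqual can differ on null inputs.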

Re: SQL compatibility of Iceberg Expressions

2020-09-18 Thread Ryan Blue
It would be nice to avoid the problem by changing the semantics of Iceberg’s notNull, but I don’t think that’s a good idea for 2 main reasons. First, I think that API users creating expressions directly expect the current behavior. It would be surprising to a user if a notEqual expression didn’t

Re: SQL compatibility of Iceberg Expressions

2020-09-18 Thread Owen O'Malley
I think that we should follow the SQL semantics to prevent surprises when SQL engines integrate with Iceberg.
.. Owen
On Thu, Sep 17, 2020 at 9:08 PM Shardul Mahadik wrote:
> Hi all,
>
> I noticed that Iceberg's predicates are not compatible with SQL predicates
> when it comes to handling NULL