Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-15 Thread Zoltan Ivanfi
Dear Parquet and Impala Developers, We have exposed min/max statistics to extensive compatibility testing and found troubling inconsistencies regarding float and double values. Under certain (fortunately rather extreme) circumstances, this can lead to predicate pushdown incorrectly discarding row
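The message above is cut off before it describes the exact failing case. The sketch below is only an illustration, not the scenario found in testing. It shows one way NaN can make min/max statistics wrong. It assumes a hypothetical writer that builds min/max with plain `<`/`>` comparisons and a hypothetical reader that skips a row group when the predicate value lies outside `[min, max]`. Every comparison involving NaN is false, so the stored statistics depend on where the NaN appears in the data.

```python
def naive_min_max(values):
    """Hypothetical writer: min/max computed with plain < and > comparisons."""
    lo = hi = values[0]
    for v in values[1:]:
        if v < lo:  # always False when v or lo is NaN
            lo = v
        if v > hi:  # likewise always False when NaN is involved
            hi = v
    return lo, hi


def can_skip_equals(stats, x):
    """Hypothetical reader: skip a row group if x cannot lie in [min, max]."""
    lo, hi = stats
    # With NaN bounds, lo <= x <= hi is False for every x, so this says "skip".
    return not (lo <= x <= hi)


# The same three values, written in two different orders.
stats_a = naive_min_max([float("nan"), 1.0, 5.0])
stats_b = naive_min_max([1.0, float("nan"), 5.0])

print(stats_a)  # (nan, nan)
print(stats_b)  # (1.0, 5.0)

# Predicate "x == 1.0": both row groups contain a matching row.
print(can_skip_equals(stats_a, 1.0))  # True  -> matching row incorrectly discarded
print(can_skip_equals(stats_b, 1.0))  # False -> row group is read
```

Because the result depends on value order, a writer and a reader that disagree about where NaN sorts can produce exactly this kind of silent row loss.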

2018-02-15 Thread Laszlo Gaal
To supply some context: Impala has had a number of issues around NaN/infinity. The closest precedent to the current issue seems to be IMPALA-6295:

2018-02-15 Thread Tim Armstrong
We could also consider treating NaN similarly to NULL and having a separate piece of information with a count of NaN values (or just a bit indicating the presence or absence of NaN). I'm not sure whether that is easier or harder to implement than a total order.

On Thu, Feb 15, 2018 at 9:12 AM, Laszlo Gaal wrote:
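Tim's suggestion of handling NaN like NULL can be sketched roughly as follows. `FloatStats`, `compute_stats`, `can_skip_equals` and `may_contain_nan` are hypothetical names chosen for illustration. They are not Parquet's or Impala's API. The sketch assumes min/max are computed over non-NaN, non-NULL values only, and that NaN occurrences are counted separately. It also assumes the comparison literal itself is never NaN, so under IEEE semantics a NaN row can never satisfy `col == x`.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class FloatStats:
    """Hypothetical column-chunk statistics; NaN is counted separately, like NULL."""
    min: Optional[float]  # None when the chunk has no ordinary (non-NaN) values
    max: Optional[float]
    null_count: int
    nan_count: int


def compute_stats(values):
    """Writer side: exclude NULLs and NaNs from min/max, count each separately."""
    ordinary = [v for v in values if v is not None and not math.isnan(v)]
    return FloatStats(
        min=min(ordinary) if ordinary else None,
        max=max(ordinary) if ordinary else None,
        null_count=sum(v is None for v in values),
        nan_count=sum(v is not None and math.isnan(v) for v in values),
    )


def can_skip_equals(stats, x):
    """Reader side: skip only if no row can satisfy 'col == x' (x assumed not NaN)."""
    if stats.min is None:  # only NULLs and/or NaNs, none of which equal x
        return True
    return not (stats.min <= x <= stats.max)


def may_contain_nan(stats):
    """A filter that selects NaN rows (e.g. isnan(col)) consults the count instead."""
    return stats.nan_count > 0


stats = compute_stats([float("nan"), 1.0, None, 5.0])
print(stats)                        # FloatStats(min=1.0, max=5.0, null_count=1, nan_count=1)
print(can_skip_equals(stats, 1.0))  # False
print(can_skip_equals(stats, 7.0))  # True
print(may_contain_nan(stats))       # True
```

The NaN row no longer affects the bounds, regardless of where it appears in the data. The alternative Tim mentions, a total order that places NaN at a fixed position, would instead keep NaN inside min/max. That approach requires writers and readers to agree on the same comparison function.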