Re: Moving forward with the timestamp proposal

2019-02-21 Thread Zoltan Ivanfi
> first? Or are we going to use low-level physical types directly and add > Spark-specific metadata to Parquet/Orc files? > > On Wed, Feb 20, 2019 at 10:57 PM Zoltan Ivanfi > wrote: > > > Hi, > > > > Last December we shared a timestamp harmonization pro

Adding more timestamp types to on-disk storage formats

2019-01-17 Thread Zoltan Ivanfi
Hi, One piece of feedback I received on the SQL timestamp type harmonization proposal was that I should reach out to the file format communities as well. For this purpose I created a separate document from their perspective and sent it to the Avro, ORC, Parquet, Arrow, Kudu and Iceberg developer lists.

Updated proposal: Consistent timestamp types in Hadoop SQL engines

2018-12-19 Thread Zoltan Ivanfi
Dear All, I would like to thank every reviewer of the consistent timestamps proposal[1] for their time and valuable comments. Based on your feedback, I have updated the proposal. The changes include clarifications, fixes and other improvements as summarized at the end of the document, in the

Re: cloudera.com From headers being re-written on this list

2018-07-12 Thread Zoltan Ivanfi
Hi, I have seen this happening for other e-mail addresses and on other mailing lists as well. I may be wrong, but I would suppose it is a deliberate anti-spam measure. Zoltan On Tue, Jun 26, 2018 at 6:23 PM Michael Brown wrote: > Hi, > > For some reason mail to this list from users

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-20 Thread Zoltan Ivanfi
that the > whole row group can be included. The addition of NaNs doesn't change > that. > > OTOH, if b <= a <= c, then we have to check the whole row group, and > the addition of NaNs doesn't change that. > > On Tue, Feb 20, 2018 at 9:14 AM, Alexander Behm <alex.b...@cloude

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-16 Thread Zoltan Ivanfi
handle similar > > issues around NaNs/infinity (or infinities, in the case of IEEE-754). > > > > Thanks, > > > > - LaszloG > > > > > > On Thu, Feb 15, 2018 at 5:10 PM, Zoltan Ivanfi <z...@cloudera.com> wrote: > > > > > Dea

Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-15 Thread Zoltan Ivanfi
Dear Parquet and Impala Developers, We have exposed min/max statistics to extensive compatibility testing and found troubling inconsistencies regarding float and double values. Under certain (fortunately rather extreme) circumstances, this can lead to predicate pushdown incorrectly discarding row
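
To illustrate the kind of problem described in this thread, here is a minimal Python sketch of how NaN values can corrupt min/max statistics and cause a matching row group to be skipped. It is not the Parquet or Impala code: the function names naive_min_max and may_contain are hypothetical, and both the writer-side and reader-side logic are simplified assumptions.

```python
import math


def naive_min_max(values):
    """Compute min/max with plain < and > comparisons (hypothetical writer)."""
    lo = hi = values[0]
    for v in values[1:]:
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    return lo, hi


def may_contain(lo, hi, target):
    """Hypothetical reader check: keep the row group only if lo <= target <= hi."""
    return lo <= target <= hi


# A row group whose first value happens to be NaN.
row_group = [float("nan"), 1.0, 5.0]
lo, hi = naive_min_max(row_group)

# Every ordered comparison against NaN is false, so NaN is never replaced
# and ends up as both the recorded min and the recorded max.
keep = may_contain(lo, hi, 1.0)

print(lo, hi, keep)
```

With these statistics the check rejects the row group even though it contains 1.0. The outcome also depends on value order: the same values with NaN last produce statistics of 1.0 and 5.0, which is one way in which implementations can disagree.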