Re: Updating Parquet dep to 1.9

2016-11-02 Thread Ryan Blue
The stats problem is on the write side. Parquet compares byte buffers (also used for UTF8 strings) byte by byte, but it gets this wrong and compares the Java byte values, which are signed. UTF8 ordering is the same as byte-wise comparison, but only if the bytes are compared as unsigned
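To illustrate the signed versus unsigned point, here is a minimal sketch in plain Java. It is not Parquet's actual code: the class `ByteOrderDemo` and both method names are hypothetical, chosen only for this example. It shows how comparing raw Java `byte` values reverses the order of an ASCII string and a string whose UTF8 encoding starts with a byte of 0x80 or above.

```java
import java.nio.charset.StandardCharsets;

public class ByteOrderDemo {
    // Hypothetical helper: lexicographic comparison using Java's signed byte values.
    // Bytes 0x80..0xFF are negative here, so they sort before all ASCII bytes.
    public static int compareSigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Byte.compare(a[i], b[i]);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Hypothetical helper: the same comparison with each byte widened to 0..255,
    // which matches UTF8 (code point) ordering.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] z = "z".getBytes(StandardCharsets.UTF_8);       // 0x7A
        byte[] e = "\u00e9".getBytes(StandardCharsets.UTF_8);  // 0xC3 0xA9 (e with acute accent)

        // Signed comparison claims "z" sorts after U+00E9, which is the wrong order.
        System.out.println("signed:   " + Integer.signum(compareSigned(z, e)));
        // Unsigned comparison puts "z" first, as UTF8 ordering requires.
        System.out.println("unsigned: " + Integer.signum(compareUnsigned(z, e)));
    }
}
```

Under the signed ordering, a min or max computed over a column containing both values would presumably pick the wrong one, which is consistent with the stats problem being on the write side.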

Re: Updating Parquet dep to 1.9

2016-11-02 Thread Michael Allman
Sounds great. Regarding the min/max stats issue, is that an issue with the way the files are written or read? What's the Parquet project issue for that bug? What does the 1.9.1 release timeline look like? I will aim to have a PR in by the end of the week. I feel strongly that either this or

Re: Updating Parquet dep to 1.9

2016-11-01 Thread Ryan Blue
I can when I'm finished with a couple other issues if no one gets to it first. Michael, if you're interested in updating to 1.9.0 I'm happy to help review that PR.
On Tue, Nov 1, 2016 at 1:03 PM, Reynold Xin wrote:
> Ryan want to submit a pull request?
>
> On Tue, Nov 1,

Re: Updating Parquet dep to 1.9

2016-11-01 Thread Reynold Xin
Ryan want to submit a pull request?
On Tue, Nov 1, 2016 at 9:05 AM, Ryan Blue wrote:
> 1.9.0 includes some fixes intended specifically for Spark:
>
> * PARQUET-389: Evaluates push-down predicates for missing columns as though they are null. This is to address

Re: Updating Parquet dep to 1.9

2016-11-01 Thread Sean Owen
Yes, this came up from a different direction: https://issues.apache.org/jira/browse/SPARK-18140 I think it's fine to pursue an upgrade to fix these several issues. The question is just how well it will play with other components, so it bears some testing and evaluation of the changes from 1.8, but

Updating Parquet dep to 1.9

2016-10-31 Thread Michael Allman
Hi All, Is anyone working on updating Spark's Parquet library dep to 1.9? If not, I can at least get started on it and publish a PR. Cheers, Michael