The stats problem is on the write side. Parquet compares byte buffers (also
used for UTF8 strings) using byte-wise comparison, but gets it wrong: it
compares the Java byte values, which are signed. UTF8 ordering is the same
as byte-wise comparison, but only if the bytes are compared as unsigned
values.
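As a minimal self-contained Java sketch of the difference (these comparators
are simplified stand-ins for illustration, not Parquet's actual code): any
UTF8 byte >= 0x80 is negative as a Java byte, so signed comparison sorts it
before plain ASCII, inverting the correct order.

    import java.nio.charset.StandardCharsets;

    public class ByteOrderDemo {
      // Simplified stand-in for the buggy comparator: signed byte values.
      static int compareSigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
          if (a[i] != b[i]) return Byte.compare(a[i], b[i]);
        }
        return Integer.compare(a.length, b.length);
      }

      // Correct comparison: treat each byte as unsigned (0..255).
      static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
          if (a[i] != b[i]) return (a[i] & 0xFF) - (b[i] & 0xFF);
        }
        return Integer.compare(a.length, b.length);
      }

      public static void main(String[] args) {
        byte[] e = "e".getBytes(StandardCharsets.UTF_8);      // 0x65
        byte[] eAcute = "é".getBytes(StandardCharsets.UTF_8); // 0xC3 0xA9

        System.out.println(compareUnsigned(e, eAcute) < 0); // true: "e" < "é"
        System.out.println(compareSigned(e, eAcute) < 0);   // false: 0xC3 is -61
      }
    }

Any pair of values that differ in such a byte can produce a flipped min/max
in the stats as written.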
Sounds great. Regarding the min/max stats issue, is that an issue with the way
the files are written or read? What's the Parquet project issue for that bug?
What does the 1.9.1 release timeline look like?
I will aim to have a PR in by the end of the week. I feel strongly that either
this or https:
I can, once I'm finished with a couple of other issues, if no one gets to it
first.
Michael, if you're interested in updating to 1.9.0 I'm happy to help review
that PR.
On Tue, Nov 1, 2016 at 1:03 PM, Reynold Xin wrote:
Ryan, want to submit a pull request?
On Tue, Nov 1, 2016 at 9:05 AM, Ryan Blue wrote:
1.9.0 includes some fixes intended specifically for Spark:
* PARQUET-389: Evaluates push-down predicates for missing columns as though
they are null. This is to address Spark's work-around that requires reading
and merging file schemas, even for metastore tables.
* PARQUET-654: Adds an option to disable record-level predicate filtering
while keeping row-group filtering, so Spark can continue to apply its own
record-level filters.
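As a rough sketch of what the PARQUET-389 change means for a reader (the
column name "added_later" and the value are made up for illustration):

    import org.apache.parquet.filter2.predicate.FilterApi;
    import org.apache.parquet.filter2.predicate.FilterPredicate;

    public class MissingColumnFilter {
      public static void main(String[] args) {
        // "added_later" is a hypothetical column only newer files contain.
        FilterPredicate pred =
            FilterApi.eq(FilterApi.intColumn("added_later"), 5);
        // With PARQUET-389, a file lacking the column evaluates the
        // predicate as though the column were all nulls, so eq(..., 5)
        // is false and that file's row groups are skipped, with no need
        // to read and merge file schemas up front.
        System.out.println(pred);
      }
    }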
Yes, this came up from a different direction:
https://issues.apache.org/jira/browse/SPARK-18140
I think it's fine to pursue an upgrade to fix these several issues. The
question is just how well it will play with other components, so it bears some
testing and evaluation of the changes from 1.8, but yes, it seems worth doing.
Hi All,
Is anyone working on updating Spark's Parquet library dep to 1.9? If not, I can
at least get started on it and publish a PR.
Cheers,
Michael