Issue with writing null values to complex type.

2019-04-29 Thread shyam narayan singh
Hi, I have encountered a regression when writing nulls to a complex type. I recently moved from parquet 1.8.x to 1.10. Here is what I found. My dataset has 111k null values to be written to a complex type. Earlier, with 1.8.x, it would create a single page, but with 1.10 it creates 20 pages

[jira] [Commented] (PARQUET-1405) [C++] 'Couldn't deserialize thrift' error when reading large binary column

2019-04-29 Thread Deepak Majeti (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829769#comment-16829769 ] Deepak Majeti commented on PARQUET-1405: Filed https://issues.apache.org/jira/b

[jira] [Commented] (PARQUET-1405) [C++] 'Couldn't deserialize thrift' error when reading large binary column

2019-04-29 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829735#comment-16829735 ] Wes McKinney commented on PARQUET-1405: --- We can add an option to not write statis

[jira] [Comment Edited] (PARQUET-1405) [C++] 'Couldn't deserialize thrift' error when reading large binary column

2019-04-29 Thread Deepak Majeti (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829711#comment-16829711 ] Deepak Majeti edited comment on PARQUET-1405 at 4/29/19 8:59 PM:

[jira] [Commented] (PARQUET-1405) [C++] 'Couldn't deserialize thrift' error when reading large binary column

2019-04-29 Thread Deepak Majeti (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829711#comment-16829711 ] Deepak Majeti commented on PARQUET-1405: PARQUET-979 omits large statistics ins

[jira] [Assigned] (PARQUET-1405) [C++] 'Couldn't deserialize thrift' error when reading large binary column

2019-04-29 Thread Deepak Majeti (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Majeti reassigned PARQUET-1405: -- Assignee: Deepak Majeti > [C++] 'Couldn't deserialize thrift' error when reading lar

Key signing (was: [VOTE] Release Apache Parquet 1.11.0 RC6)

2019-04-29 Thread Zoltan Ivanfi
Hi, A video call sounds more secure to me than a photo which can be easily manipulated. We could spend 5 minutes on it in the next Parquet sync or alternatively is there someone already in the web of trust who would volunteer to do a private video call with us before or after the sync? Thanks, Z

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-04-29 Thread Wes McKinney
On Mon, Apr 29, 2019 at 12:48 PM Zoltan Ivanfi wrote: > > Hi, > > An excerpt from > https://www.apache.org/dev/release-signing#verifying-signature : "A > signature is valid, if gpg verifies the .asc as a good signature, and > doesn't complain about expired or revoked keys." Another excerpt from >

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-04-29 Thread Zoltan Ivanfi
Hi, An excerpt from https://www.apache.org/dev/release-signing#verifying-signature : "A signature is valid, if gpg verifies the .asc as a good signature, and doesn't complain about expired or revoked keys." Another excerpt from https://www.apache.org/dev/release-signing#check-integrity that reinfo

Re: Error in parquet-testing/data/datapage_v2.snappy.parquet?

2019-04-29 Thread Ivan Sadikov
Yeah, you are right. Looks like the right JIRA ticket. On Mon, 29 Apr 2019 at 5:39 PM, Curt Hagenlocher wrote: > Would that be covered by PARQUET-458 ( > https://issues.apache.org/jira/browse/PARQUET-458)? > > On Mon, Apr 29, 2019 at 8:18 AM Wes McKinney wrote: > > > Is there a JIRA issue about

Re: Error in parquet-testing/data/datapage_v2.snappy.parquet?

2019-04-29 Thread Ivan Sadikov
Not in V2. In V1 the whole page is compressed, but in V2 only the values are, if I remember correctly. So we would have to extract the repetition and definition level bytes and then decompress the values. You can check out the code in the parquet rust module! I am not sure about parquet-cpp, we can use that implementati
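
A minimal sketch of the V2 page layout described in this message, assuming the page body is laid out as repetition levels, then definition levels, then values, with only the values section compressed. The two byte-length fields and `is_compressed` mirror the names in the Thrift `DataPageHeaderV2` struct; the `DataPageV2Header` class, the `split_data_page_v2` function, and the pluggable `decompress` argument are hypothetical names for illustration, not part of any Parquet library.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DataPageV2Header:
    # Hypothetical holder for the subset of DataPageHeaderV2 fields used here.
    repetition_levels_byte_length: int
    definition_levels_byte_length: int
    is_compressed: bool = True


def split_data_page_v2(
    page: bytes,
    header: DataPageV2Header,
    decompress: Callable[[bytes], bytes],
) -> Tuple[bytes, bytes, bytes]:
    """Split a V2 data page body into (rep_levels, def_levels, values).

    The level sections are sliced off uncompressed; only the remaining
    values section is passed through `decompress`.
    """
    rep_end = header.repetition_levels_byte_length
    def_end = rep_end + header.definition_levels_byte_length
    if rep_end < 0 or def_end < rep_end or def_end > len(page):
        raise ValueError("level byte lengths do not fit inside the page")

    rep_levels = page[:rep_end]
    def_levels = page[rep_end:def_end]
    values = page[def_end:]
    if header.is_compressed:
        values = decompress(values)
    return rep_levels, def_levels, values
```

For a Snappy-compressed file, `decompress` would be the Snappy decompression function from whichever binding is in use. Passing the whole page body to the decompressor, as is done for V1 pages, would feed it the uncompressed level bytes first, which is one plausible way to end up with an "invalid Snappy data" error.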

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-04-29 Thread Wes McKinney
hi Zoltan, I'm looking for ASF guidelines around this, whether it is MUST or SHOULD https://www.apache.org/dev/release-signing#web-of-trust Because SVN access is only password protected, having access to the KEYS file is a weak standard of security. Could other PMC members comment on this? Than

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-04-29 Thread Zoltan Ivanfi
Hi Wes, Gabor's key is in the KEYS file available at https://dist.apache.org/repos/dist/dev/parquet/KEYS Others may correct me if I'm mistaken, but as far as I know, this is all that is required. I mentioned this in the verification steps as well ("4. Verify the signature by running `gpg --verify

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-04-29 Thread Wes McKinney
-1 Gabor's PGP key is unsigned. $ gpg --verify apache-parquet-1.11.0.tar.gz.asc gpg: assuming signed data in 'apache-parquet-1.11.0.tar.gz' gpg: Signature made Tue 19 Mar 2019 08:55:48 AM CDT gpg: using RSA key 6FB82970311551C7CEF131F5021057DBF048F543 gpg: Good signature from "Gabo

[jira] [Commented] (PARQUET-1405) [C++] 'Couldn't deserialize thrift' error when reading large binary column

2019-04-29 Thread John Adcock (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829371#comment-16829371 ] John Adcock commented on PARQUET-1405: -- I'm being hit by this issue, I'm happy to

Friendly reminder

2019-04-29 Thread Adam Alami
Dear Apache communities, Big thanks to those who participated in the survey (great participation from the Apache communities). If you haven’t participated yet, please do. What value is there in participating? I will be sharing the results with the community in the form of a report (slides

Re: Error in parquet-testing/data/datapage_v2.snappy.parquet?

2019-04-29 Thread Curt Hagenlocher
Would that be covered by PARQUET-458 ( https://issues.apache.org/jira/browse/PARQUET-458)? On Mon, Apr 29, 2019 at 8:18 AM Wes McKinney wrote: > Is there a JIRA issue about data page v2 issues in parquet-cpp? > > On Mon, Apr 29, 2019 at 9:57 AM Curt Hagenlocher > wrote: > > > > But the data pag

Re: Error in parquet-testing/data/datapage_v2.snappy.parquet?

2019-04-29 Thread Wes McKinney
Is there a JIRA issue about data page v2 issues in parquet-cpp? On Mon, Apr 29, 2019 at 9:57 AM Curt Hagenlocher wrote: > > But the data page is decoded only after it is decompressed, so I wouldn’t > expect an unsupported data page to cause a decompression failure. > > (I am playing with adding

Re: Error in parquet-testing/data/datapage_v2.snappy.parquet?

2019-04-29 Thread Curt Hagenlocher
But the data page is decoded only after it is decompressed, so I wouldn’t expect an unsupported data page to cause a decompression failure. (I am playing with adding V2 support to Parquet.Net.) Sent from my iPhone > On Apr 29, 2019, at 7:30 AM, Ivan Sadikov wrote: > > If you are referring to

Re: Error in parquet-testing/data/datapage_v2.snappy.parquet?

2019-04-29 Thread Ivan Sadikov
If you are referring to the file in Apache/parquet-testing repository, it is a valid Parquet file with data encoded into data page v2. You can easily test it with “cargo install parquet” and “parquet-read filepath”. I am not sure what kind of code you have written, but the error you have encounte

Error in parquet-testing/data/datapage_v2.snappy.parquet?

2019-04-29 Thread Curt Hagenlocher
To the best of my ability to tell, there is invalid Snappy data in the file parquet-testing/data/datapage_v2.snappy.parquet. I can neither read it with my own code nor with pyarrow 0.13.0. Is this expected to work? Thanks! -Curt