Re: definition written before repetition?

2020-05-22 Thread Wes McKinney
Sorry, I'm wrong -- C++ is doing it correctly, I was looking at the wrong code. False alarm! https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc#L685 I was shocked that such a blatant correctness issue might have existed but since people have been able to read nested data

Re: definition written before repetition?

2020-05-22 Thread Wes McKinney
If that's the case (and according to the Format documentation it is) then we are doing it incorrectly in C++. How depressing. https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc#L1097 This is unfortunately what happens when you don't have more rigorous integration tests.

[jira] [Created] (PARQUET-1867) [C++] Fix MetaData children object lifetime issues

2020-05-22 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created PARQUET-1867: --- Summary: [C++] Fix MetaData children object lifetime issues Key: PARQUET-1867 URL: https://issues.apache.org/jira/browse/PARQUET-1867 Project: Pa

[GitHub] [parquet-mr] gszadovszky commented on pull request #776: PARQUET-1229: Parquet MR encryption

2020-05-22 Thread GitBox
gszadovszky commented on pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-632617498 > > I know, this is not the whole feature yet but would like to ensure that at the end we will have proper test coverage. So, the question is do we have or planning to have

[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113915#comment-17113915 ] ASF GitHub Bot commented on PARQUET-1229: - gszadovszky commented on pull reques

[jira] [Commented] (PARQUET-1866) Replace Hadoop ZSTD with JNI-ZSTD

2020-05-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113905#comment-17113905 ] ASF GitHub Bot commented on PARQUET-1866: - gszadovszky commented on a change in

[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #793: PARQUET-1866: Replace Hadoop ZSTD with JNI-ZSTD

2020-05-22 Thread GitBox
gszadovszky commented on a change in pull request #793: URL: https://github.com/apache/parquet-mr/pull/793#discussion_r429112028 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/ZstdCodec.java ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Softw

Re: definition written before repetition?

2020-05-22 Thread Gabor Szadovszky
Hi ZJ, parquet-mr clearly writes repetition levels and definition levels according to the specification. See the following code references. For V1 pages: https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriterV1.java#L60 For V2 page
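For readers following this thread: the ordering under discussion comes from the Parquet format specification. In a V1 data page the encoded repetition levels come first, then the definition levels, then the encoded values. With RLE level encoding, each level section is prefixed by its byte length as a 4-byte little-endian integer, and a section is omitted when its maximum level is 0. The sketch below assembles a page body in that order. It is a simplified illustration under stated assumptions, not parquet-mr or Arrow code: the helper names are hypothetical, the level encoder emits only RLE runs of the RLE/bit-packing hybrid (never bit-packed runs), and the values are passed in as an already-encoded byte string.

```python
import struct


def uleb128(n):
    """Encode a non-negative int as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_levels_rle(levels, max_level):
    """Hypothetical RLE-only encoder for the RLE/bit-packing hybrid.

    Each run is a varint header (run_length << 1; low bit 0 marks an RLE run)
    followed by the repeated value in ceil(bit_width / 8) little-endian bytes.
    """
    width_bytes = (max_level.bit_length() + 7) // 8
    out = bytearray()
    i = 0
    while i < len(levels):
        j = i
        while j < len(levels) and levels[j] == levels[i]:
            j += 1
        out += uleb128((j - i) << 1)
        out += levels[i].to_bytes(width_bytes, "little")
        i = j
    return bytes(out)


def data_page_v1_body(rep_levels, def_levels, max_rep, max_def, values):
    """Assemble a V1 page body: repetition levels, definition levels, values.

    Assumes RLE level encoding, so each level section gets a 4-byte
    little-endian length prefix; sections with max level 0 are skipped.
    """
    body = bytearray()
    if max_rep > 0:
        rep = encode_levels_rle(rep_levels, max_rep)
        body += struct.pack("<i", len(rep)) + rep
    if max_def > 0:
        dfn = encode_levels_rle(def_levels, max_def)
        body += struct.pack("<i", len(dfn)) + dfn
    body += values
    return bytes(body)


# Example: one list with three non-null elements (max_rep = 1, max_def = 1).
page = data_page_v1_body([0, 1, 1], [1, 1, 1], 1, 1, b"VALS")
```

V2 data pages differ in layout: the levels are still written repetition-first, but without the 4-byte prefixes, since their byte lengths are carried in the page header instead.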