[GitHub] [parquet-mr] gszadovszky commented on pull request #868: PARQUET-1977: Invalid data_page_offset

2021-02-16 Thread GitBox
gszadovszky commented on pull request #868: URL: https://github.com/apache/parquet-mr/pull/868#issuecomment-779685215 > Great findings! I will review it sometime this week. > > BTW, I see you keep finding issues like this. Are you doing some testing? The Impala team started wor

[jira] [Commented] (PARQUET-1977) Invalid data_page_offset

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285099#comment-17285099 ] ASF GitHub Bot commented on PARQUET-1977: - gszadovszky commented on pull reques

[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #869: PARQUET-1979: bloom_filter_offset is filled if there are no bloom filters

2021-02-16 Thread GitBox
gszadovszky commented on a change in pull request #869: URL: https://github.com/apache/parquet-mr/pull/869#discussion_r576649370 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ColumnChunkMetaData.java ## @@ -157,14 +157,14 @@ public static Column

[jira] [Commented] (PARQUET-1979) Optional bloom_filter_offset is filled if no bloom filter is present

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285100#comment-17285100 ] ASF GitHub Bot commented on PARQUET-1979: - gszadovszky commented on a change in

[jira] [Created] (PARQUET-1980) Build and test Apache Parquet Format on ARM64 CPU architecture

2021-02-16 Thread Martin Tzvetanov Grigorov (Jira)
Martin Tzvetanov Grigorov created PARQUET-1980: -- Summary: Build and test Apache Parquet Format on ARM64 CPU architecture Key: PARQUET-1980 URL: https://issues.apache.org/jira/browse/PARQUET-1980

[GitHub] [parquet-format] martin-g opened a new pull request #167: PARQUET-1980: Add TravisCI job to build and test on ARM64

2021-02-16 Thread GitBox
martin-g opened a new pull request #167: URL: https://github.com/apache/parquet-format/pull/167 ### Jira https://issues.apache.org/jira/browse/PARQUET-1980 ### Commits - [ X ] My commits all reference Jira issues in their subject lines. In addition, my commits follow th

[jira] [Commented] (PARQUET-1980) Build and test Apache Parquet Format on ARM64 CPU architecture

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285140#comment-17285140 ] ASF GitHub Bot commented on PARQUET-1980: - martin-g opened a new pull request #

[GitHub] [parquet-format] martin-g commented on a change in pull request #167: PARQUET-1980: Add TravisCI job to build and test on ARM64

2021-02-16 Thread GitBox
martin-g commented on a change in pull request #167: URL: https://github.com/apache/parquet-format/pull/167#discussion_r576740506 ## File path: .travis.yml ## @@ -31,3 +37,5 @@ before_install: - ./configure --disable-gen-erl --disable-gen-hs --without-ruby --without-haskell

[jira] [Commented] (PARQUET-1980) Build and test Apache Parquet Format on ARM64 CPU architecture

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285141#comment-17285141 ] ASF GitHub Bot commented on PARQUET-1980: - martin-g commented on a change in pu

[GitHub] [parquet-mr] chenjunjiedada commented on pull request #869: PARQUET-1979: bloom_filter_offset is filled if there are no bloom filters

2021-02-16 Thread GitBox
chenjunjiedada commented on pull request #869: URL: https://github.com/apache/parquet-mr/pull/869#issuecomment-779779278 @gszadovszky, Thanks for fixing this! It looks correct to me. Just one minor thing, could you help to add a unit test to check null bloom filter offset when there

[jira] [Commented] (PARQUET-1979) Optional bloom_filter_offset is filled if no bloom filter is present

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285148#comment-17285148 ] ASF GitHub Bot commented on PARQUET-1979: - chenjunjiedada commented on pull req

[GitHub] [parquet-format] pitrou commented on a change in pull request #164: PARQUET-1950: Define core features

2021-02-16 Thread GitBox
pitrou commented on a change in pull request #164: URL: https://github.com/apache/parquet-format/pull/164#discussion_r576804103 ## File path: CoreFeatures.md ## @@ -0,0 +1,188 @@ + + +# Parquet Core Features + +This document lists the core features for each parquet-format relea

[jira] [Commented] (PARQUET-1950) Define core features / compliance level

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285181#comment-17285181 ] ASF GitHub Bot commented on PARQUET-1950: - pitrou commented on a change in pull

[GitHub] [parquet-format] pitrou commented on a change in pull request #164: PARQUET-1950: Define core features

2021-02-16 Thread GitBox
pitrou commented on a change in pull request #164: URL: https://github.com/apache/parquet-format/pull/164#discussion_r576805491 ## File path: CoreFeatures.md ## @@ -0,0 +1,181 @@ + + +# Parquet Core Features + +This document lists the core features for each parquet-format relea

[jira] [Commented] (PARQUET-1950) Define core features / compliance level

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285183#comment-17285183 ] ASF GitHub Bot commented on PARQUET-1950: - pitrou commented on a change in pull

[jira] [Created] (PARQUET-1981) Consider adding BloomFilterHeader to ColumnMetaData

2021-02-16 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created PARQUET-1981: Summary: Consider adding BloomFilterHeader to ColumnMetaData Key: PARQUET-1981 URL: https://issues.apache.org/jira/browse/PARQUET-1981 Project: Parquet

Request deprecation / removal of LZ4 compression

2021-02-16 Thread Antoine Pitrou
Hello, This is a proposal to deprecate and/or remove LZ4 compression from the Parquet specification. Abstract Despite several attempts by the parquet-cpp developers, we were not able to reach the point where LZ4-compressed Parquet files are bidirectionally compatible between parquet-c

[GitHub] [parquet-format] gszadovszky commented on pull request #167: PARQUET-1980: Add TravisCI job to build and test on ARM64

2021-02-16 Thread GitBox
gszadovszky commented on pull request #167: URL: https://github.com/apache/parquet-format/pull/167#issuecomment-779887663 @martin-g, because of the recent Travis issue we have decided to move to GitHub Actions. We are already done with the transition in parquet-mr and it is on the way for p

[jira] [Commented] (PARQUET-1980) Build and test Apache Parquet Format on ARM64 CPU architecture

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285246#comment-17285246 ] ASF GitHub Bot commented on PARQUET-1980: - gszadovszky commented on pull reques

[GitHub] [parquet-format] gszadovszky edited a comment on pull request #167: PARQUET-1980: Add TravisCI job to build and test on ARM64

2021-02-16 Thread GitBox
gszadovszky edited a comment on pull request #167: URL: https://github.com/apache/parquet-format/pull/167#issuecomment-779887663 @martin-g, because of the recent Travis issue we have decided to move to GitHub Actions. We are already done with the transition in parquet-mr and it is on the wa

[jira] [Commented] (PARQUET-1980) Build and test Apache Parquet Format on ARM64 CPU architecture

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285251#comment-17285251 ] ASF GitHub Bot commented on PARQUET-1980: - gszadovszky edited a comment on pull

[jira] [Created] (PARQUET-1982) Allow random access to row groups in ParquetFileReader

2021-02-16 Thread Felix Schmalzel (Jira)
Felix Schmalzel created PARQUET-1982: Summary: Allow random access to row groups in ParquetFileReader Key: PARQUET-1982 URL: https://issues.apache.org/jira/browse/PARQUET-1982 Project: Parquet

[jira] [Created] (PARQUET-1983) Pool SeekableInputStreams in ParquetFileReader

2021-02-16 Thread Felix Schmalzel (Jira)
Felix Schmalzel created PARQUET-1983: Summary: Pool SeekableInputStreams in ParquetFileReader Key: PARQUET-1983 URL: https://issues.apache.org/jira/browse/PARQUET-1983 Project: Parquet Is

[jira] [Created] (PARQUET-1984) Some tests fail on windows

2021-02-16 Thread Felix Schmalzel (Jira)
Felix Schmalzel created PARQUET-1984: Summary: Some tests fail on windows Key: PARQUET-1984 URL: https://issues.apache.org/jira/browse/PARQUET-1984 Project: Parquet Issue Type: Bug

[GitHub] [parquet-mr] gszadovszky commented on pull request #869: PARQUET-1979: bloom_filter_offset is filled if there are no bloom filters

2021-02-16 Thread GitBox
gszadovszky commented on pull request #869: URL: https://github.com/apache/parquet-mr/pull/869#issuecomment-779922579 > @gszadovszky, Thanks for fixing this! > > It looks correct to me. Just one minor thing, could you help to add a unit test to check null bloom filter offset when the

[jira] [Commented] (PARQUET-1979) Optional bloom_filter_offset is filled if no bloom filter is present

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285290#comment-17285290 ] ASF GitHub Bot commented on PARQUET-1979: - gszadovszky commented on pull reques

[GitHub] [parquet-mr] gszadovszky commented on pull request #867: PARQUET-1978: Provide a tool to show the complete footer

2021-02-16 Thread GitBox
gszadovszky commented on pull request #867: URL: https://github.com/apache/parquet-mr/pull/867#issuecomment-779924667 @shangxinli, although this one is certainly not a blocker for the release, it might make debugging easier. If you have some time, please check. If you don't, it's fine and I'

[jira] [Commented] (PARQUET-1978) Provide a tool to show the complete footer

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285292#comment-17285292 ] ASF GitHub Bot commented on PARQUET-1978: - gszadovszky commented on pull reques

Re: Request deprecation / removal of LZ4 compression

2021-02-16 Thread Gabor Szadovszky
Thank you for the detailed summary of the LZ4 situation, Antoine! The Parquet file format should be properly specified for every implementation. It was the mistake of the parquet-mr developers that we thought the Hadoop implementation of LZ4 conformed to the LZ4 specification and the fault of t

Re: Request deprecation / removal of LZ4 compression

2021-02-16 Thread Jacques Nadeau
There is some ambiguity in the discussion and proposals here around deprecating future writing versus supporting reading of already written data and what it means to deprecate something in the format specification. I think it would be a mistake for someone who has written Hadoop-Lz4 for several ye

Re: Request deprecation / removal of LZ4 compression

2021-02-16 Thread Micah Kornfield
> I think it would be a mistake for someone who has written Hadoop-Lz4 for several years with parquet-mr to all of a sudden be no longer able to read their files. (I believe that parquet-mr with this pattern has been incorporated into various libraries for several years now--correct me if I

[jira] [Commented] (PARQUET-1969) Test by GithubAction

2021-02-16 Thread Martin Tzvetanov Grigorov (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285673#comment-17285673 ] Martin Tzvetanov Grigorov commented on PARQUET-1969: Hi! What exac

[GitHub] [parquet-format] martin-g commented on pull request #167: PARQUET-1980: Add TravisCI job to build and test on ARM64

2021-02-16 Thread GitBox
martin-g commented on pull request #167: URL: https://github.com/apache/parquet-format/pull/167#issuecomment-780363087 Thanks for the reference, @gszadovszky ! I've added a comment to [PARQUET-1969](https://issues.apache.org/jira/browse/PARQUET-1969?focusedCommentId=17285673&page=com.atl

[GitHub] [parquet-format] martin-g closed pull request #167: PARQUET-1980: Add TravisCI job to build and test on ARM64

2021-02-16 Thread GitBox
martin-g closed pull request #167: URL: https://github.com/apache/parquet-format/pull/167 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [parquet-format] martin-g commented on pull request #167: PARQUET-1980: Add TravisCI job to build and test on ARM64

2021-02-16 Thread GitBox
martin-g commented on pull request #167: URL: https://github.com/apache/parquet-format/pull/167#issuecomment-780363348 Closing this PR since the code is now in Parquet MR!

[jira] [Commented] (PARQUET-1980) Build and test Apache Parquet Format on ARM64 CPU architecture

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285676#comment-17285676 ] ASF GitHub Bot commented on PARQUET-1980: - martin-g closed pull request #167: U

[jira] [Commented] (PARQUET-1980) Build and test Apache Parquet Format on ARM64 CPU architecture

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285675#comment-17285675 ] ASF GitHub Bot commented on PARQUET-1980: - martin-g commented on pull request #

[jira] [Commented] (PARQUET-1980) Build and test Apache Parquet Format on ARM64 CPU architecture

2021-02-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285678#comment-17285678 ] ASF GitHub Bot commented on PARQUET-1980: - martin-g commented on pull request #

[jira] [Comment Edited] (PARQUET-1969) Test by GithubAction

2021-02-16 Thread Martin Tzvetanov Grigorov (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285673#comment-17285673 ] Martin Tzvetanov Grigorov edited comment on PARQUET-1969 at 2/17/21, 7:37 AM: ---