[jira] [Commented] (PARQUET-2219) ParquetFileReader throws a runtime exception when a file contains only headers and now row data

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17657092#comment-17657092 ] ASF GitHub Bot commented on PARQUET-2219: - wgtmac commented on code in PR #1018: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1018: PARQUET-2219: ParquetFileReader skips empty row group

2023-01-10 Thread GitBox
wgtmac commented on code in PR #1018: URL: https://github.com/apache/parquet-mr/pull/1018#discussion_r1066592694 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -927,7 +925,15 @@ public PageReadStore readRowGroup(int blockIndex) throws

[jira] [Commented] (PARQUET-2160) Close decompression stream to free off-heap memory in time

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17657057#comment-17657057 ] ASF GitHub Bot commented on PARQUET-2160: - camper42 commented on PR #982: URL:

[GitHub] [parquet-mr] camper42 commented on pull request #982: PARQUET-2160: Close ZstdInputStream to free off-heap memory in time.

2023-01-10 Thread GitBox
camper42 commented on PR #982: URL: https://github.com/apache/parquet-mr/pull/982#issuecomment-1378181577 Same problem as @alexeykudinkin. Currently we replace the parquet jar with a patched one in our image, waiting for release. -- This is an automated message from the Apache Git

[jira] [Commented] (PARQUET-2075) Unified Rewriter Tool

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656966#comment-17656966 ] ASF GitHub Bot commented on PARQUET-2075: - shangxinli commented on PR #1014: URL:

[GitHub] [parquet-mr] shangxinli commented on pull request #1014: PARQUET-2075: Implement unified file rewriter

2023-01-10 Thread GitBox
shangxinli commented on PR #1014: URL: https://github.com/apache/parquet-mr/pull/1014#issuecomment-1377840323 Thanks a lot @gszadovszky

[jira] [Commented] (PARQUET-1980) Build and test Apache Parquet on ARM64 CPU architecture

2023-01-10 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656754#comment-17656754 ] Gabor Szadovszky commented on PARQUET-1980: --- Perfect. Thank you, [~mgrigorov]! > Build and

[jira] [Commented] (PARQUET-2075) Unified Rewriter Tool

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656749#comment-17656749 ] ASF GitHub Bot commented on PARQUET-2075: - gszadovszky commented on PR #1014: URL:

[GitHub] [parquet-mr] gszadovszky commented on pull request #1014: PARQUET-2075: Implement unified file rewriter

2023-01-10 Thread GitBox
gszadovszky commented on PR #1014: URL: https://github.com/apache/parquet-mr/pull/1014#issuecomment-1377706622 > @gszadovszky I Just want to check if you have time to have a look. @wgtmac just be nice to take over the work that we discussed earlier to have an aggregated rewriter.

[jira] [Commented] (PARQUET-2219) ParquetFileReader throws a runtime exception when a file contains only headers and now row data

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656747#comment-17656747 ] ASF GitHub Bot commented on PARQUET-2219: - gszadovszky commented on PR #1018: URL:

[GitHub] [parquet-mr] gszadovszky commented on pull request #1018: PARQUET-2219: ParquetFileReader skips empty row group

2023-01-10 Thread GitBox
gszadovszky commented on PR #1018: URL: https://github.com/apache/parquet-mr/pull/1018#issuecomment-1377700950 > @gszadovszky Nice to see you are back! @shangxinli, I wouldn't say I'm back, unfortunately. I'm a bit closer to Parquet at Dremio but actually not working on it. We'll see

[jira] [Updated] (PARQUET-2225) [C++] Allow reading dense with RecordReader

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated PARQUET-2225: Labels: pull-request-available (was: ) > [C++] Allow reading dense with RecordReader >

[jira] [Updated] (PARQUET-2225) [C++] Allow reading dense with RecordReader

2023-01-10 Thread fatemah (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fatemah updated PARQUET-2225: - Summary: [C++] Allow reading dense with RecordReader (was: Allow reading dense with RecordReader) >

[jira] [Created] (PARQUET-2225) Allow reading dense with RecordReader

2023-01-10 Thread fatemah (Jira)
fatemah created PARQUET-2225: Summary: Allow reading dense with RecordReader Key: PARQUET-2225 URL: https://issues.apache.org/jira/browse/PARQUET-2225 Project: Parquet Issue Type: New Feature

[jira] [Commented] (PARQUET-2160) Close decompression stream to free off-heap memory in time

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656721#comment-17656721 ] ASF GitHub Bot commented on PARQUET-2160: - shangxinli commented on PR #982: URL:

[GitHub] [parquet-mr] shangxinli commented on pull request #982: PARQUET-2160: Close ZstdInputStream to free off-heap memory in time.

2023-01-10 Thread GitBox
shangxinli commented on PR #982: URL: https://github.com/apache/parquet-mr/pull/982#issuecomment-1377598327 Thanks @alexeykudinkin for the explanation.

[jira] [Commented] (PARQUET-2160) Close decompression stream to free off-heap memory in time

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656720#comment-17656720 ] ASF GitHub Bot commented on PARQUET-2160: - alexeykudinkin commented on PR #982: URL:

[GitHub] [parquet-mr] alexeykudinkin commented on pull request #982: PARQUET-2160: Close ZstdInputStream to free off-heap memory in time.

2023-01-10 Thread GitBox
alexeykudinkin commented on PR #982: URL: https://github.com/apache/parquet-mr/pull/982#issuecomment-1377589036 Totally @shangxinli We have running Spark clusters in production _ingesting_ from 100s of Apache Hudi tables (using Parquet and Zstd) and writing into other ones. We

[jira] [Commented] (PARQUET-2219) ParquetFileReader throws a runtime exception when a file contains only headers and now row data

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656714#comment-17656714 ] ASF GitHub Bot commented on PARQUET-2219: - shangxinli commented on code in PR #1018: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1018: PARQUET-2219: ParquetFileReader skips empty row group

2023-01-10 Thread GitBox
shangxinli commented on code in PR #1018: URL: https://github.com/apache/parquet-mr/pull/1018#discussion_r1066042941 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1038,7 +1044,10 @@ public PageReadStore readNextFilteredRowGroup()

[jira] [Commented] (PARQUET-2219) ParquetFileReader throws a runtime exception when a file contains only headers and now row data

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656711#comment-17656711 ] ASF GitHub Bot commented on PARQUET-2219: - shangxinli commented on code in PR #1018: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1018: PARQUET-2219: ParquetFileReader skips empty row group

2023-01-10 Thread GitBox
shangxinli commented on code in PR #1018: URL: https://github.com/apache/parquet-mr/pull/1018#discussion_r1066038932 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -927,7 +925,15 @@ public PageReadStore readRowGroup(int blockIndex)

[jira] [Commented] (PARQUET-2075) Unified Rewriter Tool

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656692#comment-17656692 ] ASF GitHub Bot commented on PARQUET-2075: - wgtmac commented on code in PR #1014: URL:

[jira] [Commented] (PARQUET-2075) Unified Rewriter Tool

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656693#comment-17656693 ] ASF GitHub Bot commented on PARQUET-2075: - wgtmac commented on code in PR #1014: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1014: PARQUET-2075: Implement unified file rewriter

2023-01-10 Thread GitBox
wgtmac commented on code in PR #1014: URL: https://github.com/apache/parquet-mr/pull/1014#discussion_r1065962705 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/RewriteOptions.java: ## @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[jira] [Commented] (PARQUET-2219) ParquetFileReader throws a runtime exception when a file contains only headers and now row data

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656691#comment-17656691 ] ASF GitHub Bot commented on PARQUET-2219: - wgtmac commented on PR #1018: URL:

[GitHub] [parquet-mr] wgtmac commented on pull request #1018: PARQUET-2219: ParquetFileReader skips empty row group

2023-01-10 Thread GitBox
wgtmac commented on PR #1018: URL: https://github.com/apache/parquet-mr/pull/1018#issuecomment-1377485663 > Thank you for fixing this. I've added some comments. Also, could you add a similar test for the filtered row groups? Thanks for your review, @gszadovszky! I have

[jira] [Commented] (PARQUET-1980) Build and test Apache Parquet on ARM64 CPU architecture

2023-01-10 Thread Martin Tzvetanov Grigorov (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656629#comment-17656629 ] Martin Tzvetanov Grigorov commented on PARQUET-1980: Apache Arrow team is going to

[jira] [Commented] (PARQUET-1980) Build and test Apache Parquet on ARM64 CPU architecture

2023-01-10 Thread Martin Tzvetanov Grigorov (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656628#comment-17656628 ] Martin Tzvetanov Grigorov commented on PARQUET-1980: Unfortunately the build takes

[jira] [Commented] (PARQUET-2212) Add ByteBuffer api for decryptors to allow direct memory to be decrypted

2023-01-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656517#comment-17656517 ] ASF GitHub Bot commented on PARQUET-2212: - wgtmac commented on PR #1008: URL:

[GitHub] [parquet-mr] wgtmac commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

2023-01-10 Thread GitBox
wgtmac commented on PR #1008: URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1376928785 > @wgtmac Do you have time to have a look? @shangxinli Thanks for mentioning me. Sure, I will take a look this week.