[jira] [Commented] (PARQUET-2374) Add metrics support for parquet file reader

2023-11-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789009#comment-17789009 ] ASF GitHub Bot commented on PARQUET-2374: - wgtmac commented on PR #1187: URL: h

Re: [PR] PARQUET-2374: Add metrics support for parquet file reader [parquet-mr]

2023-11-22 Thread via GitHub
wgtmac commented on PR #1187: URL: https://github.com/apache/parquet-mr/pull/1187#issuecomment-1823918745 Thanks @parthchandra! Do you have any TODO work item on this (or after vectored I/O is merged)? -- This is an automated message from the Apache Git Service. To respond to the message,

[jira] [Commented] (PARQUET-2261) [Format] Add statistics that reflect decoded size to metadata

2023-11-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789003#comment-17789003 ] ASF GitHub Bot commented on PARQUET-2261: - wgtmac commented on PR #1177: URL: h

Re: [PR] PARQUET-2261: Implement SizeStatistics [parquet-mr]

2023-11-22 Thread via GitHub
wgtmac commented on PR #1177: URL: https://github.com/apache/parquet-mr/pull/1177#issuecomment-1823898172 cc @ConeyLiu as I have modified mergeColumnStatistics method which you've just refactored.

[jira] [Commented] (PARQUET-2261) [Format] Add statistics that reflect decoded size to metadata

2023-11-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789001#comment-17789001 ] ASF GitHub Bot commented on PARQUET-2261: - wgtmac commented on PR #1177: URL: h

Re: [PR] PARQUET-2261: Implement SizeStatistics [parquet-mr]

2023-11-22 Thread via GitHub
wgtmac commented on PR #1177: URL: https://github.com/apache/parquet-mr/pull/1177#issuecomment-1823895468 I have just rebased on the latest master branch and fixed all CI failures. As this PR gets too large, I will add print cli command and rewriter support for `SizeStatistics` in follow-up

[jira] [Commented] (PARQUET-2171) Implement vectored IO in parquet file format

2023-11-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788999#comment-17788999 ] ASF GitHub Bot commented on PARQUET-2171: - wgtmac commented on code in PR #1139

Re: [PR] PARQUET-2171: Support Hadoop vectored IO [parquet-mr]

2023-11-22 Thread via GitHub
wgtmac commented on code in PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#discussion_r1402925218
## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ##
@@ -125,6 +130,8 @@ public class ParquetFileReader implements Closeable { pu

[jira] [Commented] (PARQUET-2374) Add metrics support for parquet file reader

2023-11-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788940#comment-17788940 ] ASF GitHub Bot commented on PARQUET-2374: - parthchandra commented on PR #1187:

Re: [PR] PARQUET-2374: Add metrics support for parquet file reader [parquet-mr]

2023-11-22 Thread via GitHub
parthchandra commented on PR #1187: URL: https://github.com/apache/parquet-mr/pull/1187#issuecomment-1823714212
> > For the object stores, things to measure are
> >
> > * time to open() and close() a file
> > * time for a read after a backwards seek
> > * time for a read after a
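The quoted list names open()/close() latency and the latency of a read that follows a backwards seek as things worth measuring on object stores. As a minimal sketch of how such timings could be captured, assuming plain `java.nio` `SeekableByteChannel` access rather than Hadoop streams, the hypothetical `ReadTimings` class below accumulates them in `LongAdder` counters. It is an illustration only and is not the metrics API proposed in PR #1187.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical holder for the timings listed in the comment; not a parquet-mr class. */
public final class ReadTimings {
    final LongAdder openNanos = new LongAdder();
    final LongAdder closeNanos = new LongAdder();
    final LongAdder backwardSeekReadNanos = new LongAdder();
    final LongAdder backwardSeekReads = new LongAdder();

    /** Opens the file for reading, timing only the open call. */
    SeekableByteChannel open(Path path) throws IOException {
        long start = System.nanoTime();
        SeekableByteChannel ch = Files.newByteChannel(path, StandardOpenOption.READ);
        openNanos.add(System.nanoTime() - start);
        return ch;
    }

    /** Closes the channel, timing only the close call. */
    void close(SeekableByteChannel ch) throws IOException {
        long start = System.nanoTime();
        ch.close();
        closeNanos.add(System.nanoTime() - start);
    }

    /**
     * Seeks to pos and reads into buf. When pos is behind the current
     * position (a backwards seek), the seek plus read is timed and counted.
     */
    int readAt(SeekableByteChannel ch, long pos, ByteBuffer buf) throws IOException {
        boolean backwards = pos < ch.position();
        long start = System.nanoTime();
        ch.position(pos);
        int n = ch.read(buf);
        if (backwards) {
            backwardSeekReadNanos.add(System.nanoTime() - start);
            backwardSeekReads.increment();
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("timings", ".bin");
        Files.write(tmp, new byte[1024]);
        ReadTimings t = new ReadTimings();
        SeekableByteChannel ch = t.open(tmp);
        t.readAt(ch, 512, ByteBuffer.allocate(128)); // forward seek: not counted
        t.readAt(ch, 0, ByteBuffer.allocate(128));   // backwards seek: counted
        t.close(ch);
        Files.delete(tmp);
        System.out.printf("open=%dns close=%dns backwardSeekReads=%d (%dns)%n",
            t.openNanos.sum(), t.closeNanos.sum(),
            t.backwardSeekReads.sum(), t.backwardSeekReadNanos.sum());
    }
}
```

On a local file these numbers are tiny; the point of separating backwards-seek reads is that on object stores such a seek can force the client to abandon an open HTTP range request and issue a new one.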

[jira] [Commented] (PARQUET-2374) Add metrics support for parquet file reader

2023-11-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788938#comment-17788938 ] ASF GitHub Bot commented on PARQUET-2374: - parthchandra commented on code in PR

Re: [PR] PARQUET-2374: Add metrics support for parquet file reader [parquet-mr]

2023-11-22 Thread via GitHub
parthchandra commented on code in PR #1187: URL: https://github.com/apache/parquet-mr/pull/1187#discussion_r1402836364
## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java: ##
@@ -80,10 +80,12 @@ static final class ColumnChunkPageReader impleme

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-11-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788926#comment-17788926 ] ASF GitHub Bot commented on PARQUET-2249: - etseidl commented on code in PR #221

Re: [PR] PARQUET-2249: Introduce IEEE 754 total order [parquet-format]

2023-11-22 Thread via GitHub
etseidl commented on code in PR #221: URL: https://github.com/apache/parquet-format/pull/221#discussion_r1402657927
## src/main/thrift/parquet.thrift: ##
@@ -288,7 +288,7 @@ struct MapType {} // see LogicalTypes.md
struct ListType {}// see LogicalTypes.md
struct EnumTy

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-11-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788868#comment-17788868 ] ASF GitHub Bot commented on PARQUET-2249: - JFinis commented on PR #196: URL: ht

Re: [PR] PARQUET-2249: Add nan_count to handle NaNs in statistics [parquet-format]

2023-11-22 Thread via GitHub
JFinis commented on PR #196: URL: https://github.com/apache/parquet-format/pull/196#issuecomment-1823348321 Okay, finally done. As the new solution (total order) does not share a single line with the current solution and this PR gets quite long and contrived, I created a new PR: https://git

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-11-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788866#comment-17788866 ] ASF GitHub Bot commented on PARQUET-2249: - JFinis opened a new pull request, #2

[PR] PARQUET-2249: Introduce IEEE 754 total order [parquet-format]

2023-11-22 Thread via GitHub
JFinis opened a new pull request, #221: URL: https://github.com/apache/parquet-format/pull/221 This commit adds a new column order `IEEE754TotalOrder`, which can be used for floating point types (FLOAT, DOUBLE, FLOAT16). The advantage of the new order is a well-defined ordering betwee
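The snippet is cut off before it states the ordering itself. As a rough illustration, assuming the new column order follows the IEEE 754-2019 totalOrder predicate (negative NaNs first, then negative infinity, negative numbers, -0, +0, positive numbers, positive infinity, and positive NaNs last), a DOUBLE comparator can work directly on the raw bit patterns. `TotalOrderSketch` is a hypothetical name, not a class in parquet-format or parquet-mr.

```java
import java.util.Arrays;

/** Hypothetical sketch of IEEE 754 totalOrder for doubles; not a Parquet API. */
public final class TotalOrderSketch {
    private TotalOrderSketch() {}

    /**
     * Maps a double to a long whose signed order matches totalOrder.
     * Positive values (sign bit clear) keep their bits unchanged. Negative
     * values keep the sign bit but have all other bits flipped, so a larger
     * magnitude yields a smaller key.
     */
    static long totalOrderKey(double d) {
        // Raw bits keep NaN sign and payload (doubleToLongBits would canonicalize NaN).
        long bits = Double.doubleToRawLongBits(d);
        // (bits >> 63) is all ones for negatives and zero otherwise;
        // ">>> 1" clears the top bit so the sign bit itself is not flipped.
        return bits ^ ((bits >> 63) >>> 1);
    }

    /** Negative, zero, or positive as a precedes, equals, or follows b in totalOrder. */
    static int compareTotalOrder(double a, double b) {
        return Long.compare(totalOrderKey(a), totalOrderKey(b));
    }

    public static void main(String[] args) {
        double negativeNaN = Double.longBitsToDouble(0xFFF8000000000000L);
        Double[] values = {
            Double.NaN, 1.5, -0.0, Double.NEGATIVE_INFINITY, 0.0,
            negativeNaN, Double.POSITIVE_INFINITY, -2.0
        };
        Arrays.sort(values, TotalOrderSketch::compareTotalOrder);
        System.out.println(Arrays.toString(values));
    }
}
```

FLOAT would follow the same pattern with `Float.floatToRawIntBits` and an `int` key; FLOAT16 has no Java primitive, so its 16-bit pattern would need the same bit trick applied at that width. The raw-bits variant matters because `Double.compare` treats every NaN as one value greater than positive infinity, which loses the sign and payload distinctions that totalOrder makes.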

[jira] [Commented] (PARQUET-2261) [Format] Add statistics that reflect decoded size to metadata

2023-11-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788808#comment-17788808 ] ASF GitHub Bot commented on PARQUET-2261: - wgtmac closed pull request #1201: [N

Re: [PR] [NOT FOR CHECKIN][DEBUG] PARQUET-2261: Implement SizeStatistics [parquet-mr]

2023-11-22 Thread via GitHub
wgtmac closed pull request #1201: [NOT FOR CHECKIN][DEBUG] PARQUET-2261: Implement SizeStatistics URL: https://github.com/apache/parquet-mr/pull/1201

[jira] [Commented] (PARQUET-2261) [Format] Add statistics that reflect decoded size to metadata

2023-11-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788664#comment-17788664 ] ASF GitHub Bot commented on PARQUET-2261: - wgtmac opened a new pull request, #1

[PR] PARQUET-2261: Implement SizeStatistics [parquet-mr]

2023-11-22 Thread via GitHub
wgtmac opened a new pull request, #1201: URL: https://github.com/apache/parquet-mr/pull/1201
Make sure you have checked _all_ steps below.
### Jira
- [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in