[jira] [Commented] (PARQUET-2103) crypto exception in print toPrettyJSON

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694355#comment-17694355 ] ASF GitHub Bot commented on PARQUET-2103: - wgtmac commented on PR #1019: URL: h

[GitHub] [parquet-mr] wgtmac commented on pull request #1019: PARQUET-2103: Fix crypto exception in print toPrettyJSON

2023-02-27 Thread via GitHub
wgtmac commented on PR #1019: URL: https://github.com/apache/parquet-mr/pull/1019#issuecomment-1447683473 > Thanks @shangxinli . @wgtmac , are you ok with the current state of the PR? LGTM. Feel free to merge. Thanks! -- This is an automated message from the Apache Git Service. To r

[jira] [Commented] (PARQUET-2103) crypto exception in print toPrettyJSON

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694354#comment-17694354 ] ASF GitHub Bot commented on PARQUET-2103: - ggershinsky commented on PR #1019: U

[GitHub] [parquet-mr] ggershinsky commented on pull request #1019: PARQUET-2103: Fix crypto exception in print toPrettyJSON

2023-02-27 Thread via GitHub
ggershinsky commented on PR #1019: URL: https://github.com/apache/parquet-mr/pull/1019#issuecomment-1447680686 Thanks @shangxinli . @wgtmac , are you ok with the current state of the PR?

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694331#comment-17694331 ] ASF GitHub Bot commented on PARQUET-2159: - gszadovszky commented on code in PR

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-27 Thread via GitHub
gszadovszky commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1119607355 ## README.md: ## @@ -83,6 +83,20 @@ Parquet is a very active project, and new features are being added quickly. Here * Column stats * Delta encoding * Index

[jira] [Commented] (PARQUET-2230) Add a new rewrite command powered by ParquetRewriter

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694329#comment-17694329 ] ASF GitHub Bot commented on PARQUET-2230: - wgtmac commented on PR #1036: URL: h

[GitHub] [parquet-mr] wgtmac commented on pull request #1036: PARQUET-2230: [CLI] Deprecate commands replaced by rewrite

2023-02-27 Thread via GitHub
wgtmac commented on PR #1036: URL: https://github.com/apache/parquet-mr/pull/1036#issuecomment-1447636121 Deprecate some commands. Please take a look, thanks! @gszadovszky @ggershinsky @shangxinli

[jira] [Commented] (PARQUET-2230) Add a new rewrite command powered by ParquetRewriter

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694309#comment-17694309 ] ASF GitHub Bot commented on PARQUET-2230: - wgtmac opened a new pull request, #1

[GitHub] [parquet-mr] wgtmac opened a new pull request, #1036: PARQUET-2230: [CLI] Deprecate commands replaced by rewrite

2023-02-27 Thread via GitHub
wgtmac opened a new pull request, #1036: URL: https://github.com/apache/parquet-mr/pull/1036 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694253#comment-17694253 ] ASF GitHub Bot commented on PARQUET-2159: - jiangjiguang commented on code in PR

[GitHub] [parquet-mr] jiangjiguang commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-27 Thread via GitHub
jiangjiguang commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1119492456 ## pom.xml: ## @@ -659,5 +662,13 @@ + + + plugins Review Comment: @wgtmac @gszadovszky vector-plugins +1, and it can show

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694249#comment-17694249 ] ASF GitHub Bot commented on PARQUET-2159: - wgtmac commented on code in PR #1011

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-27 Thread via GitHub
wgtmac commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1119478321 ## pom.xml: ## @@ -659,5 +662,13 @@ + + + plugins Review Comment: `plugins` is a little bit generic to me. Rename it to `encoding

[jira] [Commented] (PARQUET-2149) Implement async IO for Parquet file reader

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694116#comment-17694116 ] ASF GitHub Bot commented on PARQUET-2149: - parthchandra commented on PR #968: U

[GitHub] [parquet-mr] parthchandra commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2023-02-27 Thread via GitHub
parthchandra commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1446796605 Also try a query like ``` select SUM(length(IFNULL(ss_sold_date_sk, ' '))), SUM(length(IFNULL(ss_sold_time_sk, ' '))), SUM(length(IFNULL(ss_i

[jira] [Commented] (PARQUET-2198) Vulnerabilities in jackson-databind

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694110#comment-17694110 ] ASF GitHub Bot commented on PARQUET-2198: - shangxinli commented on PR #1005: UR

[GitHub] [parquet-mr] shangxinli commented on pull request #1005: PARQUET-2198 : Updating jackson data bind version to fix CVEs

2023-02-27 Thread via GitHub
shangxinli commented on PR #1005: URL: https://github.com/apache/parquet-mr/pull/1005#issuecomment-1446766714 Yeah, we will release soon

[jira] [Commented] (PARQUET-2149) Implement async IO for Parquet file reader

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694065#comment-17694065 ] ASF GitHub Bot commented on PARQUET-2149: - whcdjj commented on PR #968: URL: ht

[GitHub] [parquet-mr] whcdjj commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2023-02-27 Thread via GitHub
whcdjj commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1446487715 > Looks correct to me. Couple of questions, are you running this on a cluster or on a local system? Also, is the data on SSDs? If you are on a single machine, there might not be enough CPU

Gang Wu as new Apache Parquet committer

2023-02-27 Thread Xinli shang
The Project Management Committee (PMC) for Apache Parquet has invited Gang Wu (gangwu) to become a committer and we are pleased to announce that he has accepted. Congratulations and welcome, Gang! -- Xinli Shang

[jira] [Commented] (PARQUET-2198) Vulnerabilities in jackson-databind

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694038#comment-17694038 ] ASF GitHub Bot commented on PARQUET-2198: - satish-mittal commented on PR #1005:

[GitHub] [parquet-mr] satish-mittal commented on pull request #1005: PARQUET-2198 : Updating jackson data bind version to fix CVEs

2023-02-27 Thread via GitHub
satish-mittal commented on PR #1005: URL: https://github.com/apache/parquet-mr/pull/1005#issuecomment-1446359333 @shangxinli can we release a new version that fixes these two vulnerabilities?

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693993#comment-17693993 ] ASF GitHub Bot commented on PARQUET-2159: - jiangjiguang commented on code in PR

[GitHub] [parquet-mr] jiangjiguang commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-27 Thread via GitHub
jiangjiguang commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1118663395 ## parquet-generator/src/main/java/org/apache/parquet/encoding/vectorbitpacking/BitPackingGenerator512Vector.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the A

[jira] [Commented] (PARQUET-2251) Avoid generating Bloomfilter when all pages of a column are encoded by dictionary

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693920#comment-17693920 ] ASF GitHub Bot commented on PARQUET-2251: - yabola commented on PR #1033: URL: h

[GitHub] [parquet-mr] yabola commented on pull request #1033: PARQUET-2251 Avoid generating Bloomfilter when all pages of a column are encoded by dictionary in parquet v1

2023-02-27 Thread via GitHub
yabola commented on PR #1033: URL: https://github.com/apache/parquet-mr/pull/1033#issuecomment-1446022464 @wgtmac @gszadovszky Thank you for your review and help

[jira] [Commented] (PARQUET-2230) Add a new rewrite command powered by ParquetRewriter

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693911#comment-17693911 ] ASF GitHub Bot commented on PARQUET-2230: - wgtmac commented on PR #1034: URL: h

[GitHub] [parquet-mr] wgtmac commented on pull request #1034: PARQUET-2230: Add a new rewrite command powered by ParquetRewriter

2023-02-27 Thread via GitHub
wgtmac commented on PR #1034: URL: https://github.com/apache/parquet-mr/pull/1034#issuecomment-1445977675 > @wgtmac, I think it would be nice to inform the cli users as well about the deprecation. There is no harm keeping these commands only that if the related implementation is deprecated

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693910#comment-17693910 ] ASF GitHub Bot commented on PARQUET-: - wgtmac commented on PR #193: URL: ht

[GitHub] [parquet-format] wgtmac commented on pull request #193: PARQUET-2222: RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread via GitHub
wgtmac commented on PR #193: URL: https://github.com/apache/parquet-format/pull/193#issuecomment-1445975762 As the RLE encoding also applies to boolean values and dictionary-encoded indices in the data page v2, this may not be correct. @pitrou

[jira] [Commented] (PARQUET-2230) Add a new rewrite command powered by ParquetRewriter

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693909#comment-17693909 ] ASF GitHub Bot commented on PARQUET-2230: - gszadovszky commented on PR #1034: U

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693908#comment-17693908 ] ASF GitHub Bot commented on PARQUET-: - pitrou commented on PR #193: URL: ht

[GitHub] [parquet-mr] gszadovszky commented on pull request #1034: PARQUET-2230: Add a new rewrite command powered by ParquetRewriter

2023-02-27 Thread via GitHub
gszadovszky commented on PR #1034: URL: https://github.com/apache/parquet-mr/pull/1034#issuecomment-1445971800 @wgtmac, I think it would be nice to inform the cli users as well about the deprecation. There is no harm keeping these commands only that if the related implementation is deprecat

[GitHub] [parquet-format] pitrou commented on pull request #193: PARQUET-2222: RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread via GitHub
pitrou commented on PR #193: URL: https://github.com/apache/parquet-format/pull/193#issuecomment-1445971725 How about also stating it directly in the grammar: ``` // v1 data pages prepend the encoded length... rle-bit-packed-hybrid-v1: length := length of the <encoded-data> in bytes stored as
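
The framing point raised in this thread (v1 data pages prepend the encoded length) can be illustrated with a minimal sketch. It assumes the layout described in the Parquet format's RLE/bit-packing hybrid section: a 4-byte little-endian length prefix, and an RLE run made of a varint header (`run length << 1`) followed by the repeated value padded to whole bytes. The function names below are hypothetical and come from neither parquet-mr nor the PR under discussion; whether a given v2 stream carries the prefix is exactly the question being debated above, so the sketch takes no position on that.

```python
import struct
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used for run headers (assumed per the format spec)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_rle_run(value: int, run_len: int, bit_width: int) -> bytes:
    """One RLE run: header = varint(run_len << 1), then the value in ceil(bit_width / 8) bytes."""
    header = encode_varint(run_len << 1)
    value_bytes = (bit_width + 7) // 8
    return header + value.to_bytes(value_bytes, "little")


def frame_with_length(encoded: bytes) -> bytes:
    """Hypothetical helper: prepend the 4-byte little-endian length (the v1-style framing)."""
    return struct.pack("<I", len(encoded)) + encoded


def split_length_prefixed(buf: bytes) -> Tuple[int, bytes]:
    """Hypothetical helper: read the length prefix back and return (length, payload)."""
    (length,) = struct.unpack_from("<I", buf, 0)
    return length, buf[4:4 + length]


# Eight repetitions of the value 1 at bit width 1, e.g. a run of definition levels.
run = encode_rle_run(value=1, run_len=8, bit_width=1)
framed = frame_with_length(run)
```

A reader that is handed the byte length out of band (for example from a page header field) would consume `run` directly, while a reader of the length-prefixed form would call `split_length_prefixed(framed)` first.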

[jira] [Commented] (PARQUET-2230) Add a new rewrite command powered by ParquetRewriter

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693906#comment-17693906 ] ASF GitHub Bot commented on PARQUET-2230: - wgtmac commented on PR #1034: URL: h

[GitHub] [parquet-mr] wgtmac commented on pull request #1034: PARQUET-2230: Add a new rewrite command powered by ParquetRewriter

2023-02-27 Thread via GitHub
wgtmac commented on PR #1034: URL: https://github.com/apache/parquet-mr/pull/1034#issuecomment-1445965822 @gszadovszky Unfortunately, those commands and classes have been released in v1.12.0. I have already annotated `deprecated` to those classes. Do you think it makes sense to deprecate th

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693903#comment-17693903 ] ASF GitHub Bot commented on PARQUET-: - wgtmac commented on PR #193: URL: ht

[GitHub] [parquet-format] wgtmac commented on pull request #193: PARQUET-2222: RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread via GitHub
wgtmac commented on PR #193: URL: https://github.com/apache/parquet-format/pull/193#issuecomment-1445956634 @pitrou @gszadovszky @shangxinli

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693901#comment-17693901 ] ASF GitHub Bot commented on PARQUET-: - wgtmac opened a new pull request, #1

[GitHub] [parquet-format] wgtmac opened a new pull request, #193: PARQUET-2222: RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread via GitHub
wgtmac opened a new pull request, #193: URL: https://github.com/apache/parquet-format/pull/193 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them i

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693891#comment-17693891 ] Antoine Pitrou commented on PARQUET-: - Yes, this is why I've filed this und

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693890#comment-17693890 ] Gang Wu commented on PARQUET-: -- The implementations are consistent between parquet

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693881#comment-17693881 ] Gang Wu commented on PARQUET-: -- I think the reason is that *DataPageHeaderV2* has

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread Xuwei Fu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693874#comment-17693874 ] Xuwei Fu commented on PARQUET-: --- ok, I got it. Previously I found `RLE` format re

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693870#comment-17693870 ] Antoine Pitrou commented on PARQUET-: - > I don't understand. Isn't length t