Re: [I] Deprecate LZ4, introduce new LZ4_RAW [parquet-java]

2025-10-08 Thread via GitHub
pitrou commented on issue #2606: URL: https://github.com/apache/parquet-java/issues/2606#issuecomment-3381371654 It seems like this was done in https://issues.apache.org/jira/browse/PARQUET-2196 . Should this be closed? -- This is an automated message from the Apache Git Service. To respo

Re: [I] Support encrypted files for show bloom filter command [parquet-java]

2025-10-07 Thread via GitHub
gszadovszky closed issue #3338: Support encrypted files for show bloom filter command URL: https://github.com/apache/parquet-java/issues/3338

Re: [PR] spec: remove the longitude wraparound of the geometry type bbox [parquet-format]

2025-10-07 Thread via GitHub
wgtmac commented on PR #526: URL: https://github.com/apache/parquet-format/pull/526#issuecomment-3377463488 It seems that the two proposals from https://lists.apache.org/thread/x9ll3rhg26mngm10cjn74w66ov23grmm may add additional attributes to the geospatial logical types. I'm not sure it ma

Re: [PR] spec: remove the longitude wraparound of the geometry type bbox [parquet-format]

2025-10-07 Thread via GitHub
paleolimbot commented on PR #526: URL: https://github.com/apache/parquet-format/pull/526#issuecomment-3377051058 > However my concern is that Iceberg may come up with a different solution later which diverges from the Parquet spec. I think it's fine to come up with another solution...

Re: [PR] GH-2815: Allow bytestreamsplit available via Hadoop Configuration [parquet-java]

2025-10-06 Thread via GitHub
ArnavBalyan commented on PR #3340: URL: https://github.com/apache/parquet-java/pull/3340#issuecomment-3370238599 Updated title thanks

Re: [PR] spec: remove the longitude wraparound of the geometry type bbox [parquet-format]

2025-10-06 Thread via GitHub
jiayuasu commented on PR #526: URL: https://github.com/apache/parquet-format/pull/526#issuecomment-3370237894 @wgtmac please review when you have time. Thank you!

Re: [PR] GH-2607: Improve "null decryptor" exception in column chunk metadata [parquet-java]

2025-10-02 Thread via GitHub
ArnavBalyan commented on PR #3341: URL: https://github.com/apache/parquet-java/pull/3341#issuecomment-3363848523 cc @gszadovszky could you please take a look thanks!

Re: [PR] GH-2815: Allow bytestreamsplit to be configurable through parquet properties [parquet-java]

2025-10-02 Thread via GitHub
ArnavBalyan commented on PR #3340: URL: https://github.com/apache/parquet-java/pull/3340#issuecomment-3363848040 cc @gszadovszky thanks! :)

Re: [PR] GH-3320: Ensure parquet reader does not fail due to incorrect statistics [parquet-java]

2025-10-02 Thread via GitHub
wgtmac merged PR #3325: URL: https://github.com/apache/parquet-java/pull/3325

Re: [PR] GH-3338: Support encrypted files for Parquet CLI commands [parquet-java]

2025-10-02 Thread via GitHub
ArnavBalyan commented on code in PR #3339: URL: https://github.com/apache/parquet-java/pull/3339#discussion_r2388211157 ## parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowBloomFilterCommand.java: ## @@ -63,14 +67,31 @@ public ShowBloomFilterCommand(Logger console)

[PR] GH-3342: Throw typed exception for Parquet footer error [parquet-java]

2025-10-02 Thread via GitHub
qiyuandong-db opened a new pull request, #3343: URL: https://github.com/apache/parquet-java/pull/3343 ### Rationale for this change Currently, `ParquetFileReader` throws `RuntimeException`s when footer parsing fails. This can be improved by throwing a `ParquetDecodingEx

[I] Throw typed exception for Parquet footer error [parquet-java]

2025-10-02 Thread via GitHub
qiyuandong-db opened a new issue, #3342: URL: https://github.com/apache/parquet-java/issues/3342 ### Describe the enhancement requested Currently, `ParquetFileReader` throws `RuntimeExceptions` when footer parsing fails. This can be improved by throwing a `ParquetDecodingExcept
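A minimal sketch of the enhancement described above: a footer-validation step that raises a dedicated exception type instead of a bare `RuntimeException`. It assumes the standard Parquet file layout (a 4-byte little-endian footer length followed by the `PAR1` magic at the end of the file). `FooterDecodingException` and `readFooterLength` are hypothetical names; this is not the `ParquetDecodingException` or `ParquetFileReader` code in parquet-java.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FooterSketch {

    // Hypothetical typed exception: callers can catch it specifically
    // instead of catching every RuntimeException.
    static class FooterDecodingException extends RuntimeException {
        FooterDecodingException(String message) {
            super(message);
        }
    }

    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    // Reads the footer length from the tail of a Parquet file:
    // [ ... footer ... ][4-byte little-endian footer length]["PAR1"]
    static int readFooterLength(byte[] file) {
        if (file.length < 12) { // leading magic + length + trailing magic
            throw new FooterDecodingException("File too short to be Parquet: " + file.length + " bytes");
        }
        byte[] tail = Arrays.copyOfRange(file, file.length - 4, file.length);
        if (!Arrays.equals(tail, MAGIC)) {
            throw new FooterDecodingException("Missing trailing magic, found " + Arrays.toString(tail));
        }
        int footerLength = ByteBuffer.wrap(file, file.length - 8, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        if (footerLength < 0 || footerLength > file.length - 12) {
            throw new FooterDecodingException("Invalid footer length " + footerLength
                    + " for a file of " + file.length + " bytes");
        }
        return footerLength;
    }
}
```

Callers can then `catch (FooterDecodingException e)` to report a corrupt file without also swallowing unrelated runtime failures.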

Re: [I] Add IP address logical type [parquet-java]

2025-09-30 Thread via GitHub
daniel-awake commented on issue #2203: URL: https://github.com/apache/parquet-java/issues/2203#issuecomment-3353979176 In our project, we are rethinking our own storage of IP addresses in Parquet, looking for a representation that might be more upstreamable. Our current implementation

[I] Parquet-cli unable to read variant shredding tests 86 and 126? [parquet-testing]

2025-09-30 Thread via GitHub
scovich opened a new issue, #97: URL: https://github.com/apache/parquet-testing/issues/97 While adding support for variant array unshredding to arrow-rs, I discovered that parquet-cli is unable to correctly read the parquet files for cases 86 and 126, both due to the same index out of bound

Re: [PR] GH-3316: Fix representation type for VariantBuilder decimal [parquet-java]

2025-09-29 Thread via GitHub
gszadovszky merged PR #3335: URL: https://github.com/apache/parquet-java/pull/3335

Re: [PR] Add backward compat nested file [parquet-testing]

2025-09-29 Thread via GitHub
mapleFU commented on PR #96: URL: https://github.com/apache/parquet-testing/pull/96#issuecomment-3349698094 @wgtmac do we have existing test file for legacy inferring?

Re: [PR] GH-3320: Ensure parquet reader does not fail due to incorrect statistics [parquet-java]

2025-09-29 Thread via GitHub
ArnavBalyan commented on PR #3325: URL: https://github.com/apache/parquet-java/pull/3325#issuecomment-3346392124 cc @wgtmac gentle reminder thanks

Re: [PR] GH-2836: Support reading pure parquet files with cat [parquet-java]

2025-09-28 Thread via GitHub
gszadovszky merged PR #3332: URL: https://github.com/apache/parquet-java/pull/3332

[I] Support encrypted files for show bloom filter command [parquet-java]

2025-09-28 Thread via GitHub
ArnavBalyan opened a new issue, #3338: URL: https://github.com/apache/parquet-java/issues/3338 ### Describe the enhancement requested - Currently show bloom filter does not support encrypted parquet files. - Support reading and showing bloom filter metadata for encrypted files

Re: [PR] add reproduction from https://github.com/apache/arrow-rs/issues/1915 [parquet-testing]

2025-09-28 Thread via GitHub
rluvaton closed pull request #95: add reproduction from https://github.com/apache/arrow-rs/issues/1915 URL: https://github.com/apache/parquet-testing/pull/95

[PR] add reproduction from https://github.com/apache/arrow-rs/issues/1915 [parquet-testing]

2025-09-28 Thread via GitHub
rluvaton opened a new pull request, #95: URL: https://github.com/apache/parquet-testing/pull/95 (no comment)

[PR] Bump com.diffplug.spotless:spotless-maven-plugin from 2.30.0 to 3.0.0 [parquet-java]

2025-09-27 Thread via GitHub
dependabot[bot] opened a new pull request, #3337: URL: https://github.com/apache/parquet-java/pull/3337 Bumps [com.diffplug.spotless:spotless-maven-plugin](https://github.com/diffplug/spotless) from 2.30.0 to 3.0.0. Release notes Sourced from https://github.com/diffplug/spotless/r

Re: [PR] Bump com.diffplug.spotless:spotless-maven-plugin from 2.30.0 to 2.46.1 [parquet-java]

2025-09-27 Thread via GitHub
dependabot[bot] closed pull request #3257: Bump com.diffplug.spotless:spotless-maven-plugin from 2.30.0 to 2.46.1 URL: https://github.com/apache/parquet-java/pull/3257

Re: [PR] Bump com.diffplug.spotless:spotless-maven-plugin from 2.30.0 to 2.46.1 [parquet-java]

2025-09-27 Thread via GitHub
dependabot[bot] commented on PR #3257: URL: https://github.com/apache/parquet-java/pull/3257#issuecomment-3342186916 Superseded by #3337.

Re: [I] ShowPagesCommand shows wrong information for compressedSize [parquet-java]

2025-09-27 Thread via GitHub
gszadovszky closed issue #3327: ShowPagesCommand shows wrong information for compressedSize URL: https://github.com/apache/parquet-java/issues/3327

Re: [PR] GH-3317: Fix bytes written by VariantBuilder.appendFloat [parquet-java]

2025-09-26 Thread via GitHub
ArnavBalyan commented on PR #3334: URL: https://github.com/apache/parquet-java/pull/3334#issuecomment-3339777193 cc @gszadovszky could you pls take a look

Re: [PR] GH-3316: Fix representation type for VariantBuilder decimal [parquet-java]

2025-09-26 Thread via GitHub
ArnavBalyan commented on PR #3335: URL: https://github.com/apache/parquet-java/pull/3335#issuecomment-3339777631 cc @gszadovszky could you pls take a look :)

[I] Hadoop parquet reader (and parquet-CLI) fails on some files from parquet-testing [parquet-java]

2025-09-26 Thread via GitHub
ccleva opened a new issue, #3336: URL: https://github.com/apache/parquet-java/issues/3336 ### Describe the bug, including details regarding any error messages, version, and platform. Tested using v1.16.0 on openJDK 11 and 17. 1. [nation.dict-malformed.parquet](https://github.c

[PR] GH-3317: Fix bytes written by VariantBuilder.appendFloat [parquet-java]

2025-09-26 Thread via GitHub
ArnavBalyan opened a new pull request, #3334: URL: https://github.com/apache/parquet-java/pull/3334 ### Rationale for this change - Fixed the bug by correcting the bytes written value to 4 in VariantBuilder.appendFloat() - Added test to ensure we only write 5 bytes into the buffer (1
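For reference, a hedged sketch of the 5-byte layout the fix targets: one header byte followed by four bytes of IEEE 754 float data. It assumes the Variant encoding spec's primitive header (`type_id << 2` with basic type `0` for primitives) and a float type ID of 14; `encodeFloat` is a hypothetical helper, not `VariantBuilder.appendFloat` itself.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class VariantFloatSketch {
    // Assumed from the Variant encoding spec: basic type 0 = primitive,
    // primitive type ID 14 = float.
    static final int BASIC_TYPE_PRIMITIVE = 0;
    static final int PRIMITIVE_FLOAT = 14;

    // Encodes a float as a Variant primitive value:
    // 1 header byte + Float.BYTES (4) little-endian value bytes = 5 bytes total.
    static byte[] encodeFloat(float f) {
        ByteBuffer buf = ByteBuffer.allocate(1 + Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) ((PRIMITIVE_FLOAT << 2) | BASIC_TYPE_PRIMITIVE));
        buf.putFloat(f);
        return buf.array();
    }
}
```

A writer that advanced its position by 8 bytes (the size of a double) after writing this value would leave 4 stray bytes in the buffer, which is the kind of mismatch a "bytes written" value of 4 avoids.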

[PR] GH-3316: Fix representation type for VariantBuilder decimal [parquet-java]

2025-09-26 Thread via GitHub
ArnavBalyan opened a new pull request, #3335: URL: https://github.com/apache/parquet-java/pull/3335 ### Rationale for this change - Since VariantBuilder decimal looks at scale for deciding the representation, it can cause low precision numbers to be stored in wide format. - Removed t
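A sketch of width selection driven by precision, assuming the Variant decimal widths hold up to 9, 18, and 38 digits respectively. Because a decimal(p, s) type requires s ≤ p, this version also keeps the scale within the chosen width's precision. `chooseWidth` and `DecimalWidth` are hypothetical, and this is not necessarily how the PR implements the change.

```java
import java.math.BigDecimal;

public class VariantDecimalSketch {
    enum DecimalWidth { DECIMAL4, DECIMAL8, DECIMAL16 }

    // Maximum digits each width can hold (assumed from the Variant spec).
    static final int MAX_DECIMAL4_PRECISION = 9;
    static final int MAX_DECIMAL8_PRECISION = 18;
    static final int MAX_DECIMAL16_PRECISION = 38;

    // Picks the narrowest width that fits the value's precision (digits in the
    // unscaled value) while keeping scale <= precision for the chosen width.
    // Values with a negative scale (e.g. 1E+3) should be rescaled by the caller.
    static DecimalWidth chooseWidth(BigDecimal d) {
        if (d.scale() < 0) {
            throw new IllegalArgumentException("Negative scale not supported: " + d.scale());
        }
        int digits = Math.max(d.precision(), d.scale());
        if (digits <= MAX_DECIMAL4_PRECISION) return DecimalWidth.DECIMAL4;
        if (digits <= MAX_DECIMAL8_PRECISION) return DecimalWidth.DECIMAL8;
        if (digits <= MAX_DECIMAL16_PRECISION) return DecimalWidth.DECIMAL16;
        throw new IllegalArgumentException("Needs " + digits + " digits, more than 38");
    }
}
```

With this rule `1.23` (precision 3) lands in the 4-byte width, while `0.000000000001` (precision 1, scale 12) still needs the 8-byte width.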

Re: [PR] GH-2836: Support reading pure parquet files with cat [parquet-java]

2025-09-25 Thread via GitHub
ArnavBalyan commented on PR #3332: URL: https://github.com/apache/parquet-java/pull/3332#issuecomment-925099 cc @gszadovszky :) thanks!

[PR] GH-3315: Variant binary read does not take length into account [parquet-java]

2025-09-25 Thread via GitHub
jerolba opened a new pull request, #: URL: https://github.com/apache/parquet-java/pull/ ### Rationale for this change Extracting Binary value from Variant, creating the ByteBuffer doesn't consider the length of the Binary element, creating a ByteBuffer with the size of the re
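The failure mode described (a buffer sized to the rest of the underlying data rather than to the Binary's own length) comes down to slicing. A minimal sketch using only `java.nio.ByteBuffer`; `sliceValue` is a hypothetical helper, not the Variant API in parquet-java.

```java
import java.nio.ByteBuffer;

public class BinarySliceSketch {
    // Returns a view of exactly `length` bytes starting at absolute index `offset`
    // of `source`, rather than everything from `offset` to the end of `source`.
    // The source buffer's own position and limit are left unchanged.
    static ByteBuffer sliceValue(ByteBuffer source, int offset, int length) {
        if (offset < 0 || length < 0 || offset > source.limit() || length > source.limit() - offset) {
            throw new IndexOutOfBoundsException(
                    "offset=" + offset + ", length=" + length + ", limit=" + source.limit());
        }
        ByteBuffer view = source.duplicate(); // independent position/limit, shared content
        view.limit(offset + length);
        view.position(offset);
        return view.slice();                  // remaining() == length
    }
}
```

Working on a `duplicate()` keeps the caller's buffer untouched, so several values can be sliced from the same backing data.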

Re: [PR] Add a proposal process [parquet-format]

2025-09-25 Thread via GitHub
julienledem merged PR #513: URL: https://github.com/apache/parquet-format/pull/513

Re: [PR] GH-3327: Bug fix incorrect compressed size reported by DataPageV1 [parquet-java]

2025-09-25 Thread via GitHub
ArnavBalyan commented on PR #3326: URL: https://github.com/apache/parquet-java/pull/3326#issuecomment-3323608102 cc @gszadovszky could you PTAL :)

[PR] GH-2836: Support parquet only files in cat as fallback [parquet-java]

2025-09-25 Thread via GitHub
ArnavBalyan opened a new pull request, #3332: URL: https://github.com/apache/parquet-java/pull/3332 ### Rationale for this change - Parquet cat fails for files with hyphens since cat uses avro reader by default which has stricter rules. - Ensure we can still read pure parquet files,
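Presumably the hyphens are in field names: the Avro specification only allows names that start with `[A-Za-z_]` and continue with `[A-Za-z0-9_]`, so a column such as `user-id` cannot become an Avro field, while a plain Parquet reader has no such restriction. A small sketch of that rule; `isValidAvroName` is a hypothetical helper, not Avro's own validation code.

```java
import java.util.regex.Pattern;

public class AvroNameSketch {
    // Name rule from the Avro specification:
    // first character [A-Za-z_], remaining characters [A-Za-z0-9_].
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidAvroName(String name) {
        return name != null && AVRO_NAME.matcher(name).matches();
    }
}
```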

Re: [I] Problem with a cat [parquet-java]

2025-09-25 Thread via GitHub
ArnavBalyan commented on issue #2836: URL: https://github.com/apache/parquet-java/issues/2836#issuecomment-970073 Thanks for reporting, support will be added

Re: [PR] GH-3320: Ensure parquet reader does not fail due to incorrect statistics [parquet-java]

2025-09-25 Thread via GitHub
ArnavBalyan commented on code in PR #3325: URL: https://github.com/apache/parquet-java/pull/3325#discussion_r2378785189 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestColumnIndexFiltering.java: ## @@ -650,4 +660,75 @@ public void testFilteringWithProjection() thro

Re: [PR] GH-3327: Bug fix incorrect compressed size reported by DataPageV1 [parquet-java]

2025-09-25 Thread via GitHub
gszadovszky merged PR #3326: URL: https://github.com/apache/parquet-java/pull/3326

Re: [PR] PARQUET-2012 Mark ProtoParquetWriter constructors deprecated [parquet-java]

2025-09-24 Thread via GitHub
dossett commented on code in PR #886: URL: https://github.com/apache/parquet-java/pull/886#discussion_r2375846658 ## parquet-protobuf/src/test/java/org/apache/parquet/proto/TestUtils.java: ## @@ -198,14 +197,11 @@ private static void checkSameBuilderInstance(MessageOrBuilder[]

Re: [I] Track page read metrics on parquet reader [parquet-java]

2025-09-24 Thread via GitHub
wgtmac closed issue #3331: Track page read metrics on parquet reader URL: https://github.com/apache/parquet-java/issues/3331

Re: [PR] GH-3331: Track Column index page skip statistics during file read [parquet-java]

2025-09-24 Thread via GitHub
wgtmac commented on PR #3330: URL: https://github.com/apache/parquet-java/pull/3330#issuecomment-3332342597 Thanks @ArnavBalyan!

[I] Track page read metrics on parquet reader [parquet-java]

2025-09-24 Thread via GitHub
ArnavBalyan opened a new issue, #3331: URL: https://github.com/apache/parquet-java/issues/3331 ### Describe the enhancement requested Track pages read/pages skipped on parquet reader due to column index ### Component(s) _No response_

Re: [PR] Fix typos in VariantShredding [parquet-format]

2025-09-24 Thread via GitHub
emkornfield commented on code in PR #523: URL: https://github.com/apache/parquet-format/pull/523#discussion_r2377686001 ## VariantShredding.md: ## @@ -65,12 +65,12 @@ The series of measurements `34, null, "n/a", 100` would be stored as: Both `value` and `typed_value` are optio

Re: [PR] Fix typos in VariantShredding [parquet-format]

2025-09-24 Thread via GitHub
emkornfield commented on code in PR #523: URL: https://github.com/apache/parquet-format/pull/523#discussion_r2377681966 ## VariantShredding.md: ## @@ -38,7 +38,7 @@ All field names of a Variant, whether shredded or not, must be present in the me ## Value Shredding -Variant

Re: [PR] MINOR: Track Column index page skip statistics during file read [parquet-java]

2025-09-24 Thread via GitHub
wgtmac commented on PR #3330: URL: https://github.com/apache/parquet-java/pull/3330#issuecomment-3331372005 I think this is worth a github issue :)

Re: [PR] GH-3320: Ensure parquet reader does not fail due to incorrect statistics [parquet-java]

2025-09-24 Thread via GitHub
ArnavBalyan commented on code in PR #3325: URL: https://github.com/apache/parquet-java/pull/3325#discussion_r2376155381 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestColumnIndexFiltering.java: ## @@ -650,4 +660,75 @@ public void testFilteringWithProjection() thro

Re: [PR] GH-3320: Ensure parquet reader does not fail due to incorrect statistics [parquet-java]

2025-09-24 Thread via GitHub
ArnavBalyan commented on code in PR #3325: URL: https://github.com/apache/parquet-java/pull/3325#discussion_r2376152332 ## parquet-column/src/main/java/org/apache/parquet/internal/filter2/columnindex/ColumnIndexFilter.java: ## @@ -220,4 +225,35 @@ public RowRanges visit(Not not)

Re: [PR] GH-3320: Ensure parquet reader does not fail due to incorrect statistics [parquet-java]

2025-09-24 Thread via GitHub
ArnavBalyan commented on code in PR #3325: URL: https://github.com/apache/parquet-java/pull/3325#discussion_r2376149247 ## parquet-column/src/main/java/org/apache/parquet/internal/filter2/columnindex/ColumnIndexFilter.java: ## @@ -220,4 +225,35 @@ public RowRanges visit(Not not)

[PR] MINOR: parquet-avro tests should not debug to stderr [parquet-java]

2025-09-24 Thread via GitHub
dossett opened a new pull request, #3329: URL: https://github.com/apache/parquet-java/pull/3329 ### Rationale for this change Logging debug information to stderr creates noise, especially when trying to run tests with Maven's quiet mode (`-q`). stderr is sometimes appealing becaus

[I] ParquetWriter is incompatible with JDK25 [parquet-java]

2025-09-24 Thread via GitHub
hugoncosta opened a new issue, #3328: URL: https://github.com/apache/parquet-java/issues/3328 ### Describe the bug, including details regarding any error messages, version, and platform. Hello, I'm trying to upgrade my code base to JDK25 and I'm having this issue on my tests `

Re: [PR] Add a proposal process [parquet-format]

2025-09-24 Thread via GitHub
alamb commented on code in PR #513: URL: https://github.com/apache/parquet-format/pull/513#discussion_r2375230976 ## proposals/1_BASE64_ENCODING.md: ## @@ -0,0 +1,42 @@ + +--- +Author: Julien Le Dem +Created: 2025-Aug-7 +Name: add BASE64 compression +Issue: https://github.com/ap

Re: [PR] GH-3320: Ensure parquet reader does not fail due to incorrect statistics [parquet-java]

2025-09-24 Thread via GitHub
wgtmac commented on code in PR #3325: URL: https://github.com/apache/parquet-java/pull/3325#discussion_r2375001325 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestColumnIndexFiltering.java: ## @@ -650,4 +660,75 @@ public void testFilteringWithProjection() throws I

Re: [PR] GH-2891: Include actual values in validation error messages and improve logging [parquet-java]

2025-09-23 Thread via GitHub
wgtmac merged PR #3319: URL: https://github.com/apache/parquet-java/pull/3319

Re: [PR] GH-3312: Support uuid read converter for parquet thrift [parquet-java]

2025-09-23 Thread via GitHub
wgtmac merged PR #3313: URL: https://github.com/apache/parquet-java/pull/3313

Re: [PR] GH-3320: Ensure parquet reader does not fail due to incorrect statistics [parquet-java]

2025-09-23 Thread via GitHub
gszadovszky commented on code in PR #3325: URL: https://github.com/apache/parquet-java/pull/3325#discussion_r2374493098 ## parquet-column/src/main/java/org/apache/parquet/internal/filter2/columnindex/ColumnIndexFilter.java: ## @@ -220,4 +225,35 @@ public RowRanges visit(Not not)

Re: [PR] GH-3327: Bug fix incorrect compressed size reported by DataPageV1 [parquet-java]

2025-09-23 Thread via GitHub
gszadovszky commented on PR #3326: URL: https://github.com/apache/parquet-java/pull/3326#issuecomment-3326642645 @ArnavBalyan, I have a feeling that this change breaks the contract of `Page`. In `ColumnChunkPageReadStore` we actually decompress (and decrypt) the data, so the fact that the c

Re: [PR] GH-3312: Support uuid read converter for parquet thrift [parquet-java]

2025-09-23 Thread via GitHub
ArnavBalyan commented on PR #3313: URL: https://github.com/apache/parquet-java/pull/3313#issuecomment-3325047733 cc @wgtmac can this be merged if all looks good thanks!

Re: [PR] GH-2891: Include actual values in validation error messages and improve logging [parquet-java]

2025-09-23 Thread via GitHub
ArnavBalyan commented on PR #3319: URL: https://github.com/apache/parquet-java/pull/3319#issuecomment-3325048795 cc @wgtmac could this be merged if all looks good thanks!

[PR] Bug fix incorrect compressed size reported by datapagev1 reader [parquet-java]

2025-09-23 Thread via GitHub
ArnavBalyan opened a new pull request, #3326: URL: https://github.com/apache/parquet-java/pull/3326 ### Rationale for this change - Currently `ColumnChunkPageReadStore` sets decompressed bytes when creating datapagev1. - However DataPageV1 assumes the compressed bytes to be extracted
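A hedged sketch of the distinction at stake: once a page's payload has been decompressed, its byte length is the uncompressed size, so the compressed size has to be carried separately rather than derived from the bytes. `Page`, `compress`, and `readPage` are hypothetical, and `java.util.zip` Deflate stands in for the real Parquet codecs; this is not the `ColumnChunkPageReadStore` or `DataPageV1` code.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class PageSizeSketch {
    // Hypothetical page holder: the compressed (on-disk) size is stored
    // separately so it survives decompression of the payload.
    static final class Page {
        final byte[] data;          // decompressed bytes
        final int compressedSize;   // size of the bytes read from the file

        Page(byte[] data, int compressedSize) {
            this.data = data;
            this.compressedSize = compressedSize;
        }

        int uncompressedSize() {
            return data.length;
        }
    }

    // Deflate stands in for whatever codec the column chunk actually uses.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses a page but records the size of the bytes it was read from;
    // reporting data.length as the compressed size would be the bug.
    static Page readPage(byte[] compressed, int uncompressedSize) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] data = new byte[uncompressedSize];
        int n = inflater.inflate(data);
        inflater.end();
        if (n != uncompressedSize) {
            throw new DataFormatException("Expected " + uncompressedSize + " bytes, got " + n);
        }
        return new Page(data, compressed.length);
    }
}
```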

[PR] [GH-3320] Ensure parquet reader does not fail due to incorrect statistics [parquet-java]

2025-09-23 Thread via GitHub
ArnavBalyan opened a new pull request, #3325: URL: https://github.com/apache/parquet-java/pull/3325 ### Rationale for this change - Parquet reader fails if upstream writes wrong statistics in column index vs offset index. - Ensure parquet java can still read the file and ignore corr
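One conservative way to tolerate the inconsistency described above, sketched under the assumption that a mismatch between the column index's and the offset index's page counts is the signal: stop trusting the column index for that column and read every page. `pagesToRead` is hypothetical and far simpler than parquet-java's `ColumnIndexFilter`/`RowRanges` machinery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class IndexFallbackSketch {
    // Returns the page numbers a reader would need to read. If the column index
    // and offset index disagree on the page count, the column index statistics
    // are treated as untrustworthy and every page is kept (no filtering).
    static List<Integer> pagesToRead(int columnIndexPageCount, int offsetIndexPageCount,
                                     IntPredicate pageMightMatch) {
        boolean consistent = columnIndexPageCount == offsetIndexPageCount;
        List<Integer> pages = new ArrayList<>();
        for (int page = 0; page < offsetIndexPageCount; page++) {
            if (!consistent || pageMightMatch.test(page)) {
                pages.add(page);
            }
        }
        return pages;
    }
}
```

Falling back to a full read costs performance but never drops rows, which is the safer failure mode when statistics cannot be trusted.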

Re: [PR] [parquet-thrift] start removing things marked for deprecation [parquet-java]

2025-09-23 Thread via GitHub
gszadovszky commented on PR #3318: URL: https://github.com/apache/parquet-java/pull/3318#issuecomment-3322812366 I'm not sure about this error either. But right, when we actually switch master for 2.0 development, we should disable japicmp until the first release. Or maybe it is intelligent

Re: [PR] MINOR: Remove unused parquet-thrift dependencies [parquet-java]

2025-09-22 Thread via GitHub
dossett commented on PR #3323: URL: https://github.com/apache/parquet-java/pull/3323#issuecomment-3321324842 Missed one test/provided scope. fixed.

Re: [PR] MINOR: Remove unused parquet-thrift dependencies [parquet-java]

2025-09-22 Thread via GitHub
dossett commented on PR #3323: URL: https://github.com/apache/parquet-java/pull/3323#issuecomment-3320528348 Learned that the project requires(?) some dependency declarations above and beyond what's required to build. Also sorted out other unused dependencies. 🤞 tests will pass

Re: [PR] GH-3321 Exclude package-info.class from shaded fastutil dependency [parquet-java]

2025-09-22 Thread via GitHub
jerolba commented on PR #3322: URL: https://github.com/apache/parquet-java/pull/3322#issuecomment-3320020272 I have found it just from fastutil dependency. I have no strong opinion on whether to explicitly manage this from fastutil or to generalize the rule for future changes.

Re: [PR] GH-3321 Exclude package-info.class from shaded fastutil dependency [parquet-java]

2025-09-22 Thread via GitHub
Fokko commented on PR #3322: URL: https://github.com/apache/parquet-java/pull/3322#issuecomment-3319985983 I agree that it doesn't make sense to shade `package-info.class`, maybe we should completely remove it: ```xml *:* **/package-info.class

[PR] MINOR: Remove unused parquet-thrift dependencies [parquet-java]

2025-09-22 Thread via GitHub
dossett opened a new pull request, #3323: URL: https://github.com/apache/parquet-java/pull/3323 ### Rationale for this change There are some unnecessary dependencies in parquet-thrift. Comments for a couple of them suggest that they're still needed for pig integration, but my lo

Re: [PR] [parquet-thrift] start removing things marked for deprecation [parquet-java]

2025-09-22 Thread via GitHub
dossett commented on PR #3318: URL: https://github.com/apache/parquet-java/pull/3318#issuecomment-3319860244 I don't understand this build error: ``` Error: Failed to execute goal com.github.siom79.japicmp:japicmp-maven-plugin:0.23.1:cmp (default) on project parquet-thrift: Execu

Re: [I] Parquet read fails in certain scenarios for files written by parquet-go [parquet-java]

2025-09-22 Thread via GitHub
ArnavBalyan commented on issue #3320: URL: https://github.com/apache/parquet-java/issues/3320#issuecomment-3318419912 Yes agreed, let me see if there is a clean way to do this

Re: [I] Parquet read fails in certain scenarios for files written by parquet-go [parquet-java]

2025-09-21 Thread via GitHub
ArnavBalyan commented on issue #3320: URL: https://github.com/apache/parquet-java/issues/3320#issuecomment-3316759277 Looks like it's one of the several open source implementations of parquet-go. The upstream is buggy, will check. Thanks for taking a look @wgtmac @zeroshade. I think it may b

Re: [I] Parquet read fails in certain scenarios for files written by parquet-go [parquet-java]

2025-09-21 Thread via GitHub
zeroshade commented on issue #3320: URL: https://github.com/apache/parquet-java/issues/3320#issuecomment-3316066920 This is not a known issue that I'm aware of. Importantly, is this github.com/parquet-go/parquet-go? Or is this the Go library for parquet in the arrow-go repo? If it's

Re: [I] Parquet read fails in certain scenarios for files written by parquet-go [parquet-java]

2025-09-21 Thread via GitHub
wgtmac commented on issue #3320: URL: https://github.com/apache/parquet-java/issues/3320#issuecomment-3316041432 Is this a known issue of Parquet-go? cc @zeroshade

Re: [PR] GH-2972: Fix incomplete avro metadata on INT96 schema converter [parquet-java]

2025-09-21 Thread via GitHub
wgtmac commented on PR #3311: URL: https://github.com/apache/parquet-java/pull/3311#issuecomment-3316007024 Merged it. Thanks @ArnavBalyan for working on this and @gszadovszky for the review!

Re: [PR] GH-2972: Fix incomplete avro metadata on INT96 schema converter [parquet-java]

2025-09-21 Thread via GitHub
wgtmac merged PR #3311: URL: https://github.com/apache/parquet-java/pull/3311

Re: [I] AvroSchemaConverter toString() conversion when schema has multiple INT96 fields [parquet-java]

2025-09-21 Thread via GitHub
wgtmac closed issue #2972: AvroSchemaConverter toString() conversion when schema has multiple INT96 fields URL: https://github.com/apache/parquet-java/issues/2972

[PR] GH-3321 Exclude package-info.class from shaded fastutil dependency [parquet-java]

2025-09-21 Thread via GitHub
jerolba opened a new pull request, #3322: URL: https://github.com/apache/parquet-java/pull/3322 ### Rationale for this change I've noticed that some JARs include `package-info.class` files from the shaded `it.unimi.dsi:fastutil` dependency. These files are located in packages t

[I] Exclude package-info.class from shaded fastutil dependency [parquet-java]

2025-09-21 Thread via GitHub
jerolba opened a new issue, #3321: URL: https://github.com/apache/parquet-java/issues/3321 ### Describe the enhancement requested I've noticed that some JARs include `package-info.class` files from the shaded `it.unimi.dsi:fastutil` dependency. These files are located in packag
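A quick way to check a built JAR for such entries, sketched with `java.util.zip` (a JAR is a ZIP archive). `findPackageInfo` is a hypothetical helper for inspection only; the actual exclusion would live in the shading configuration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class PackageInfoScanSketch {
    // Lists every package-info.class entry found in a JAR/ZIP stream.
    static List<String> findPackageInfo(InputStream jar) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(jar)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("package-info.class") || name.endsWith("/package-info.class")) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }
}
```

Usage: `findPackageInfo(new FileInputStream("some.jar"))` returns the offending entry names, so an empty list after the build change confirms the exclusion took effect.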

Re: [I] Clean up stale jira references from codebase [parquet-java]

2025-09-20 Thread via GitHub
wgtmac closed issue #3310: Clean up stale jira references from codebase URL: https://github.com/apache/parquet-java/issues/3310

Re: [PR] GH-2972: Fix incomplete avro metadata on INT96 schema converter [parquet-java]

2025-09-20 Thread via GitHub
ArnavBalyan commented on PR #3311: URL: https://github.com/apache/parquet-java/pull/3311#issuecomment-3269960847 cc @gszadovszky : )

Re: [PR] Bump to Java 11 [parquet-java]

2025-09-20 Thread via GitHub
Fokko commented on code in PR #3314: URL: https://github.com/apache/parquet-java/pull/3314#discussion_r2354650423 ## pom.xml: ## @@ -69,8 +69,8 @@ -1.8 -1.8 +11 +11 Review Comment: Good call, updated 👍

Re: [PR] GH-3242: Emit and Read min/max statistics for int96 timestamp columns [parquet-java]

2025-09-20 Thread via GitHub
rahulketch commented on code in PR #3243: URL: https://github.com/apache/parquet-java/pull/3243#discussion_r2333932592 ## parquet-column/src/main/java/org/apache/parquet/ValidInt96Stats.java: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] MINOR: Correct the EventObject in example for VariantShredding [parquet-format]

2025-09-20 Thread via GitHub
klion26 commented on PR #522: URL: https://github.com/apache/parquet-format/pull/522#issuecomment-3264954736 @wgtmac thanks for the review, I've addressed the comment, please take another look, thanks.

Re: [PR] GH-519: [Variant] Disambiguate SQL NULL (missing) from Variant null [parquet-format]

2025-09-19 Thread via GitHub
Tishj commented on code in PR #520: URL: https://github.com/apache/parquet-format/pull/520#discussion_r2336428309 ## VariantShredding.md: ## @@ -42,7 +42,31 @@ Variant values are stored in Parquet fields named `value`. Each `value` field may have an associated shredded field na

[PR] GH-2891: Include actual values in validation error messages and improve logging [parquet-java]

2025-09-19 Thread via GitHub
ArnavBalyan opened a new pull request, #3319: URL: https://github.com/apache/parquet-java/pull/3319 ### Rationale for this change - A user reported that errors were hard to debug because the precondition checks lacked details. - Add actual values and more details in the error messages throughout th

Re: [PR] [TEST] [parquet-thrift] start removing things marked for deprecation [parquet-java]

2025-09-19 Thread via GitHub
dossett commented on PR #3318: URL: https://github.com/apache/parquet-java/pull/3318#issuecomment-3312818606 Thank you for running the tests @gszadovszky , I was able to track down the problems and find some more things to remove.

Re: [PR] GH-2972: Fix incomplete avro metadata on INT96 schema converter [parquet-java]

2025-09-19 Thread via GitHub
ArnavBalyan commented on PR #3311: URL: https://github.com/apache/parquet-java/pull/3311#issuecomment-3313009641 cc @gszadovszky could you please merge it if all looks good thanks!

[I] Parquet read fails when written by parquet-go [parquet-java]

2025-09-19 Thread via GitHub
ArnavBalyan opened a new issue, #3320: URL: https://github.com/apache/parquet-java/issues/3320 ### Describe the enhancement requested Parquet-go can write extra entries in the column index for 1 column chunk. This causes Parquet java to fail with java.lang.ArrayIndexOutOfBoundsExcepti

Re: [I] Parquet read fails in certain scenarios for files written by parquet-go [parquet-java]

2025-09-19 Thread via GitHub
ArnavBalyan commented on issue #3320: URL: https://github.com/apache/parquet-java/issues/3320#issuecomment-3312865144 cc @gszadovszky just wanted to know your thoughts. Ignoring column index and reading seems to avoid the issue but may silently introduce a dependency on incorrect behaviour

Re: [PR] [TEST] [parquet-thrift] start removing things marked for deprecation [parquet-java]

2025-09-19 Thread via GitHub
dossett commented on PR #3318: URL: https://github.com/apache/parquet-java/pull/3318#issuecomment-3312724925 Pushed several test fixes and removed some unneeded dependencies

Re: [PR] [TEST] [parquet-thrift] start removing things marked for deprecation [parquet-java]

2025-09-19 Thread via GitHub
dossett commented on code in PR #3318: URL: https://github.com/apache/parquet-java/pull/3318#discussion_r2363403848 ## parquet-thrift/src/test/java/org/apache/parquet/hadoop/thrift/TestParquetToThriftReadWriteAndProjection.java: ## @@ -333,7 +196,7 @@ public void testPullInPrimi

Re: [I] Include scale and precision in error message when scale > precision [parquet-java]

2025-09-19 Thread via GitHub
ArnavBalyan commented on issue #2891: URL: https://github.com/apache/parquet-java/issues/2891#issuecomment-3312681117 Thanks for raising, will raise a fix for improved debugging

Re: [PR] [TEST] [parquet-thrift] start removing things marked for deprecation [parquet-java]

2025-09-19 Thread via GitHub
dossett commented on PR #3318: URL: https://github.com/apache/parquet-java/pull/3318#issuecomment-3312130546 Some build errors I'll chase down when 2.0.0 is happening

Re: [PR] GH-3242: Emit and Read min/max statistics for int96 timestamp columns [parquet-java]

2025-09-18 Thread via GitHub
rahulketch commented on code in PR #3243: URL: https://github.com/apache/parquet-java/pull/3243#discussion_r2333952664 ## parquet-column/src/main/java/org/apache/parquet/ValidInt96Stats.java: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[PR] [TEST] [parquet-thrift] start removing things marked for deprecation [parquet-java]

2025-09-18 Thread via GitHub
dossett opened a new pull request, #3318: URL: https://github.com/apache/parquet-java/pull/3318 A 2.0.0 release is being discussed so I took a pass at removing everything marked for deprecation in `parquet-thrift`. Some of the deprecations have been there for 10 years.

Re: [I] VariantBuilder.appendFloat writes too many bytes [parquet-java]

2025-09-18 Thread via GitHub
gszadovszky commented on issue #3317: URL: https://github.com/apache/parquet-java/issues/3317#issuecomment-3307518314 cc @gene-db

[I] VariantBuilder.appendFloat writes too many bytes [parquet-java]

2025-09-18 Thread via GitHub
gszadovszky opened a new issue, #3317: URL: https://github.com/apache/parquet-java/issues/3317 ### Describe the bug, including details regarding any error messages, version, and platform. [VariantBuilder.appendFloat](https://github.com/apache/parquet-java/blob/apache-parquet-1.16.0/p

Re: [I] VariantBuilder does not choose the tightest decimal type possible [parquet-java]

2025-09-18 Thread via GitHub
gszadovszky commented on issue #3316: URL: https://github.com/apache/parquet-java/issues/3316#issuecomment-3307187904 cc @gene-db

[I] VariantBuilder does not choose the tightest decimal type possible [parquet-java]

2025-09-18 Thread via GitHub
gszadovszky opened a new issue, #3316: URL: https://github.com/apache/parquet-java/issues/3316 ### Describe the bug, including details regarding any error messages, version, and platform. The class `VariantBuilder` checks not only the precision of the related value but the scale as w

Re: [PR] GH-3242: Emit and Read min/max statistics for int96 timestamp columns [parquet-java]

2025-09-17 Thread via GitHub
rahulketch commented on code in PR #3243: URL: https://github.com/apache/parquet-java/pull/3243#discussion_r2333918746 ## parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java: ## @@ -762,12 +772,12 @@ public List convertEncodingStats(En

Re: [I] Variant binary read does not take length into account [parquet-java]

2025-09-17 Thread via GitHub
gszadovszky commented on issue #3315: URL: https://github.com/apache/parquet-java/issues/3315#issuecomment-3305531981 @gene-db, could you take a look?