[jira] [Created] (PARQUET-2408) Fix license header in .gitattributes

2023-12-04 Thread Gang Wu (Jira)
Gang Wu created PARQUET-2408: Summary: Fix license header in .gitattributes Key: PARQUET-2408 URL: https://issues.apache.org/jira/browse/PARQUET-2408 Project: Parquet Issue Type: Bug

[jira] [Created] (PARQUET-2407) Add custom .asf.yaml for finer-grained control of email notifications

2023-12-04 Thread Gang Wu (Jira)
Gang Wu created PARQUET-2407: Summary: Add custom .asf.yaml for finer-grained control of email notifications Key: PARQUET-2407 URL: https://issues.apache.org/jira/browse/PARQUET-2407 Project: Parquet

[jira] [Resolved] (PARQUET-2385) Don't initialize CodecFactory in ParquetWriter

2023-12-03 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2385. -- Fix Version/s: 1.14.0 Assignee: Atour Mousavi Gourabi Resolution: Fixed > Don't

[jira] [Resolved] (PARQUET-2400) Update Spotless command in PR prompt to include vector plugins

2023-12-03 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2400. -- Fix Version/s: 1.14.0 Assignee: Atour Mousavi Gourabi Resolution: Fixed > Update

[jira] [Resolved] (PARQUET-2386) More consistent code style in parquet-mr

2023-11-30 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2386. -- Fix Version/s: 1.14.0 Resolution: Fixed > More consistent code style in parquet-mr >

[jira] [Resolved] (PARQUET-2383) Bump parquet-format to 2.10.0

2023-11-21 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2383. -- Fix Version/s: 1.14.0 Resolution: Fixed > Bump parquet-format to 2.10.0 >

[jira] [Commented] (PARQUET-2378) Problem with a cat

2023-11-21 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788344#comment-17788344 ] Gang Wu commented on PARQUET-2378: -- Sorry for the late reply. I'm not sure if it is a good idea to add

[jira] [Created] (PARQUET-2383) Bump parquet-format to 2.10.0

2023-11-20 Thread Gang Wu (Jira)
Gang Wu created PARQUET-2383: Summary: Bump parquet-format to 2.10.0 Key: PARQUET-2383 URL: https://issues.apache.org/jira/browse/PARQUET-2383 Project: Parquet Issue Type: Improvement

[jira] [Resolved] (PARQUET-2380) Decouple RewriteOptions from Hadoop classes

2023-11-20 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2380. -- Fix Version/s: 1.14.0 Assignee: Atour Mousavi Gourabi Resolution: Fixed > Decouple

[jira] [Resolved] (PARQUET-2375) Extend vectorized bit unpacking benchmark for various bit sizes.

2023-11-16 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2375. -- Fix Version/s: 1.14.0 Assignee: JATIN BHATEJA Resolution: Fixed > Extend vectorized

[jira] [Commented] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787036#comment-17787036 ] Gang Wu commented on PARQUET-2378: -- Can we get rid of the schema conversion via AvroSchemaConverter?

[jira] [Commented] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786777#comment-17786777 ] Gang Wu commented on PARQUET-2378: -- Thanks for reporting the issue! I can reproduce it on my end. Let

[jira] [Resolved] (PARQUET-2379) [Format] Update changelog for 2.10.0

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2379. -- Fix Version/s: format-2.10.0 Resolution: Fixed > [Format] Update changelog for 2.10.0 >

[jira] [Updated] (PARQUET-2221) [Format] Encoding spec incorrect for dictionary fallback

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2221: - Fix Version/s: (was: format-2.10.0) > [Format] Encoding spec incorrect for dictionary fallback >

[jira] [Resolved] (PARQUET-2313) Bump actions/setup-java from 1 to 3

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2313. -- Assignee: Gang Wu Resolution: Fixed > Bump actions/setup-java from 1 to 3 >

[jira] [Resolved] (PARQUET-2344) Bump to Thirft 0.19.0

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2344. -- Resolution: Fixed > Bump to Thirft 0.19.0 > - > > Key:

[jira] [Resolved] (PARQUET-2287) Bump maven-shade-plugin from 2.2 to 3.4.1

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2287. -- Fix Version/s: format-2.10.0 (was: 1.14.0) Resolution: Fixed > Bump

[jira] [Resolved] (PARQUET-2286) Bump apache-rat-plugin from 0.12 to 0.15

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2286. -- Fix Version/s: format-2.10.0 (was: 1.14.0) Resolution: Fixed > Bump

[jira] [Resolved] (PARQUET-2285) Add dependabot

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2285. -- Fix Version/s: format-2.10.0 (was: 1.14.0) Resolution: Fixed > Add

[jira] [Resolved] (PARQUET-2284) Bump junit from 4.10 to 4.13.1

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2284. -- Fix Version/s: format-2.10.0 (was: 1.14.0) Resolution: Fixed > Bump

[jira] [Resolved] (PARQUET-2270) Bump Thrift to 0.18.1

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2270. -- Resolution: Fixed > Bump Thrift to 0.18.1 > - > > Key:

[jira] [Resolved] (PARQUET-2271) Bump Parquet POM to 29

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2271. -- Resolution: Fixed > Bump Parquet POM to 29 > -- > > Key:

[jira] [Updated] (PARQUET-2005) Upgrade thrift to 0.14.1

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2005: - Fix Version/s: format-2.10.0 > Upgrade thrift to 0.14.1 > > >

[jira] [Resolved] (PARQUET-2369) Clarify Support for Pages Compressed with Multiple GZIP Members

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2369. -- Assignee: Raphael Taylor-Davies Resolution: Fixed > Clarify Support for Pages Compressed with

[jira] [Assigned] (PARQUET-2264) Update specification to allow DecimalType scale == precision

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2264: Assignee: Devin Smith > Update specification to allow DecimalType scale == precision >

[jira] [Resolved] (PARQUET-2264) Update specification to allow DecimalType scale == precision

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2264. -- Fix Version/s: format-2.10.0 Resolution: Fixed > Update specification to allow DecimalType

[jira] [Updated] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2241: - Fix Version/s: format-2.10.0 > ByteStreamSplitDecoder broken in presence of nulls >

[jira] [Resolved] (PARQUET-2215) Document how DELTA_BINARY_PACKED handles overflow for deltas

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2215. -- Fix Version/s: format-2.10.0 Resolution: Fixed > Document how DELTA_BINARY_PACKED handles

[jira] [Updated] (PARQUET-2215) Document how DELTA_BINARY_PACKED handles overflow for deltas

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2215: - Issue Type: Improvement (was: New Feature) > Document how DELTA_BINARY_PACKED handles overflow for

[jira] [Updated] (PARQUET-2257) [Format] Add bloom_filter_length to ColumnMetaData

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2257: - Issue Type: Improvement (was: New Feature) > [Format] Add bloom_filter_length to ColumnMetaData >

[jira] [Updated] (PARQUET-2261) [Format] Add statistics that reflect decoded size to metadata

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2261: - Issue Type: New Feature (was: Improvement) > [Format] Add statistics that reflect decoded size to

[jira] [Updated] (PARQUET-758) [Format] HALF precision FLOAT Logical type

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-758: Issue Type: New Feature (was: Improvement) > [Format] HALF precision FLOAT Logical type >

[jira] [Updated] (PARQUET-758) [Format] HALF precision FLOAT Logical type

2023-11-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-758: Fix Version/s: format-2.10.0 > [Format] HALF precision FLOAT Logical type >

[jira] [Created] (PARQUET-2379) [Format] Update changelog for 2.10.0

2023-11-14 Thread Gang Wu (Jira)
Gang Wu created PARQUET-2379: Summary: [Format] Update changelog for 2.10.0 Key: PARQUET-2379 URL: https://issues.apache.org/jira/browse/PARQUET-2379 Project: Parquet Issue Type: Task

[jira] [Resolved] (PARQUET-2261) [Format] Add statistics that reflect decoded size to metadata

2023-11-14 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2261. -- Fix Version/s: format-2.10.0 Resolution: Fixed > [Format] Add statistics that reflect decoded

[jira] [Resolved] (PARQUET-2359) Simple Parquet Configuration implementation

2023-11-14 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2359. -- Fix Version/s: 1.14.0 Assignee: Atour Mousavi Gourabi Resolution: Fixed > Simple

[jira] [Resolved] (PARQUET-2372) Avoid unnecessary reading of RowGroup data during rewriting

2023-11-08 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2372. -- Fix Version/s: 1.14.0 Assignee: Xianyang Liu Resolution: Fixed > Avoid unnecessary

[jira] [Resolved] (PARQUET-2365) Fixes NPE when rewriting column without column index

2023-11-04 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2365. -- Fix Version/s: 1.14.0 Assignee: Xianyang Liu Resolution: Fixed > Fixes NPE when

[jira] [Resolved] (PARQUET-2371) Resolve japicmp failure for CI

2023-11-03 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2371. -- Fix Version/s: 1.14.0 Assignee: Atour Mousavi Gourabi Resolution: Fixed > Resolve

[jira] [Resolved] (PARQUET-2366) Optimize random seek during rewriting

2023-10-29 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2366. -- Fix Version/s: 1.14.0 Assignee: Xianyang Liu Resolution: Fixed > Optimize random

[jira] [Resolved] (PARQUET-2347) Add interface layer between Parquet and Hadoop Configuration

2023-10-29 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2347. -- Resolution: Fixed > Add interface layer between Parquet and Hadoop Configuration >

[jira] [Assigned] (PARQUET-2347) Add interface layer between Parquet and Hadoop Configuration

2023-10-25 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2347: Assignee: Atour Mousavi Gourabi > Add interface layer between Parquet and Hadoop Configuration

[jira] [Updated] (PARQUET-2347) Add interface layer between Parquet and Hadoop Configuration

2023-10-25 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2347: - Fix Version/s: 1.14.0 > Add interface layer between Parquet and Hadoop Configuration >

[jira] [Resolved] (PARQUET-2361) Reduce failure rate of unit test testParquetFileWithBloomFilterWithFpp

2023-10-18 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2361. -- Fix Version/s: 1.14.0 Assignee: Feng Jiajie Resolution: Fixed > Reduce failure rate

[jira] [Resolved] (PARQUET-2352) Update parquet format spec to allow truncation of row group min/max stats

2023-10-18 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2352. -- Fix Version/s: format-2.10.0 Assignee: Raunaq Morarka Resolution: Fixed > Update

[jira] [Commented] (PARQUET-2367) NegativeArraySizeException on read for parquet files written with large strings in some cases

2023-10-17 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776453#comment-17776453 ] Gang Wu commented on PARQUET-2367: -- Thanks for reporting this! I see the configs involve writing.

[jira] [Resolved] (PARQUET-2363) ParquetRewriter should encrypt the V2 page header

2023-10-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2363. -- Fix Version/s: 1.14.0 Assignee: Xianyang Liu Resolution: Fixed > ParquetRewriter

[jira] [Resolved] (PARQUET-2357) Modest refactor of CapacityByteArrayOutputStream

2023-10-14 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2357. -- Assignee: Feng Jiajie Resolution: Fixed > Modest refactor of CapacityByteArrayOutputStream >

[jira] [Resolved] (PARQUET-2362) Clarify parquet encoding

2023-10-14 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2362. -- Fix Version/s: format-2.10.0 Assignee: Letian Jiang Resolution: Fixed > Clarify

[jira] [Assigned] (PARQUET-2348) Recompression/Re-encrypt should rewrite bloomfilter

2023-10-11 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2348: Assignee: Xianyang Liu > Recompression/Re-encrypt should rewrite bloomfilter >

[jira] [Resolved] (PARQUET-2348) Recompression/Re-encrypt should rewrite bloomfilter

2023-10-11 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2348. -- Fix Version/s: 1.14.0 Resolution: Fixed > Recompression/Re-encrypt should rewrite bloomfilter

[jira] [Resolved] (PARQUET-2358) Upgrade japicmp-maven-plugin to 0.16.0

2023-10-11 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2358. -- Resolution: Fixed > Upgrade japicmp-maven-plugin to 0.16.0 > --

[jira] [Resolved] (PARQUET-2349) Move from deprecated BytesCompressor/Decompressor to BytesInputCompressor/Decompressor

2023-10-08 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2349. -- Fix Version/s: 1.14.0 Resolution: Fixed > Move from deprecated BytesCompressor/Decompressor

[jira] [Assigned] (PARQUET-2349) Move from deprecated BytesCompressor/Decompressor to BytesInputCompressor/Decompressor

2023-10-08 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2349: Assignee: Atour Mousavi Gourabi > Move from deprecated BytesCompressor/Decompressor to >

[jira] [Resolved] (PARQUET-2354) Apparent race condition in CharsetValidator

2023-09-28 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2354. -- Resolution: Fixed > Apparent race condition in CharsetValidator >

[jira] [Assigned] (PARQUET-2354) Apparent race condition in CharsetValidator

2023-09-28 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2354: Assignee: Piotr Findeisen > Apparent race condition in CharsetValidator >

[jira] [Updated] (PARQUET-2354) Apparent race condition in CharsetValidator

2023-09-28 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2354: - Fix Version/s: 1.14.0 > Apparent race condition in CharsetValidator >

[jira] [Commented] (PARQUET-2352) Update parquet format spec to allow truncation of row group min/max stats

2023-09-19 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766995#comment-17766995 ] Gang Wu commented on PARQUET-2352: -- Thanks for opening the issue! Format change is not an easy topic

[jira] [Closed] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu closed PARQUET-2346. > Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9 > - > >

[jira] [Resolved] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2346. -- Resolution: Won't Do > Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9 >

[jira] [Commented] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764933#comment-17764933 ] Gang Wu commented on PARQUET-2346: -- Thanks for the information! Probably it is not a good time to make

[jira] [Commented] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764740#comment-17764740 ] Gang Wu commented on PARQUET-2346: -- Do you have any suggestion? TBH, I am not familiar with this issue

[jira] [Created] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-11 Thread Gang Wu (Jira)
Gang Wu created PARQUET-2346: Summary: Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9 Key: PARQUET-2346 URL: https://issues.apache.org/jira/browse/PARQUET-2346 Project: Parquet Issue Type:

[jira] [Commented] (PARQUET-2345) The Parquet Spec doesn't specify whether multiple columns are allowed to have the same name.

2023-09-08 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763125#comment-17763125 ] Gang Wu commented on PARQUET-2345: -- I didn't find any statement to disallow identical field names in

[jira] [Updated] (PARQUET-2343) Fixes NPE when rewriting file with multiple rowgroups

2023-09-07 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2343: - Fix Version/s: 1.13.2 > Fixes NPE when rewriting file with multiple rowgroups >

[jira] [Resolved] (PARQUET-2343) Fixes NPE when rewriting file with multiple rowgroups

2023-09-07 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2343. -- Resolution: Fixed > Fixes NPE when rewriting file with multiple rowgroups >

[jira] [Assigned] (PARQUET-2344) Bump to Thirft 0.19.0

2023-09-04 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2344: Assignee: Fokko Driesprong > Bump to Thirft 0.19.0 > - > >

[jira] [Updated] (PARQUET-2344) Bump to Thirft 0.19.0

2023-09-04 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2344: - Fix Version/s: format-2.10.0 > Bump to Thirft 0.19.0 > - > > Key:

[jira] [Assigned] (PARQUET-2343) Fixes NPE when rewriting file with multiple rowgroups

2023-09-03 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2343: Assignee: Xianyang Liu > Fixes NPE when rewriting file with multiple rowgroups >

[jira] [Updated] (PARQUET-2343) Fixes NPE when rewriting file with multiple rowgroups

2023-09-03 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2343: - Fix Version/s: 1.14.0 > Fixes NPE when rewriting file with multiple rowgroups >

[jira] [Resolved] (PARQUET-2342) Parquet writer produced a corrupted file due to page value count overflow

2023-08-31 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2342. -- Fix Version/s: 1.14.0 Resolution: Fixed > Parquet writer produced a corrupted file due to

[jira] [Assigned] (PARQUET-2342) Parquet writer produced a corrupted file due to page value count overflow

2023-08-31 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2342: Assignee: Zamil Majdy > Parquet writer produced a corrupted file due to page value count

[jira] [Updated] (PARQUET-2288) Bump exec-maven-plugin from 1.2.1 to 3.1.0

2023-08-29 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2288: - Fix Version/s: format-2.10.0 (was: 1.14.0) > Bump exec-maven-plugin from 1.2.1

[jira] [Updated] (PARQUET-2288) Bump exec-maven-plugin from 1.2.1 to 3.1.0

2023-08-29 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2288: - Affects Version/s: (was: 1.13.0) > Bump exec-maven-plugin from 1.2.1 to 3.1.0 >

[jira] [Resolved] (PARQUET-2288) Bump exec-maven-plugin from 1.2.1 to 3.1.0

2023-08-29 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2288. -- Resolution: Fixed > Bump exec-maven-plugin from 1.2.1 to 3.1.0 >

[jira] [Commented] (PARQUET-2340) appendRowGroup will loose pageIndex

2023-08-22 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757769#comment-17757769 ] Gang Wu commented on PARQUET-2340: -- Do you have any special handling that ParquetRewriter cannot do?

[jira] [Commented] (PARQUET-2340) appendRowGroup will loose pageIndex

2023-08-22 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757506#comment-17757506 ] Gang Wu commented on PARQUET-2340: -- [~NathanKan] You may be interested in the method

[jira] [Commented] (PARQUET-2339) ArrayIndexOutOfBounds exception writing parquet from Avro in Apache Hudi

2023-08-21 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757176#comment-17757176 ] Gang Wu commented on PARQUET-2339: -- The config above uses three level list instead of the legacy two

[jira] [Resolved] (PARQUET-2333) Support bzip2 and xz compressions in the to-avro subcommand

2023-08-15 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2333. -- Fix Version/s: 1.14.0 Resolution: Fixed > Support bzip2 and xz compressions in the to-avro

[jira] [Resolved] (PARQUET-2335) Allow the scan subcommand to take multiple files

2023-08-09 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2335. -- Resolution: Fixed > Allow the scan subcommand to take multiple files >

[jira] [Updated] (PARQUET-2335) Allow the scan subcommand to take multiple files

2023-08-09 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2335: - Fix Version/s: 1.14.0 > Allow the scan subcommand to take multiple files >

[jira] [Resolved] (PARQUET-2334) Allow the cat subcommand to take multiple files

2023-08-09 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2334. -- Resolution: Fixed > Allow the cat subcommand to take multiple files >

[jira] [Updated] (PARQUET-2334) Allow the cat subcommand to take multiple files

2023-08-09 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2334: - Fix Version/s: 1.14.0 > Allow the cat subcommand to take multiple files >

[jira] [Updated] (PARQUET-2332) Fix unexpectedly disabled tests to be executed

2023-08-09 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2332: - Fix Version/s: 1.14.0 > Fix unexpectedly disabled tests to be executed >

[jira] [Resolved] (PARQUET-2332) Fix unexpectedly disabled tests to be executed

2023-08-09 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2332. -- Resolution: Fixed > Fix unexpectedly disabled tests to be executed >

[jira] [Updated] (PARQUET-2331) Allow convert-csv to take multiple input files

2023-08-09 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2331: - Fix Version/s: 1.14.0 > Allow convert-csv to take multiple input files >

[jira] [Resolved] (PARQUET-2331) Allow convert-csv to take multiple input files

2023-08-09 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2331. -- Resolution: Fixed > Allow convert-csv to take multiple input files >

[jira] [Resolved] (PARQUET-2330) Fix convert-csv to show the correct position of the invalid record

2023-08-03 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2330. -- Resolution: Fixed > Fix convert-csv to show the correct position of the invalid record >

[jira] [Updated] (PARQUET-2330) Fix convert-csv to show the correct position of the invalid record

2023-08-03 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2330: - Fix Version/s: 1.14.0 > Fix convert-csv to show the correct position of the invalid record >

[jira] [Resolved] (PARQUET-2328) Add overwrite option to the parquet-cli's rewrite subcommand

2023-08-03 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2328. -- Resolution: Fixed > Add overwrite option to the parquet-cli's rewrite subcommand >

[jira] [Resolved] (PARQUET-2329) Fix wrong help messages of parquet-cli subcommands

2023-08-03 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2329. -- Resolution: Fixed > Fix wrong help messages of parquet-cli subcommands >

[jira] [Updated] (PARQUET-2329) Fix wrong help messages of parquet-cli subcommands

2023-08-03 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2329: - Fix Version/s: 1.14.0 > Fix wrong help messages of parquet-cli subcommands >

[jira] [Updated] (PARQUET-2328) Add overwrite option to the parquet-cli's rewrite subcommand

2023-07-31 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2328: - Fix Version/s: 1.14.0 > Add overwrite option to the parquet-cli's rewrite subcommand >

[jira] [Assigned] (PARQUET-2323) Use bit vector to store Prebuffered column chunk index

2023-07-28 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2323: Assignee: Jinpeng Zhou > Use bit vector to store Prebuffered column chunk index >

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-07-26 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747579#comment-17747579 ] Gang Wu commented on PARQUET-2222: -- Opened https://github.com/apache/arrow/issues/36882 > [Format]

[jira] [Resolved] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-07-26 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2222. -- Resolution: Fixed > [Format] RLE encoding spec incorrect for v2 data pages >

[jira] [Comment Edited] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-07-25 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747231#comment-17747231 ] Gang Wu edited comment on PARQUET-2222 at 7/26/23 4:12 AM: --- Actually in the

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-07-25 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747231#comment-17747231 ] Gang Wu commented on PARQUET-2222: -- Make sense. Then we probably have to fix the description of the

[jira] [Comment Edited] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-07-25 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746766#comment-17746766 ] Gang Wu edited comment on PARQUET-2222 at 7/26/23 2:00 AM: --- As all the

[jira] [Comment Edited] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-07-24 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746755#comment-17746755 ] Gang Wu edited comment on PARQUET-2222 at 7/25/23 4:34 AM: --- I just revisited
