Re: [PR] Improve parquet gzip compression performance using zlib-rs [arrow-rs]
alamb commented on PR #7200: URL: https://github.com/apache/arrow-rs/pull/7200#issuecomment-2761670007 🎉 -- thank you so much @psvri for sticking with this -- the crate is better all around because of it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Improve parquet gzip compression performance using zlib-rs [arrow-rs]
mbrobbel merged PR #7200: URL: https://github.com/apache/arrow-rs/pull/7200
Re: [PR] Improve parquet gzip compression performance using zlib-rs [arrow-rs]
mbrobbel commented on PR #7200: URL: https://github.com/apache/arrow-rs/pull/7200#issuecomment-2758985152 @psvri if you merge/rebase `main` (to include #7336) this should be good to go.
Re: [PR] Improve parquet gzip compression performance using zlib-rs [arrow-rs]
alamb commented on PR #7200: URL: https://github.com/apache/arrow-rs/pull/7200#issuecomment-2722518262
> The MSRV action is failing since I haven't updated the Rust version from 1.70 to 1.75 in my branch.
>
> Would it be possible to update MSRV in the next major version?

I think we should. I think we are blocked on someone writing down a policy.
- #181
Re: [PR] Improve parquet gzip compression performance using zlib-rs [arrow-rs]
psvri commented on PR #7200: URL: https://github.com/apache/arrow-rs/pull/7200#issuecomment-2720981944 The MSRV action is failing since I haven't updated the Rust version from 1.70 to 1.75 in my branch. Would it be possible to update MSRV in the next major version?
Re: [PR] Improve parquet gzip compression performance using zlib-rs [arrow-rs]
psvri commented on PR #7200: URL: https://github.com/apache/arrow-rs/pull/7200#issuecomment-2719606613 Hello @alamb, I got these numbers by running the gzip benchmarks from this file: https://github.com/apache/arrow-rs/blob/main/parquet/benches/compression.rs
Re: [PR] Improve parquet gzip compression performance using zlib-rs [arrow-rs]
alamb commented on PR #7200: URL: https://github.com/apache/arrow-rs/pull/7200#issuecomment-2719077139
Hi @psvri -- thank you for this contribution.

I noticed that this PR reports benchmark numbers. What benchmark were these? Are they from zlib-rs? Did you run any benchmarks for the parquet crate itself (as in from the https://github.com/apache/arrow-rs/tree/main/parquet/benches directory)?

```
Benchmarking compress GZIP(GzipLevel(6)) - alphanumeric: Collecting 100 samples in estimated 5.0406 s (200 iter
compress GZIP(GzipLevel(6)) - alphanumeric
                        time:   [24.395 ms 24.934 ms 25.612 ms]
                        change: [-33.807% -31.734% -29.276%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
```
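Editor's note: the Criterion output quoted in this message gives only the new timing and a relative change, not the baseline it was compared against. As a rough sketch, and assuming the `change` figure means (new - old) / old and can be applied to the middle `time` estimate (an approximation, since Criterion estimates the two values separately), the implied baseline and speedup can be back-calculated. The function names below are hypothetical and are not part of arrow-rs or Criterion.

```rust
/// Baseline time implied by a new time and a relative change,
/// assuming rel_change = (new - old) / old (e.g. -0.31734 for -31.734%).
fn implied_baseline(new_ms: f64, rel_change: f64) -> f64 {
    new_ms / (1.0 + rel_change)
}

/// Speedup factor (old / new) implied by the same relative change.
fn speedup(rel_change: f64) -> f64 {
    1.0 / (1.0 + rel_change)
}

fn main() {
    // Middle estimates taken from the quoted benchmark output.
    let new_ms = 24.934;
    let rel_change = -0.31734;

    println!("implied baseline: {:.2} ms", implied_baseline(new_ms, rel_change));
    println!("implied speedup:  {:.2}x", speedup(rel_change));
}
```

Under those assumptions the alphanumeric GZIP level 6 case went from roughly 36.5 ms to 24.9 ms, which is about a 1.46x speedup.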
Re: [PR] Improve parquet gzip compression performance using zlib-rs [arrow-rs]
alamb commented on PR #7200: URL: https://github.com/apache/arrow-rs/pull/7200#issuecomment-2719071233 I merged this PR up from main to rerun the tests, as I think the failing CI check was resolved.
Re: [PR] Improve parquet gzip compression performance using zlib-rs [arrow-rs]
alamb commented on PR #7200: URL: https://github.com/apache/arrow-rs/pull/7200#issuecomment-2706255823 It seems like this would be another good reason to - #181 🤔
Re: [PR] Improve parquet gzip compression performance using zlib-rs [arrow-rs]
psvri commented on code in PR #7200: URL: https://github.com/apache/arrow-rs/pull/7200#discussion_r1972003732
## parquet/Cargo.toml: ##
@@ -50,7 +50,7 @@ bytes = { version = "1.1", default-features = false, features = ["std"] }
 thrift = { version = "0.17", default-features = false }
 snap = { version = "1.0", default-features = false, optional = true }
 brotli = { version = "7.0", default-features = false, features = ["std"], optional = true }
-flate2 = { version = "1.0", default-features = false, features = ["rust_backend"], optional = true }
+flate2 = { version = "1.1", default-features = false, features = ["zlib-rs"], optional = true }

Review Comment:
Yes, it's written in pure Rust. In my fork I can see the wasm32 pipeline is not failing: https://github.com/psvri/arrow-rs/actions/runs/13547936797/job/37864105483
Re: [PR] Improve parquet gzip compression performance using zlib-rs [arrow-rs]
kylebarron commented on code in PR #7200: URL: https://github.com/apache/arrow-rs/pull/7200#discussion_r1971995097
## parquet/Cargo.toml: ##
@@ -50,7 +50,7 @@ bytes = { version = "1.1", default-features = false, features = ["std"] }
 thrift = { version = "0.17", default-features = false }
 snap = { version = "1.0", default-features = false, optional = true }
 brotli = { version = "7.0", default-features = false, features = ["std"], optional = true }
-flate2 = { version = "1.0", default-features = false, features = ["rust_backend"], optional = true }
+flate2 = { version = "1.1", default-features = false, features = ["zlib-rs"], optional = true }

Review Comment:
Is this pure-rust? Does this compile for `wasm32-unknown-unknown`?