Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]
Weijun-H commented on PR #7535: URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2953946930 Late to the party, but wow—this PR is awesome! Thanks @mkarbo 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]
scovich commented on PR #7535: URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2945487430 See e.g. https://github.com/apache/arrow-rs/pull/7403/files#diff-fce219df196b3dffd6c2f6bb4924a34f299817a317943a19482289ecb8f6af69R1150-R1194 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]
scovich commented on PR #7535: URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2945462274 > > @alamb About the builder APIs — one question I’ve been thinking about: in the Go version, the single-buffer builder is impressively efficient and avoids most overhead. In contrast, the nested builder I’ve been working on in Rust offers more ergonomic APIs, but it’s harder to optimize due to intermediate buffers and lifetime management. Do you have any advice on how I can start? > > Impressively effiicient and simpler without overhead sounds pretty great to me > > One suggestion is to basically sketch out what the two APIs might look like (I know you have one already) so we can try and evaluate the code > > Another suggestion is to make a PR with a basic builder / framework to start (e.g. for primitives). If you can break it up into smaller PRs I think it might be easier to get you specific feedback on potential alternative approaches Intuitively, it seems like would want the "nested" builders to all be working with "views" of a single buffer? Ultimately, a builder would either be appending to the end of the buffer, or filling in bytes that were previously allocated. And from the playing around I did previously, I _think_ all the filling in is a local operation (e.g. an array needs to pre-allocate space for the N offsets, but can't populate the offset values until the value bytes have actually been written). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]
alamb commented on PR #7535: URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2944888496 > @alamb About the builder APIs — one question I’ve been thinking about: in the Go version, the single-buffer builder is impressively efficient and avoids most overhead. In contrast, the nested builder I’ve been working on in Rust offers more ergonomic APIs, but it’s harder to optimize due to intermediate buffers and lifetime management. Do you have any advice on how I can start? Impressively effiicient and simpler without overhead sounds pretty great to me One suggestion is to basically sketch out what the two APIs might look like (I know you have one already) so we can try and evaluate the code Another suggestion is to make a PR with a basic builder / framework to start (e.g. for primitives). If you can break it up into smaller PRs I think it might be easier to get you specific feedback on potential alternative approaches -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]
mkarbo commented on PR #7535: URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2942956866 Yes @alamb I will continue when I have time on my hand -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]
PinkCrow007 commented on PR #7535: URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2942484216 So glad to see the initial API merged! I'd be happy to contribute. @alamb About the builder APIs — one question I’ve been thinking about: in the Go version, the single-buffer builder is impressively efficient and avoids most overhead. In contrast, the nested builder I’ve been working on in Rust offers more ergonomic APIs, but it’s harder to optimize due to intermediate buffers and lifetime management. Do you have any advice on how I can start? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]
alamb commented on PR #7535: URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2939735840 I made a PR to fix some clippy errors and run tests as part of CI: - https://github.com/apache/arrow-rs/pull/7601 @mkarbo and @scovich any chance you have time to keep working on the unresolved comments about objects / lists? Also, should we make a PR now to bring the builder APIs that @PinkCrow007 is working on into this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]
alamb merged PR #7535: URL: https://github.com/apache/arrow-rs/pull/7535 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]
alamb commented on PR #7535: URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2925701606 gogogogogo -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]
alamb commented on code in PR #7535:
URL: https://github.com/apache/arrow-rs/pull/7535#discussion_r2118229302
##
parquet-variant/src/decoder.rs:
##
@@ -0,0 +1,158 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+use arrow_schema::ArrowError;
+use std::array::TryFromSliceError;
+
+use crate::utils::{array_from_slice, first_byte_from_slice, string_from_slice};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {
Review Comment:
yes, let's move them in a follow on PR
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
