Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]

2025-06-08 Thread via GitHub


Weijun-H commented on PR #7535:
URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2953946930

   Late to the party, but wow—this PR is awesome! Thanks @mkarbo 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]

2025-06-05 Thread via GitHub


scovich commented on PR #7535:
URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2945487430

   See e.g. 
https://github.com/apache/arrow-rs/pull/7403/files#diff-fce219df196b3dffd6c2f6bb4924a34f299817a317943a19482289ecb8f6af69R1150-R1194


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]

2025-06-05 Thread via GitHub


scovich commented on PR #7535:
URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2945462274

   > > @alamb About the builder APIs — one question I’ve been thinking about: 
in the Go version, the single-buffer builder is impressively efficient and 
avoids most overhead. In contrast, the nested builder I’ve been working on in 
Rust offers more ergonomic APIs, but it’s harder to optimize due to 
intermediate buffers and lifetime management. Do you have any advice on how I 
can start?
   > 
   > Impressively effiicient and simpler without overhead sounds pretty great 
to me
   > 
   > One suggestion is to basically sketch out what the two APIs might look 
like (I know you have one already) so we can try and evaluate the code
   > 
   > Another suggestion is to make a PR with a basic builder / framework to 
start (e.g. for primitives). If you can break it up into smaller PRs I think it 
might be easier to get you specific feedback on potential alternative approaches
   
   Intuitively, it seems like would want the "nested" builders to all be 
working with "views" of a single buffer? Ultimately, a builder would either be 
appending to the end of the buffer, or filling in bytes that were previously 
allocated. And from the playing around I did previously, I _think_ all the 
filling in is a local operation (e.g. an array needs to pre-allocate space for 
the N offsets, but can't populate the offset values until the value bytes have 
actually been written).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]

2025-06-05 Thread via GitHub


alamb commented on PR #7535:
URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2944888496

   > @alamb About the builder APIs — one question I’ve been thinking about: in 
the Go version, the single-buffer builder is impressively efficient and avoids 
most overhead. In contrast, the nested builder I’ve been working on in Rust 
offers more ergonomic APIs, but it’s harder to optimize due to intermediate 
buffers and lifetime management. Do you have any advice on how I can start?
   
   Impressively effiicient and simpler without overhead sounds pretty great to 
me
   
   One suggestion is to basically sketch out what the two APIs might look like 
(I know you have one already) so we can try and evaluate the code
   
   Another suggestion is to make a PR with a basic builder / framework to start 
(e.g. for primitives). If you can break it up into smaller PRs I think it might 
be easier to get you specific feedback on potential alternative approaches 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]

2025-06-04 Thread via GitHub


mkarbo commented on PR #7535:
URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2942956866

   Yes @alamb I will continue when I have time on my hand


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]

2025-06-04 Thread via GitHub


PinkCrow007 commented on PR #7535:
URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2942484216

   So glad to see the initial API merged! I'd be happy to contribute.
   @alamb About the builder APIs — one question I’ve been thinking about: in 
the Go version, the single-buffer builder is impressively efficient and avoids 
most overhead. In contrast, the nested builder I’ve been working on in Rust 
offers more ergonomic APIs, but it’s harder to optimize due to intermediate 
buffers and lifetime management. Do you have any advice on how I can start?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]

2025-06-04 Thread via GitHub


alamb commented on PR #7535:
URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2939735840

   I made a PR to fix some clippy errors and run tests as part of CI:
   - https://github.com/apache/arrow-rs/pull/7601
   
   @mkarbo  and @scovich  any chance you have time to keep working on the 
unresolved comments about objects / lists?
   
   Also, should we make a PR now to bring the builder APIs that @PinkCrow007 is 
working on into this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]

2025-05-31 Thread via GitHub


alamb merged PR #7535:
URL: https://github.com/apache/arrow-rs/pull/7535


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]

2025-05-31 Thread via GitHub


alamb commented on PR #7535:
URL: https://github.com/apache/arrow-rs/pull/7535#issuecomment-2925701606

   gogogogogo


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Initial API for reading Variant data and metadata [arrow-rs]

2025-05-31 Thread via GitHub


alamb commented on code in PR #7535:
URL: https://github.com/apache/arrow-rs/pull/7535#discussion_r2118229302


##
parquet-variant/src/decoder.rs:
##
@@ -0,0 +1,158 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+use arrow_schema::ArrowError;
+use std::array::TryFromSliceError;
+
+use crate::utils::{array_from_slice, first_byte_from_slice, string_from_slice};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {

Review Comment:
   yes, let's move them in a follow on PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]