[PR] Implement Improved arrow-avro Reader Zero-Byte Record Handling [arrow-rs]

2025-07-18 Thread via GitHub
jecsand838 opened a new pull request, #7966: URL: https://github.com/apache/arrow-rs/pull/7966 … Avro files # Which issue does this PR close? - Part of https://github.com/apache/arrow-rs/issues/4886 - Follow up to https://github.com/apache/arrow-rs/pull/7834 # Rati

Re: [PR] Perf: Support partition_validity to use fast path for bit map scan [arrow-rs]

2025-07-18 Thread via GitHub
zhuqi-lucas commented on code in PR #7962: URL: https://github.com/apache/arrow-rs/pull/7962#discussion_r2217158575 ## arrow-ord/src/sort.rs: ## @@ -4709,4 +4731,77 @@ mod tests { assert_eq!(&sorted[0], &expected_struct_array); } + +/// A simple, correct but

Re: [PR] Perf: Support partition_validity to use fast path for bit map scan [arrow-rs]

2025-07-18 Thread via GitHub
zhuqi-lucas commented on code in PR #7962: URL: https://github.com/apache/arrow-rs/pull/7962#discussion_r2217158467 ## arrow-buffer/src/util/bit_iterator.rs: ## @@ -323,4 +380,110 @@ mod tests { let mask = &[223, 23]; BitIterator::new(mask, 17, 0); } + +

Re: [PR] Perf: Support partition_validity to use fast path for bit map scan [arrow-rs]

2025-07-18 Thread via GitHub
zhuqi-lucas commented on PR #7962: URL: https://github.com/apache/arrow-rs/pull/7962#issuecomment-3091882200 Latest results for the new implementation: there is a small regression, but the results are still promising: ```rust critcmp --filter "nulls to indices" fast_path_for_bit_map_scan main group

Re: [PR] Perf: Support partition_validity to use fast path for bit map scan [arrow-rs]

2025-07-18 Thread via GitHub
zhuqi-lucas commented on PR #7962: URL: https://github.com/apache/arrow-rs/pull/7962#issuecomment-3091846858 Thank you @alamb @Dandandan @jhorstmann for the review. I addressed the comments in the latest PR and also added extensive tests. Thanks! -- This is an automated message from the Apache Git

Re: [PR] Perf: Support partition_validity to use fast path for bit map scan [arrow-rs]

2025-07-18 Thread via GitHub
zhuqi-lucas commented on code in PR #7962: URL: https://github.com/apache/arrow-rs/pull/7962#discussion_r2217145627 ## arrow-ord/src/sort.rs: ## @@ -178,44 +178,136 @@ where } } -// partition indices into valid and null indices -fn partition_validity(array: &dyn Array) -

Re: [PR] Perf: Support partition_validity to use fast path for bit map scan [arrow-rs]

2025-07-18 Thread via GitHub
zhuqi-lucas commented on code in PR #7962: URL: https://github.com/apache/arrow-rs/pull/7962#discussion_r2217145748 ## arrow-ord/src/sort.rs: ## @@ -178,44 +178,136 @@ where } } -// partition indices into valid and null indices -fn partition_validity(array: &dyn Array) -

Re: [PR] docs: add docs for driver manifests [arrow-adbc]

2025-07-18 Thread via GitHub
kou commented on code in PR #3176: URL: https://github.com/apache/arrow-adbc/pull/3176#discussion_r2217102143 ## docs/source/format/driver_manifests.rst: ## @@ -0,0 +1,300 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreement

Re: [PR] GH-47085 [C++][Parquet] increase default compression level for zstd [arrow]

2025-07-18 Thread via GitHub
blacha commented on PR #47086: URL: https://github.com/apache/arrow/pull/47086#issuecomment-3091392804 Thanks for all the discussion, I was unaware this would affect IPC as well. Up until a few weeks ago GDAL did not have a `compresssion_level` parameter so all parquet files with comp

Re: [PR] feat(go/adbc/driver/bigquery): support service account impersonation [arrow-adbc]

2025-07-18 Thread via GitHub
yu-iskw commented on PR #3174: URL: https://github.com/apache/arrow-adbc/pull/3174#issuecomment-3091358486 We could also implement acceptance tests that call the BigQuery API only if the environment variables for BigQuery are set. Indeed, I tested the changed code with this approach on the lo

Re: [PR] feat(go/adbc/driver/bigquery): support service account impersonation [arrow-adbc]

2025-07-18 Thread via GitHub
yu-iskw commented on PR #3174: URL: https://github.com/apache/arrow-adbc/pull/3174#issuecomment-3091341131 @zeroshade Thank you for the feedback. I have updated the code at https://github.com/apache/arrow-adbc/pull/3174/commits/47683ec893483afbeac6a123ffd11ec3090dd3f7 .

[PR] feat(csharp/test/Drivers/Databricks): Support token refresh to extend connection lifetime [arrow-adbc]

2025-07-18 Thread via GitHub
alexguo-db opened a new pull request, #3177: URL: https://github.com/apache/arrow-adbc/pull/3177 ## Motivation In scenarios like PowerBI dataset refresh, if a query runs longer than the OAuth token's expiration time (typically 1 hour for AAD tokens), the connection fails. PowerBI onl

Re: [PR] [Variant] Avoid extra allocation in object builder [arrow-rs]

2025-07-18 Thread via GitHub
scovich commented on code in PR #7935: URL: https://github.com/apache/arrow-rs/pull/7935#discussion_r2216465127 ## parquet-variant/src/builder.rs: ## @@ -598,6 +599,49 @@ impl ParentState<'_> { } } } + +// returns the beginning offset of buffer for

Re: [PR] [Variant] WIP Tests for variant_get of shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
zeroshade commented on code in PR #7965: URL: https://github.com/apache/arrow-rs/pull/7965#discussion_r2217006088 ## parquet-variant-compute/src/variant_get.rs: ## @@ -177,4 +192,209 @@ mod test { r#"{"inner_field": 1234}"#, ); } + +/// Shredding:

Re: [I] [Variant] API to construct Shredded Variant Arrays [arrow-rs]

2025-07-18 Thread via GitHub
zeroshade commented on issue #7895: URL: https://github.com/apache/arrow-rs/issues/7895#issuecomment-3091018615 I'm in favor of @scovich's suggestion, and that is what I did for the Go implementation along with my plan for defining the Canonical extension type. The schema ```

[PR] docs: add docs for driver manifests [arrow-adbc]

2025-07-18 Thread via GitHub
zeroshade opened a new pull request, #3176: URL: https://github.com/apache/arrow-adbc/pull/3176 With the driver manager implementations for C/C++, Go, Rust and Python updated to utilize and leverage driver manifests, we should properly document how manifests work and what the format is.

Re: [PR] [Variant] WIP Tests for variant_get of shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
cashmand commented on code in PR #7965: URL: https://github.com/apache/arrow-rs/pull/7965#discussion_r2216981871 ## parquet-variant-compute/src/variant_get.rs: ## @@ -177,4 +192,209 @@ mod test { r#"{"inner_field": 1234}"#, ); } + +/// Shredding: e

Re: [PR] Add arrow-avro support for Impala Nullability [arrow-rs]

2025-07-18 Thread via GitHub
veronica-m-ef commented on code in PR #7954: URL: https://github.com/apache/arrow-rs/pull/7954#discussion_r2216923022 ## arrow-avro/src/reader/mod.rs: ## @@ -221,12 +221,11 @@ impl ReaderBuilder { } fn make_record_decoder(&self, schema: &AvroSchema<'_>) -> Result {

Re: [I] [Variant] API to construct Shredded Variant Arrays [arrow-rs]

2025-07-18 Thread via GitHub
scovich commented on issue #7895: URL: https://github.com/apache/arrow-rs/issues/7895#issuecomment-3090705369 > We would need this schema: > > ``` > STRUCT { > metadata: BinaryView, > value: BinaryView, > typed_value: STRUCT { > foo: Int64, > bar: Int32

Re: [I] [Release] 21.0.0 post release tasks [arrow]

2025-07-18 Thread via GitHub
kou commented on issue #47127: URL: https://github.com/apache/arrow/issues/47127#issuecomment-3090705259 @fvalenduc Could you open a new issue for it? You may need to wait for the R package release. #46950 is the related issue for it.

Re: [I] [Release] 21.0.0 post release tasks [arrow]

2025-07-18 Thread via GitHub
kou commented on issue #47127: URL: https://github.com/apache/arrow/issues/47127#issuecomment-3090701946 @lscheilling Could you open a new issue for it? FYI: #46959 is the related issue for it.

Re: [PR] [Variant] WIP Tests for variant_get of shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
scovich commented on code in PR #7965: URL: https://github.com/apache/arrow-rs/pull/7965#discussion_r2216900571 ## parquet-variant-compute/src/variant_get.rs: ## @@ -177,4 +192,209 @@ mod test { r#"{"inner_field": 1234}"#, ); } + +/// Shredding: ex

Re: [PR] [Variant] WIP Tests for variant_get of shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
scovich commented on code in PR #7965: URL: https://github.com/apache/arrow-rs/pull/7965#discussion_r2216895846 ## parquet-variant-compute/src/variant_get.rs: ## @@ -177,4 +192,209 @@ mod test { r#"{"inner_field": 1234}"#, ); } + +/// Shredding: ex

Re: [PR] arrow-ipc: Remove all abilities to preserve dict IDs [arrow-rs]

2025-07-18 Thread via GitHub
alamb merged PR #7940: URL: https://github.com/apache/arrow-rs/pull/7940

Re: [PR] arrow-ipc: Remove all abilities to preserve dict IDs [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7940: URL: https://github.com/apache/arrow-rs/pull/7940#issuecomment-3090645347 Thanks again @brancz

Re: [PR] Add arrow-avro support for Impala Nullability [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on code in PR #7954: URL: https://github.com/apache/arrow-rs/pull/7954#discussion_r2216815522 ## arrow-avro/src/reader/mod.rs: ## @@ -221,12 +221,11 @@ impl ReaderBuilder { } fn make_record_decoder(&self, schema: &AvroSchema<'_>) -> Result { -

Re: [I] [Variant] Add low level support for shredding and unshredding [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on issue #7715: URL: https://github.com/apache/arrow-rs/issues/7715#issuecomment-3090538741 We are discussing reading shredded variants here: - https://github.com/apache/arrow-rs/issues/7941 We are discussing writing shredded variants here: - https://github.co

Re: [I] [Variant] Support `variant_get` kernel for shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on issue #7941: URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3090535697 There is a bunch more back and forth on the thread as well that might be interesting

Re: [I] [Variant] Support `variant_get` kernel for shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on issue #7941: URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3090534867 @friendlymatthew in https://github.com/apache/arrow-rs/pull/7915#discussion_r2203418536 Hi, how do we plan on storing `typed_value`s? Do we plan on encoding it as a `Variant` a

Re: [I] [Variant] Support `variant_get` kernel for shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on issue #7941: URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3090534233 @alamb in https://github.com/apache/arrow-rs/pull/7915#discussion_r2203360483 > Shredded fields need a full blown variant builder, because they're strongly typed and we need to

Re: [I] [Variant] Support `variant_get` kernel for shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on issue #7941: URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3090533015 @scovich and I were discussing other options here https://github.com/apache/arrow-rs/pull/7915#discussion_r2202981997: --- @scovich : https://github.com/apache/arrow-rs

Re: [PR] [VARIANT] Path-based Field Extraction for VariantArray [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7946: URL: https://github.com/apache/arrow-rs/pull/7946#issuecomment-3090523533 > Is there any issue for implementing this? I would love to work on it I think we are discussing reading shredded variants on - https://github.com/apache/arrow-rs/issues/7941

Re: [PR] Add arrow-avro support for Impala Nullability [arrow-rs]

2025-07-18 Thread via GitHub
veronica-m-ef commented on code in PR #7954: URL: https://github.com/apache/arrow-rs/pull/7954#discussion_r2216802532 ## arrow-avro/src/reader/record.rs: ## @@ -301,9 +301,23 @@ impl Decoder { } Codec::Uuid => Self::Uuid(Vec::with_capacity(DEFAULT_CAPAC

Re: [PR] Add arrow-avro support for Impala Nullability [arrow-rs]

2025-07-18 Thread via GitHub
jecsand838 commented on code in PR #7954: URL: https://github.com/apache/arrow-rs/pull/7954#discussion_r2214304249 ## arrow-avro/src/reader/record.rs: ## @@ -301,9 +301,23 @@ impl Decoder { } Codec::Uuid => Self::Uuid(Vec::with_capacity(DEFAULT_CAPACITY

Re: [PR] [VARIANT] Path-based Field Extraction for VariantArray [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on code in PR #7946: URL: https://github.com/apache/arrow-rs/pull/7946#discussion_r2216801798 ## parquet-variant-compute/src/field_operations.rs: ## @@ -0,0 +1,532 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] [VARIANT] Path-based Field Extraction for VariantArray [arrow-rs]

2025-07-18 Thread via GitHub
carpecodeum commented on PR #7946: URL: https://github.com/apache/arrow-rs/pull/7946#issuecomment-3090517778 > Thank you for this PR @carpecodeum > > This is very cool > > I think there is already a `variant_get` implementation in https://github.com/apache/arrow-rs/blob/d809f19

Re: [PR] [VARIANT] Path-based Field Extraction for VariantArray [arrow-rs]

2025-07-18 Thread via GitHub
carpecodeum commented on code in PR #7946: URL: https://github.com/apache/arrow-rs/pull/7946#discussion_r2216798559 ## parquet-variant-compute/src/field_operations.rs: ## @@ -0,0 +1,532 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] [VARIANT] Path-based Field Extraction for VariantArray [arrow-rs]

2025-07-18 Thread via GitHub
carpecodeum commented on code in PR #7946: URL: https://github.com/apache/arrow-rs/pull/7946#discussion_r2216798292 ## parquet-variant-compute/src/variant_array.rs: ## @@ -154,6 +155,172 @@ impl VariantArray { fn find_value_field(array: &StructArray) -> Option { ar

Re: [PR] Add arrow-avro support for Impala Nullability [arrow-rs]

2025-07-18 Thread via GitHub
jecsand838 commented on code in PR #7954: URL: https://github.com/apache/arrow-rs/pull/7954#discussion_r2216795716 ## arrow-avro/src/reader/record.rs: ## @@ -301,9 +301,23 @@ impl Decoder { } Codec::Uuid => Self::Uuid(Vec::with_capacity(DEFAULT_CAPACITY

Re: [PR] [VARIANT] Path-based Field Extraction for VariantArray [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on code in PR #7946: URL: https://github.com/apache/arrow-rs/pull/7946#discussion_r2216789293 ## parquet-variant-compute/src/field_operations.rs: ## @@ -0,0 +1,532 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] [VARIANT] Path-based Field Extraction for VariantArray [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on code in PR #7946: URL: https://github.com/apache/arrow-rs/pull/7946#discussion_r2216788578 ## parquet-variant-compute/src/variant_array.rs: ## @@ -154,6 +155,172 @@ impl VariantArray { fn find_value_field(array: &StructArray) -> Option { array.co

Re: [PR] fix(csharp/test/Drivers/Databricks): Change the default QueryTimeoutSeconds to 3 hours [arrow-adbc]

2025-07-18 Thread via GitHub
CurtHagenlocher merged PR #3175: URL: https://github.com/apache/arrow-adbc/pull/3175

Re: [PR] Add arrow-avro support for Impala Nullability [arrow-rs]

2025-07-18 Thread via GitHub
jecsand838 commented on code in PR #7954: URL: https://github.com/apache/arrow-rs/pull/7954#discussion_r2216783991 ## arrow-avro/src/codec.rs: ## @@ -161,6 +161,66 @@ impl<'a> TryFrom<&Schema<'a>> for AvroField { } } +/// Builder for an [`AvroField`] +#[derive(Debug)] +p

Re: [PR] Add arrow-avro support for Impala Nullability [arrow-rs]

2025-07-18 Thread via GitHub
jecsand838 commented on code in PR #7954: URL: https://github.com/apache/arrow-rs/pull/7954#discussion_r2216779091 ## arrow-avro/src/codec.rs: ## @@ -161,6 +161,66 @@ impl<'a> TryFrom<&Schema<'a>> for AvroField { } } +/// Builder for an [`AvroField`] +#[derive(Debug)] +p

Re: [PR] Add arrow-avro support for Impala Nullability [arrow-rs]

2025-07-18 Thread via GitHub
jecsand838 commented on code in PR #7954: URL: https://github.com/apache/arrow-rs/pull/7954#discussion_r2216766988 ## arrow-avro/src/reader/record.rs: ## @@ -431,12 +422,18 @@ impl Decoder { let nanos = (millis as i64) * 1_000_000; builder.appen

Re: [PR] [Draft] implements Sum,sum_checked,min,max,is Distict,inverse for REE. [arrow-rs]

2025-07-18 Thread via GitHub
Rich-T-kid commented on code in PR #7933: URL: https://github.com/apache/arrow-rs/pull/7933#discussion_r2216752880 ## arrow-ord/src/cmp.rs: ## @@ -232,6 +239,7 @@ fn compare_op(op: Op, lhs: &dyn Datum, rhs: &dyn Datum) -> Result

Re: [PR] [Draft] implements Sum,sum_checked,min,max,is Distict,inverse for REE. [arrow-rs]

2025-07-18 Thread via GitHub
Rich-T-kid commented on code in PR #7933: URL: https://github.com/apache/arrow-rs/pull/7933#discussion_r2216753563 ## arrow-ord/src/cmp.rs: ## @@ -855,4 +863,122 @@ mod tests { neq(&col.slice(0, col.len() - 1), &col.slice(1, col.len() - 1)).unwrap(); } + +#[

Re: [PR] [Draft] implements Sum,sum_checked,min,max,is Distict,inverse for REE. [arrow-rs]

2025-07-18 Thread via GitHub
Rich-T-kid commented on code in PR #7933: URL: https://github.com/apache/arrow-rs/pull/7933#discussion_r2216752202 ## arrow-ord/src/cmp.rs: ## @@ -224,6 +223,14 @@ fn compare_op(op: Op, lhs: &dyn Datum, rhs: &dyn Datum) -> Result

Re: [PR] [Variant] WIP Tests for variant_get of shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on code in PR #7965: URL: https://github.com/apache/arrow-rs/pull/7965#discussion_r2216749153 ## parquet-variant-compute/src/variant_get.rs: ## @@ -177,4 +192,209 @@ mod test { r#"{"inner_field": 1234}"#, ); } + +/// Shredding: extr

Re: [PR] fix(csharp/test/Drivers/Databricks): Change the default QueryTimeoutSeconds to 3 hours [arrow-adbc]

2025-07-18 Thread via GitHub
jackyhu-db commented on PR #3175: URL: https://github.com/apache/arrow-adbc/pull/3175#issuecomment-3090431479 > This change will obviously work, but I think it's pretty confusing. Can we instead put a `virtual int DefaultQueryTimeoutSeconds { get; }` on `Hive2Connection` and then override it

Re: [PR] [Draft] implements Sum,sum_checked,min,max,is Distict,inverse for REE. [arrow-rs]

2025-07-18 Thread via GitHub
Rich-T-kid commented on code in PR #7933: URL: https://github.com/apache/arrow-rs/pull/7933#discussion_r2216748841 ## arrow-arith/src/aggregate.rs: ## @@ -17,7 +17,7 @@ //! Defines aggregations over Arrow arrays. -use arrow_array::cast::*; +use arrow_array::cast::{*}; Revi

Re: [PR] feat(csharp/src/Drivers/Databricks): Use ArrowSchema for Response Schema [arrow-adbc]

2025-07-18 Thread via GitHub
toddmeng-db commented on code in PR #3140: URL: https://github.com/apache/arrow-adbc/pull/3140#discussion_r2213774571 ## csharp/src/Drivers/Databricks/DatabricksStatement.cs: ## @@ -64,10 +64,53 @@ public DatabricksStatement(DatabricksConnection connection) enablePK

Re: [PR] feat(csharp/src/Drivers/Databricks): Use ArrowSchema for Response Schema [arrow-adbc]

2025-07-18 Thread via GitHub
toddmeng-db commented on code in PR #3140: URL: https://github.com/apache/arrow-adbc/pull/3140#discussion_r2214598029 ## csharp/test/Drivers/Databricks/E2E/StatementTests.cs: ## @@ -1279,5 +1279,80 @@ public async Task OlderDBRVersion_ShouldSetSchemaViaUseStatement()

Re: [PR] [Variant] WIP Tests for variant_get of shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on code in PR #7965: URL: https://github.com/apache/arrow-rs/pull/7965#discussion_r2216573737 ## parquet-variant-compute/src/variant_get.rs: ## @@ -177,4 +192,209 @@ mod test { r#"{"inner_field": 1234}"#, ); } + +/// Shredding: extr

Re: [PR] Perf: Support partition_validity to use fast path for bit map scan [arrow-rs]

2025-07-18 Thread via GitHub
jhorstmann commented on code in PR #7962: URL: https://github.com/apache/arrow-rs/pull/7962#discussion_r2216741483 ## arrow-ord/src/sort.rs: ## @@ -178,44 +178,136 @@ where } } -// partition indices into valid and null indices -fn partition_validity(array: &dyn Array) ->

Re: [I] [Packaging][CentOS] Drop support for 7 [arrow]

2025-07-18 Thread via GitHub
amoeba commented on issue #40735: URL: https://github.com/apache/arrow/issues/40735#issuecomment-3090390884 That seems fine and fair @pitrou.

Re: [PR] Add arrow-avro support for Impala Nullability [arrow-rs]

2025-07-18 Thread via GitHub
jecsand838 commented on code in PR #7954: URL: https://github.com/apache/arrow-rs/pull/7954#discussion_r2216696821 ## arrow-avro/src/reader/record.rs: ## @@ -344,7 +332,10 @@ impl Decoder { Self::Decimal256(_, _, _, builder) => builder.append_value(i256::ZERO),

Re: [PR] Perf: Support partition_validity to use fast path for bit map scan [arrow-rs]

2025-07-18 Thread via GitHub
Dandandan commented on code in PR #7962: URL: https://github.com/apache/arrow-rs/pull/7962#discussion_r2216677908 ## arrow-ord/src/sort.rs: ## @@ -178,44 +178,136 @@ where } } -// partition indices into valid and null indices -fn partition_validity(array: &dyn Array) ->

Re: [PR] [Variant] WIP Tests for variant_get of shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
Samyak2 commented on code in PR #7965: URL: https://github.com/apache/arrow-rs/pull/7965#discussion_r2216671411 ## parquet-variant-compute/src/variant_get.rs: ## @@ -177,4 +192,209 @@ mod test { r#"{"inner_field": 1234}"#, ); } + +/// Shredding: ex

Re: [I] Add Flatten helper for dictionary arrays in Go Arrow compute library [arrow-go]

2025-07-18 Thread via GitHub
Mandukhai-Alimaa commented on issue #436: URL: https://github.com/apache/arrow-go/issues/436#issuecomment-3090308881 take

Re: [PR] [VARIANT] Path-based Field Extraction for VariantArray [arrow-rs]

2025-07-18 Thread via GitHub
carpecodeum commented on code in PR #7946: URL: https://github.com/apache/arrow-rs/pull/7946#discussion_r2216626412 ## parquet-variant-compute/src/field_operations.rs: ## @@ -0,0 +1,532 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat(csharp/src): Add support for adding and configuring OTel exporters [arrow-adbc]

2025-07-18 Thread via GitHub
birschick-bq commented on code in PR #2949: URL: https://github.com/apache/arrow-adbc/pull/2949#discussion_r2216612783 ## csharp/src/Telemetry/Traces/Exporters/ExportersBuilder.cs: ## @@ -0,0 +1,207 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [PR] feat(csharp/src): Add support for adding and configuring OTel exporters [arrow-adbc]

2025-07-18 Thread via GitHub
birschick-bq commented on code in PR #2949: URL: https://github.com/apache/arrow-adbc/pull/2949#discussion_r2216611639 ## csharp/src/Telemetry/Traces/Exporters/FileExporter/TracingFile.cs: ## @@ -0,0 +1,219 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] Add arrow-avro support for Impala Nullability [arrow-rs]

2025-07-18 Thread via GitHub
veronica-m-ef commented on code in PR #7954: URL: https://github.com/apache/arrow-rs/pull/7954#discussion_r2216601845 ## arrow-avro/src/reader/record.rs: ## @@ -301,9 +301,23 @@ impl Decoder { } Codec::Uuid => Self::Uuid(Vec::with_capacity(DEFAULT_CAPAC

Re: [PR] [Variant] Revisit VariantMetadata and Object equality [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7961: URL: https://github.com/apache/arrow-rs/pull/7961#issuecomment-3090226051 So I really think it is important to be able to compare the logical value the Variant encodes for the purpose of tests. You can see almost all tests do this, and as we move into shredding

Re: [PR] [Variant] Revisit VariantMetadata and Object equality [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7961: URL: https://github.com/apache/arrow-rs/pull/7961#issuecomment-3090220267 > I agree that whatever we do should not be merely physical byte comparisons... but what does logical equality even mean? As in, if two variant objects compare logically equal, what can I

Re: [PR] Perf: Support partition_validity to use fast path for bit map scan [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on code in PR #7962: URL: https://github.com/apache/arrow-rs/pull/7962#discussion_r2216588076 ## arrow-ord/src/sort.rs: ## @@ -178,44 +178,136 @@ where } } -// partition indices into valid and null indices -fn partition_validity(array: &dyn Array) -> (Vec

Re: [PR] [Variant] WIP Tests for variant_get of shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7965: URL: https://github.com/apache/arrow-rs/pull/7965#issuecomment-3090174790 FYI @Samyak2 @scovich @friendlymatthew and @klion26 and @carpecodeum as I think you are interested in this feature

Re: [I] [Variant] Support `variant_get` kernel for shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on issue #7941: URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3090173185 Here is a practical suggestion on how to make progress on variant shredding -- we start working out how it would work for a simple example I have written some tests here that ma

Re: [PR] feat(csharp/src): Add support for adding and configuring OTel exporters [arrow-adbc]

2025-07-18 Thread via GitHub
jduo commented on code in PR #2949: URL: https://github.com/apache/arrow-adbc/pull/2949#discussion_r2216571309 ## csharp/src/Telemetry/Traces/Exporters/FileExporter/TracingFile.cs: ## @@ -0,0 +1,219 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[PR] [Variant] WIP Tests for variant_get of shredded variants [arrow-rs]

2025-07-18 Thread via GitHub
alamb opened a new pull request, #7965: URL: https://github.com/apache/arrow-rs/pull/7965 # Which issue does this PR close? - Part of https://github.com/apache/arrow-rs/issues/6736 - Part of https://github.com/apache/arrow-rs/issues/7941 # Rationale for this change In

[PR] fix(csharp/test/Drivers/Databricks): Change the default QueryTimeoutSeconds to 3 hours [arrow-adbc]

2025-07-18 Thread via GitHub
jackyhu-db opened a new pull request, #3175: URL: https://github.com/apache/arrow-adbc/pull/3175 ## Motivation Currently, the default `QueryTimeoutSeconds` is **60s** (set by `Hive2Server2Connection` [here](https://github.com/apache/arrow-adbc/blob/main/csharp/src/Drivers/Apache/Hive

Re: [PR] feat(csharp/src): Add support for adding and configuring OTel exporters [arrow-adbc]

2025-07-18 Thread via GitHub
jduo commented on code in PR #2949: URL: https://github.com/apache/arrow-adbc/pull/2949#discussion_r2216549385 ## csharp/src/Telemetry/Traces/Exporters/ExportersBuilder.cs: ## @@ -0,0 +1,207 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contr

Re: [I] [Release] 21.0.0 post release tasks [arrow]

2025-07-18 Thread via GitHub
fvalenduc commented on issue #47127: URL: https://github.com/apache/arrow/issues/47127#issuecomment-3090049020 I tried to install the arrow package with devtools using the apache-arrow-21.0.0 tag and it failed like this: ** testing if installed package can be loaded from temporary locatio

Re: [PR] [Variant] Impl `PartialEq` for VariantObject [arrow-rs]

2025-07-18 Thread via GitHub
scovich commented on code in PR #7943: URL: https://github.com/apache/arrow-rs/pull/7943#discussion_r2216453780 ## parquet-variant/src/variant/object.rs: ## @@ -387,6 +389,38 @@ impl<'m, 'v> VariantObject<'m, 'v> { } } +// Custom implementation of PartialEq for variant o

Re: [PR] [Variant] Impl `PartialEq` for VariantObject [arrow-rs]

2025-07-18 Thread via GitHub
scovich commented on code in PR #7943: URL: https://github.com/apache/arrow-rs/pull/7943#discussion_r2216451926 ## parquet-variant/src/variant/object.rs: ## @@ -387,6 +389,31 @@ impl<'m, 'v> VariantObject<'m, 'v> { } } +impl<'m, 'v> PartialEq for VariantObject<'m, 'v> {

Re: [PR] [Variant] Impl `PartialEq` for VariantObject [arrow-rs]

2025-07-18 Thread via GitHub
scovich commented on code in PR #7943: URL: https://github.com/apache/arrow-rs/pull/7943#discussion_r2216449041 ## parquet-variant/src/variant/object.rs: ## @@ -387,6 +389,31 @@ impl<'m, 'v> VariantObject<'m, 'v> { } } +impl<'m, 'v> PartialEq for VariantObject<'m, 'v> {

Re: [PR] feat(go/adbc): add IngestStream helper for one-call ingestion and add TestIngestStream [arrow-adbc]

2025-07-18 Thread via GitHub
zeroshade merged PR #3150: URL: https://github.com/apache/arrow-adbc/pull/3150

Re: [PR] [Variant] Revisit VariantMetadata and Object equality [arrow-rs]

2025-07-18 Thread via GitHub
scovich commented on PR #7961: URL: https://github.com/apache/arrow-rs/pull/7961#issuecomment-3089972179 > While reviewing this I was thinking maybe we should revisit equality > > I think what we are doing is trying to make `Variant::eq` to compare if the Variants are _logically_ equa

Re: [PR] feat(go/adbc/driver/bigquery): support service account impersonation [arrow-adbc]

2025-07-18 Thread via GitHub
zeroshade commented on code in PR #3174: URL: https://github.com/apache/arrow-adbc/pull/3174#discussion_r2216383547 ## go/adbc/driver/bigquery/driver.go: ## @@ -77,6 +77,49 @@ const ( AccessTokenEndpoint = "https://accounts.google.com/o/oauth2/token" AccessT

Re: [PR] [Variant] Avoid extra allocation in object builder [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on code in PR #7935: URL: https://github.com/apache/arrow-rs/pull/7935#discussion_r2216382026 ## parquet-variant/src/builder.rs: ## @@ -1317,7 +1414,15 @@ impl<'a> ObjectBuilder<'a> { /// This is to ensure that the object is always finalized before its parent b

Re: [PR] Convert JSON to VariantArray without copying (8 - 32% faster) [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#issuecomment-3089948663 This one is now ready for review. I am quite pleased it already shows some benchmarks going 30% faster - https://github.com/apache/arrow-rs/pull/7911#issuecomment-3089911559 Along

Re: [PR] [Variant] Avoid extra allocation in object builder [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7935: URL: https://github.com/apache/arrow-rs/pull/7935#issuecomment-3089923144 🤖: Benchmark completed Details ``` group 7899-avoid-extra-allocation-in-object-builder main ---

Re: [PR] [Variant] Avoid extra allocation in object builder [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7935: URL: https://github.com/apache/arrow-rs/pull/7935#issuecomment-3089911780 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#issuecomment-3089911559 🤖: Benchmark completed Details ``` group alamb_append_variant_builder main -

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#issuecomment-3089900939 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on code in PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#discussion_r2216348377 ## parquet-variant-compute/src/variant_array_builder.rs: ## @@ -55,9 +55,14 @@ use std::sync::Arc; /// }; /// builder.append_variant_buffers(&metadata, &value); /// +

Re: [PR] Add missing `parquet-variant-compute` crate to CI jobs [arrow-rs]

2025-07-18 Thread via GitHub
alamb merged PR #7963: URL: https://github.com/apache/arrow-rs/pull/7963

Re: [PR] Add missing parquet-variant-compute crate to CI jobs [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7963: URL: https://github.com/apache/arrow-rs/pull/7963#issuecomment-3089894974 In order to keep the CI clean, I am going to merge this without review

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#issuecomment-3089880742 🤖: Benchmark completed Details ``` group alamb_append_variant_builder main -

Re: [PR] [Variant] VariantMetadata is allowed to contain the empty string [arrow-rs]

2025-07-18 Thread via GitHub
codephage2020 commented on code in PR #7956: URL: https://github.com/apache/arrow-rs/pull/7956#discussion_r2216344593 ## parquet-variant/src/variant/metadata.rs: ## @@ -240,28 +240,23 @@ impl<'m> VariantMetadata<'m> { let value_buffer = string_from_

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#issuecomment-3089864104 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on code in PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#discussion_r2216344784 ## parquet-variant-compute/src/from_json.rs: ## @@ -41,10 +40,10 @@ pub fn batch_json_string_to_variant(input: &ArrayRef) -> Result

Re: [I] Retry does not cover connection errors [arrow-rs-object-store]

2025-07-18 Thread via GitHub
criccomini commented on issue #368: URL: https://github.com/apache/arrow-rs-object-store/issues/368#issuecomment-3089855509 Just came here to say we're hitting: ``` thread 'tokio-runtime-worker' panicked at /root/.cargo/git/checkouts/slatedb-a6e73982df30678a/2fe991a/slatedb/src/co

[I] [Variant] Convert JSON to Variant with fewer copies [arrow-rs]

2025-07-18 Thread via GitHub
alamb opened a new issue, #7964: URL: https://github.com/apache/arrow-rs/issues/7964 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** In a quest to have the fastest and most efficient Variant implementation I would like to avoi

Re: [PR] [Variant] remove VariantMetadata::dictionary_size [arrow-rs]

2025-07-18 Thread via GitHub
alamb merged PR #7958: URL: https://github.com/apache/arrow-rs/pull/7958

Re: [I] [Variant] remove VariantMetadata::dictionary_size [arrow-rs]

2025-07-18 Thread via GitHub
alamb closed issue #7947: [Variant] remove VariantMetadata::dictionary_size URL: https://github.com/apache/arrow-rs/issues/7947

Re: [PR] [Test] Add tests for VariantList equality [arrow-rs]

2025-07-18 Thread via GitHub
alamb commented on PR #7953: URL: https://github.com/apache/arrow-rs/pull/7953#issuecomment-3089829118 Thanks again @friendlymatthew

Re: [PR] [Test] Add tests for VariantList equality [arrow-rs]

2025-07-18 Thread via GitHub
alamb merged PR #7953: URL: https://github.com/apache/arrow-rs/pull/7953
