Re: [PR] GH-45794: [C++] Add array directory to Meson configuration [arrow]

2025-04-30 Thread via GitHub
kou commented on code in PR #45795: URL: https://github.com/apache/arrow/pull/45795#discussion_r2069886886 ## cpp/src/arrow/array/meson.build: ## @@ -0,0 +1,56 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NO

Re: [PR] GH-45833: [C++] Add JSON directory to Meson configuration [arrow]

2025-04-30 Thread via GitHub
kou merged PR #45834: URL: https://github.com/apache/arrow/pull/45834 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.or

Re: [I] [C++] Add json directory to Meson [arrow]

2025-04-30 Thread via GitHub
kou commented on issue #45833: URL: https://github.com/apache/arrow/issues/45833#issuecomment-2844168598 Issue resolved by pull request 45834 https://github.com/apache/arrow/pull/45834

Re: [PR] [WIP] implement multi range query in single request [arrow-rs-object-store]

2025-04-30 Thread via GitHub
Xuanwo commented on PR #345: URL: https://github.com/apache/arrow-rs-object-store/pull/345#issuecomment-2844151355 > You're saying that Azure _does_ support multiple byte ranges? This [SO answer](https://stackoverflow.com/a/57882772) says it doesn't, and this [relevant blob storage doc](h

Re: [PR] Update arrow_reader_row_filter benchmark to reflect ClickBench distribution [arrow-rs]

2025-04-30 Thread via GitHub
zhuqi-lucas commented on PR #7461: URL: https://github.com/apache/arrow-rs/pull/7461#issuecomment-2844151896 > Unfortunately, even after adjusting the benchmark on this branch I still don't see major changes in #7428. > > I will look more deeply tomorrow > > ```shell > cargo

Re: [PR] GH-25025: [C++] Move non core compute kernels into separate shared library [arrow]

2025-04-30 Thread via GitHub
kou commented on code in PR #46261: URL: https://github.com/apache/arrow/pull/46261#discussion_r2069794617 ## c_glib/arrow-glib/compute.cpp: ## @@ -37,6 +37,8 @@ #include #include +auto registration_status_ = arrow::compute::RegisterComputeKernels(); Review Comment: Ho

Re: [PR] GH-25025: [C++] Move non core compute kernels into separate shared library [arrow]

2025-04-30 Thread via GitHub
kou commented on PR #46261: URL: https://github.com/apache/arrow/pull/46261#issuecomment-2843975038 Acero and Dataset static libraries weren't built with `-DARROW_COMPUTE_STATIC`: https://github.com/apache/arrow/actions/runs/14751558013/job/41410004318?pr=46261#step:7:14525 ``

Re: [PR] feat(c): Use C++ visibility support in Meson configuration [arrow-adbc]

2025-04-30 Thread via GitHub
kou commented on code in PR #2740: URL: https://github.com/apache/arrow-adbc/pull/2740#discussion_r2069775161 ## c/driver/framework/CMakeLists.txt: ## @@ -35,6 +36,7 @@ if(ADBC_BUILD_TESTS) base_driver_test.cc EXTRA_LINK_LIBS ad

Re: [PR] Update arrow_reader_row_filter benchmark to reflect ClickBench distribution [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7461: URL: https://github.com/apache/arrow-rs/pull/7461#issuecomment-2843904300 Unfortunately, even after adjusting the benchmark on this branch I still don't see major changes in https://github.com/apache/arrow-rs/pull/7428. I will look more deeply tomorrow

[PR] Update arrow_reader_row_filter benchmark to reflect ClickBench distribution [arrow-rs]

2025-04-30 Thread via GitHub
alamb opened a new pull request, #7461: URL: https://github.com/apache/arrow-rs/pull/7461 # Which issue does this PR close? - Closes https://github.com/apache/arrow-rs/issues/7460 # Rationale for this change We would like a benchmark that accurately reflects

Re: [PR] docs: rework "What exactly is ADBC?" in FAQ [arrow-adbc]

2025-04-30 Thread via GitHub
lidavidm merged PR #2763: URL: https://github.com/apache/arrow-adbc/pull/2763

Re: [PR] feat: deterministic metadata encoding [arrow-rs]

2025-04-30 Thread via GitHub
alamb merged PR #7437: URL: https://github.com/apache/arrow-rs/pull/7437

Re: [PR] feat: deterministic metadata encoding [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7437: URL: https://github.com/apache/arrow-rs/pull/7437#issuecomment-2843828640 Thanks again @timsaucer and @etseidl

Re: [I] Deterministic metadata encoding [arrow-rs]

2025-04-30 Thread via GitHub
alamb closed issue #7448: Deterministic metadata encoding URL: https://github.com/apache/arrow-rs/issues/7448

Re: [I] [C++][Statistics] Implement a builder for ArrayStatistics [arrow]

2025-04-30 Thread via GitHub
kou commented on issue #46226: URL: https://github.com/apache/arrow/issues/46226#issuecomment-2843792389 Could you use a separate PR instead of mixing this and #45639?

Re: [PR] docs: rework "What exactly is ADBC?" in FAQ [arrow-adbc]

2025-04-30 Thread via GitHub
amoeba commented on PR #2763: URL: https://github.com/apache/arrow-adbc/pull/2763#issuecomment-2843780132 Thanks for taking a look. I re-worked the language a bit: - Reworded "ADBC is a standard" to "ADBC is _the_ standard" since I like that better - From bullet point 1, removed sp

Re: [I] [C++] Compilation Error in Apache Arrow Flight SQL 19 with C++20: Incomplete Type FlightEndpoint [arrow]

2025-04-30 Thread via GitHub
lidavidm commented on issue #45608: URL: https://github.com/apache/arrow/issues/45608#issuecomment-2843680626 Issue resolved by pull request 46264 https://github.com/apache/arrow/pull/46264

Re: [PR] docs: rework "What exactly is ADBC?" in FAQ [arrow-adbc]

2025-04-30 Thread via GitHub
lidavidm commented on PR #2763: URL: https://github.com/apache/arrow-adbc/pull/2763#issuecomment-2843648795 Maybe just omit listing languages altogether? :)

Re: [PR] docs: rework "What exactly is ADBC?" in FAQ [arrow-adbc]

2025-04-30 Thread via GitHub
lidavidm commented on PR #2763: URL: https://github.com/apache/arrow-adbc/pull/2763#issuecomment-2843647841 I guess we never really nailed down what is the "canonical" API. The C header is "most canonical", and I've been treating Java/Go like that as well, but also Java/Go make it much easi

Re: [PR] GH-35166: [C++] Increase precision of decimals in aggregate functions [arrow]

2025-04-30 Thread via GitHub
zanmato1984 commented on PR #44184: URL: https://github.com/apache/arrow/pull/44184#issuecomment-2843508113 > @khwilson I'm still not sure this is actually desirable (@zanmato1984 what do you think?). Most DBMSes do precision promotion for `sum` aggregation, and most promotions are a

Re: [I] arrow_reader_row_filter benchmark doesn't capture page cache improvements [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on issue #7460: URL: https://github.com/apache/arrow-rs/issues/7460#issuecomment-2843506718 Here is the `test.parquet` file being created by the benchmark: [test.zip](https://github.com/user-attachments/files/19985142/test.zip) The equivalent numbers are: * Sele

Re: [PR] feat(go/adbc): prototype OpenTelemetry trace file exporter in go driver [arrow-adbc]

2025-04-30 Thread via GitHub
birschick-bq commented on PR #2729: URL: https://github.com/apache/arrow-adbc/pull/2729#issuecomment-2843498084 @davidhcoe / @CurtHagenlocher Prototype/POC for OTel tracing in the `go` drivers.

Re: [PR] feat(go/adbc): prototype OpenTelemetry trace file exporter in go driver [arrow-adbc]

2025-04-30 Thread via GitHub
birschick-bq commented on code in PR #2729: URL: https://github.com/apache/arrow-adbc/pull/2729#discussion_r206927 ## go/adbc/driver/snowflake/connection.go: ## @@ -169,7 +171,16 @@ func isWildcardStr(ident string) bool { return strings.ContainsAny(ident, "_%") }

Re: [PR] GH-46205: [C++][Parquet][WIP] Read/Write null count statistics for UNKNOWN sort order [arrow]

2025-04-30 Thread via GitHub
paleolimbot commented on code in PR #46275: URL: https://github.com/apache/arrow/pull/46275#discussion_r2069381360 ## cpp/src/parquet/statistics.cc: ## @@ -963,6 +963,89 @@ std::shared_ptr DoMakeComparator(Type::type physical_type, return nullptr; } +template +class Unso

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843098466 🤔 second benchmark runs look very good performance wise. I'll run one more

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843148544 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP

Re: [PR] Add Map support to arrow-avro [arrow-rs]

2025-04-30 Thread via GitHub
jecsand838 commented on PR #7451: URL: https://github.com/apache/arrow-rs/pull/7451#issuecomment-2843204435 @klion26 I pushed changes just now that: * Fixed the linter errors * Made `read_blockwise_items` more readable * Changed `let val_field = Field::new("value", val_dt, true)`

Re: [I] [Parquet][C++] Logical types with sort order UNKNOWN are missing null_count statistics [arrow]

2025-04-30 Thread via GitHub
paleolimbot commented on issue #46205: URL: https://github.com/apache/arrow/issues/46205#issuecomment-2843187570 No rush on my end (and no offense taken if either of you would rather take this on!), but I started #46275 to wrap my head around the issue. Happy to take pretty much any angle a

Re: [PR] docs: rework "What exactly is ADBC?" in FAQ [arrow-adbc]

2025-04-30 Thread via GitHub
amoeba commented on PR #2763: URL: https://github.com/apache/arrow-adbc/pull/2763#issuecomment-2843193775 I'll note that, depending on the answer to my second point, I can file follow-up PRs to pages like https://arrow.apache.org/adbc/main/format/specification.html to update those.

Re: [PR] GH-46205: [C++][Parquet][WIP] Read/Write null count statistics for UNKNOWN sort order [arrow]

2025-04-30 Thread via GitHub
paleolimbot commented on code in PR #46275: URL: https://github.com/apache/arrow/pull/46275#discussion_r2069380251 ## cpp/src/parquet/statistics.cc: ## @@ -963,6 +963,89 @@ std::shared_ptr DoMakeComparator(Type::type physical_type, return nullptr; } +template +class Unso

Re: [PR] GH-46205: [C++][Parquet][WIP] Read/Write null count statistics for UNKNOWN sort order [arrow]

2025-04-30 Thread via GitHub
paleolimbot commented on code in PR #46275: URL: https://github.com/apache/arrow/pull/46275#discussion_r2069378772 ## cpp/src/parquet/metadata.cc: ## @@ -307,8 +307,10 @@ class ColumnChunkMetaData::ColumnChunkMetaDataImpl { DCHECK(writer_version_ != nullptr); // If the

Re: [PR] GH-46205: [C++][Parquet][WIP] Read/Write null count statistics for UNKNOWN sort order [arrow]

2025-04-30 Thread via GitHub
paleolimbot commented on code in PR #46275: URL: https://github.com/apache/arrow/pull/46275#discussion_r2069376548 ## cpp/src/parquet/column_writer.cc: ## @@ -1250,8 +1250,11 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, page_statistics_ = MakeStatistics(des

Re: [PR] GH-46205: [C++][Parquet][WIP] Read/Write null count statistics for UNKNOWN sort order [arrow]

2025-04-30 Thread via GitHub
github-actions[bot] commented on PR #46275: URL: https://github.com/apache/arrow/pull/46275#issuecomment-2843159685 :warning: GitHub issue #46205 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-46205: [C++][Parquet][WIP] Read/Write null count statistics for UNKNOWN sort order [arrow]

2025-04-30 Thread via GitHub
paleolimbot opened a new pull request, #46275: URL: https://github.com/apache/arrow/pull/46275 ### Rationale for this change Minimum and maximum values are not useful in the context of an unsorted converted or logical type; however, null counts are! Geometry is one example of an unso

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843155256 🤖: Benchmark completed Details ``` group main speedup_arith - ---

[PR] docs: rework "What exactly is ADBC?" in FAQ [arrow-adbc]

2025-04-30 Thread via GitHub
amoeba opened a new pull request, #2763: URL: https://github.com/apache/arrow-adbc/pull/2763 This PR is a bit of a check on my understanding. There are a few changes here: 1. Changed first sentence to help draw the reader in a bit better. 2. Changes the language "a set of abstract

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843148229 🤖: Benchmark completed Details ``` group main speedup_arith -

Re: [I] arrow_reader_row_filter benchmark doesn't capture page cache improvements [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on issue #7460: URL: https://github.com/apache/arrow-rs/issues/7460#issuecomment-2843142141 I did some analysis on `hits.parquet`: * Selectivity is: `13172392` / `7497` = `0.132` * Average run length of each `RowSelection`: `7497` / `14054784` = `7.114`
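
The two ratios quoted in the comment above appear to be selected rows divided by total rows (selectivity) and total rows divided by the number of selector runs (average run length). The figures in the snippet are partly cut off, so the sketch below uses made-up runs; `selection_stats` and the `(is_skip, row_count)` tuple layout are hypothetical names chosen for illustration, not the arrow-rs API.

```python
from typing import List, Tuple


def selection_stats(selectors: List[Tuple[bool, int]]) -> Tuple[float, float]:
    """Return (selectivity, average run length) for a list of runs.

    Each run is (is_skip, row_count); this layout is an assumption made
    for the sketch, not the real RowSelector type.
    """
    if not selectors:
        raise ValueError("need at least one run")
    total_rows = sum(count for _, count in selectors)
    if total_rows == 0:
        raise ValueError("runs must cover at least one row")
    selected_rows = sum(count for is_skip, count in selectors if not is_skip)
    selectivity = selected_rows / total_rows
    average_run_length = total_rows / len(selectors)
    return selectivity, average_run_length


# Hypothetical data: select 2, skip 6, select 2, skip 10.
runs = [(False, 2), (True, 6), (False, 2), (True, 10)]
selectivity, average_run_length = selection_stats(runs)
print(selectivity, average_run_length)
```

A low selectivity combined with a short average run length is the pattern the issue says the existing benchmark fails to capture.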

Re: [PR] Add Map support to arrow-avro [arrow-rs]

2025-04-30 Thread via GitHub
jecsand838 commented on code in PR #7451: URL: https://github.com/apache/arrow-rs/pull/7451#discussion_r2069363906 ## arrow-avro/src/reader/record.rs: ## @@ -267,10 +305,83 @@ impl Decoder { .collect::, _>>()?; Arc::new(StructArray::new(fiel

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843099239 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
Dandandan commented on code in PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#discussion_r2069335130 ## arrow-arith/src/arity.rs: ## @@ -251,14 +249,16 @@ where /// /// Return an error if the arrays have different lengths or /// the operation is under erroneous -

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843092423 🤖: Benchmark completed Details ``` group main speedup_arith -

Re: [PR] [WIP] implement multi range query in single request [arrow-rs-object-store]

2025-04-30 Thread via GitHub
kylebarron commented on PR #345: URL: https://github.com/apache/arrow-rs-object-store/pull/345#issuecomment-2843077560 You're saying that Azure _does_ support multiple byte ranges? This [SO answer](https://stackoverflow.com/a/57882772) says it doesn't, and this [relevant blob storage doc]

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843044961 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP

Re: [I] Consider removing `skip` from `RowSelector` [arrow-rs]

2025-04-30 Thread via GitHub
XiangpengHao commented on issue #7450: URL: https://github.com/apache/arrow-rs/issues/7450#issuecomment-2842994528 > We can represent a `RowSelector` as array of alternating select / skip / select rows. > > e.g. : [0, 10, 5, 10, 5] => select 0, skip 10, select 5, skip 10, select 5
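
The alternating layout quoted above can be sketched as follows. This is a minimal illustration that assumes even positions are select runs and odd positions are skip runs, which matches the quoted example `[0, 10, 5, 10, 5]`; `decode_alternating` and `selected_row_indices` are hypothetical helpers, not functions from arrow-rs.

```python
from typing import Iterable, List, Tuple


def decode_alternating(runs: Iterable[int]) -> List[Tuple[str, int]]:
    """Label each run: even positions select rows, odd positions skip rows.

    The first entry may be 0 when the selection starts with a skip.
    """
    labelled = []
    for position, count in enumerate(runs):
        if count < 0:
            raise ValueError("run lengths must be non-negative")
        action = "select" if position % 2 == 0 else "skip"
        labelled.append((action, count))
    return labelled


def selected_row_indices(runs: Iterable[int]) -> List[int]:
    """Expand the runs into the absolute indices of the selected rows."""
    rows = []
    cursor = 0
    for action, count in decode_alternating(runs):
        if action == "select":
            rows.extend(range(cursor, cursor + count))
        cursor += count  # both select and skip runs advance the cursor
    return rows


print(decode_alternating([0, 10, 5, 10, 5]))
print(selected_row_indices([0, 10, 5, 10, 5]))
```

Because the action is implied by position, each run needs only a length, which is the space saving the proposal is after; the cost is the leading zero whenever a selection begins with a skip.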

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843044794 🤖: Benchmark completed Details ``` group main speedup_arith - ---

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843034737 > Other than that I think this is still improving the kernels on the use of safety / showing the from_trusted_len_iter is not needed as much. I agree this PR is an improvement in gen

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843035533 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP

Re: [PR] Add Map support to arrow-avro [arrow-rs]

2025-04-30 Thread via GitHub
jecsand838 commented on code in PR #7451: URL: https://github.com/apache/arrow-rs/pull/7451#discussion_r2069301291 ## arrow-avro/src/reader/record.rs: ## @@ -290,3 +401,84 @@ fn flush_primitive( } const DEFAULT_CAPACITY: usize = 1024; + + +#[cfg(test)] +mod tests { +use

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843025170 🤖: Benchmark completed Details ``` group main speedup_arith - ---

Re: [I] Consider removing `skip` from `RowSelector` [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on issue #7450: URL: https://github.com/apache/arrow-rs/issues/7450#issuecomment-2843022760 Another potential option might be an enum like this:

```rust
enum RowSelector {
    Skip(usize),
    Scan(usize),
    Bitmap(BooleanBuffer),
}
```

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
Dandandan commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843019583 It looks like the performance difference is bigger for the arithmetic changes on my computer :thinking: I think it might be something related to the osx allocator.

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843018259 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on code in PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#discussion_r2069284531 ## arrow-array/src/array/primitive_array.rs: ## @@ -1035,12 +1028,9 @@ impl PrimitiveArray { F: FnMut(U::Item) -> T::Native, { let nulls = left.lo

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
Dandandan commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2843013144 Could you also run the boolean kernel bench

Re: [PR] Add Map support to arrow-avro [arrow-rs]

2025-04-30 Thread via GitHub
jecsand838 commented on code in PR #7451: URL: https://github.com/apache/arrow-rs/pull/7451#discussion_r2069291552 ## arrow-avro/src/codec.rs: ## @@ -155,6 +168,22 @@ impl Codec { DataType::List(Arc::new(f.field_with_name(Field::LIST_FIELD_DEFAULT_NAME)))

Re: [PR] Add Map support to arrow-avro [arrow-rs]

2025-04-30 Thread via GitHub
jecsand838 commented on code in PR #7451: URL: https://github.com/apache/arrow-rs/pull/7451#discussion_r2069285439 ## arrow-avro/src/codec.rs: ## @@ -155,6 +168,22 @@ impl Codec { DataType::List(Arc::new(f.field_with_name(Field::LIST_FIELD_DEFAULT_NAME)))

Re: [I] [EPIC] Faster performance for parquet predicate evaluation for non selective filters [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on issue #7456: URL: https://github.com/apache/arrow-rs/issues/7456#issuecomment-2842992749 I just spoke with @XiangpengHao -- from my perspective the current status is: 1. https://github.com/apache/arrow-rs/issues/7363: blocked on getting some benchmark results that show

Re: [PR] [WIP] implement multi range query in single request [arrow-rs-object-store]

2025-04-30 Thread via GitHub
Xuanwo commented on PR #345: URL: https://github.com/apache/arrow-rs-object-store/pull/345#issuecomment-2842916261 > If there are any providers that do support this, I'd expect for this to be an implementation detail of get_ranges instead of a separate method. I know of azure support

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2842876482 🤖: Benchmark completed Details ``` group main speedup_arith -

Re: [I] [C++][Docs] Improve documentation of our security model [arrow]

2025-04-30 Thread via GitHub
zanmato1984 commented on issue #46218: URL: https://github.com/apache/arrow/issues/46218#issuecomment-2842880834 Yes, we should probably do so. Based on the many crashes I have fixed, I have neither the ability nor the time to investigate the exploits.

Re: [PR] [WIP] implement multi range query in single request [arrow-rs-object-store]

2025-04-30 Thread via GitHub
kylebarron commented on PR #345: URL: https://github.com/apache/arrow-rs-object-store/pull/345#issuecomment-2842895278 If there are any providers that _do_ support this, I'd expect for this to be an implementation detail of `get_ranges` instead of a separate method.

Re: [PR] GH-36411: [C++][Python] Use meson-python for PyArrow build system [arrow]

2025-04-30 Thread via GitHub
eli-schwartz commented on code in PR #45854: URL: https://github.com/apache/arrow/pull/45854#discussion_r2069210359 ## python/MANIFEST.in: ## @@ -1,15 +0,0 @@ -include README.md -include ../LICENSE.txt Review Comment: I patched meson (not meson-python) so that building binar

Re: [PR] GH-46209: [Documentation][C++][Compute] Internal documentation for row table [arrow]

2025-04-30 Thread via GitHub
zanmato1984 commented on PR #46210: URL: https://github.com/apache/arrow/pull/46210#issuecomment-2842865083 Also cc @raulcd @pitrou . Does the structure of rst look good? (I'll remove the markdown once the rst is OK.)

Re: [PR] GH-46209: [Documentation][C++][Compute] Internal documentation for row table [arrow]

2025-04-30 Thread via GitHub
zanmato1984 commented on code in PR #46210: URL: https://github.com/apache/arrow/pull/46210#discussion_r2069197559 ## cpp/src/arrow/compute/row/doc/row_table.md: ## @@ -0,0 +1,84 @@ + + +# Row Table + +## Overview + +The row table in Arrow represents data stored in row-major for

Re: [PR] Speedup arithmetic kernels (up to -25%) / not (-30%) [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7457: URL: https://github.com/apache/arrow-rs/pull/7457#issuecomment-2842824383 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP

Re: [I] [C++] Enable `-Wmissing-declarations` in CHECKIN mode [arrow]

2025-04-30 Thread via GitHub
zanmato1984 commented on issue #46272: URL: https://github.com/apache/arrow/issues/46272#issuecomment-2842688526 +1 on having this.

Re: [I] [C++][Parquet] Misleading `GeoStatistics::dimension_empty` docstring [arrow]

2025-04-30 Thread via GitHub
paleolimbot commented on issue #46270: URL: https://github.com/apache/arrow/issues/46270#issuecomment-2842597331 > Can you? I believe so? It's true if and only if it can guarantee emptiness for that dimension? We definitely test that this is the case and I'm struggling to find th

Re: [PR] Make `FooterTail` public [arrow-rs]

2025-04-30 Thread via GitHub
alamb commented on PR #7440: URL: https://github.com/apache/arrow-rs/pull/7440#issuecomment-2842557391 Thanks again @masonh22 and @etseidl

Re: [PR] Make `FooterTail` public [arrow-rs]

2025-04-30 Thread via GitHub
alamb merged PR #7440: URL: https://github.com/apache/arrow-rs/pull/7440

Re: [I] Move parquet::file::metadata::reader::FooterTail to parquet::file::metadata so that it is public [arrow-rs]

2025-04-30 Thread via GitHub
alamb closed issue #7438: Move parquet::file::metadata::reader::FooterTail to parquet::file::metadata so that it is public URL: https://github.com/apache/arrow-rs/issues/7438

Re: [PR] GH-46268: [C++] Improve ArrayData docstrings [arrow]

2025-04-30 Thread via GitHub
mapleFU commented on code in PR #46271: URL: https://github.com/apache/arrow/pull/46271#discussion_r2068993074 ## cpp/src/arrow/array/data.h: ## @@ -64,32 +64,24 @@ constexpr int64_t kUnknownNullCount = -1; /// /// This data structure is a self-contained representation of the

Re: [PR] GH-46268: [C++] Improve ArrayData docstrings [arrow]

2025-04-30 Thread via GitHub
raulcd commented on code in PR #46271: URL: https://github.com/apache/arrow/pull/46271#discussion_r2068989692 ## cpp/src/arrow/array/data.h: ## @@ -64,32 +64,24 @@ constexpr int64_t kUnknownNullCount = -1; /// /// This data structure is a self-contained representation of the m

Re: [PR] GH-46268: [C++] Improve ArrayData docstrings [arrow]

2025-04-30 Thread via GitHub
felipecrv commented on code in PR #46271: URL: https://github.com/apache/arrow/pull/46271#discussion_r2068972615 ## cpp/src/arrow/array/data.h: ## @@ -64,32 +64,24 @@ constexpr int64_t kUnknownNullCount = -1; /// /// This data structure is a self-contained representation of th

Re: [I] [Docs][Release][Website] Figure out why the version banner code changed in 19.0.0 [arrow]

2025-04-30 Thread via GitHub
assignUser commented on issue #45290: URL: https://github.com/apache/arrow/issues/45290#issuecomment-2842400049 (This is of course not an actual blocker but should ideally be fixed prior to the next release to remove the manual intervention)

Re: [I] [Docs][Release][Website] Figure out why the version banner code changed in 19.0.0 [arrow]

2025-04-30 Thread via GitHub
assignUser commented on issue #45290: URL: https://github.com/apache/arrow/issues/45290#issuecomment-2842396959 I used the following incantation to fix the issue manually but it relies on gnu sed's `-z` slurp mode option so it's not portable: ```bash find docs -type f -not -path 'docs/

Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-30 Thread via GitHub
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2068919703 ## python/pyarrow/scalar.pxi: ## @@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar): def __getitem__(self, i): """ -Return the value at the

Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-30 Thread via GitHub
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2068909794 ## python/pyarrow/scalar.pxi: ## @@ -221,6 +221,8 @@ cdef class BooleanScalar(Scalar): cdef CBooleanScalar* sp = self.wrapped.get() return sp.value if

Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-30 Thread via GitHub
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2068900808 ## python/pyarrow/scalar.pxi: ## @@ -847,6 +882,33 @@ cdef class BinaryScalar(Scalar): buffer = self.as_buffer() return None if buffer is None else buf

Re: [PR] GH-46155: [C++] Implement Tensorflow directory in Meson [arrow]

2025-04-30 Thread via GitHub
github-actions[bot] commented on PR #46156: URL: https://github.com/apache/arrow/pull/46156#issuecomment-2842318783 Revision: e50a28c85a49345bf0f49aa47c67348f2ab8774e Submitted crossbow builds: [ursacomputing/crossbow @ actions-bafd403200](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-45794: [C++] Add array directory to Meson configuration [arrow]

2025-04-30 Thread via GitHub
WillAyd commented on PR #45795: URL: https://github.com/apache/arrow/pull/45795#issuecomment-2842317189 @kou if you have time to review this that would be much appreciated

Re: [I] [C++][Parquet] Misleading `GeoStatistics::dimension_empty` docstring [arrow]

2025-04-30 Thread via GitHub
wgtmac commented on issue #46270: URL: https://github.com/apache/arrow/issues/46270#issuecomment-2842308464 > /// For statistics read from a Parquet file, dimension_empty() will always contain > /// false values because there is no mechanism to communicate an empty interval > /// i

Re: [PR] GH-46155: [C++] Implement Tensorflow directory in Meson [arrow]

2025-04-30 Thread via GitHub
WillAyd commented on PR #46156: URL: https://github.com/apache/arrow/pull/46156#issuecomment-2842309118 @github-actions crossbow submit *meson

Re: [PR] GH-45833: [C++] Add JSON directory to Meson configuration [arrow]

2025-04-30 Thread via GitHub
WillAyd commented on PR #45834: URL: https://github.com/apache/arrow/pull/45834#issuecomment-2842311779 @kou I think this one is pretty simple - any objections to merging?

Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-30 Thread via GitHub
pitrou commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2842310641 > That would involve registering `pa.MapScalar` and `pa.ListScalar` as virtual subclasses with `Mapping.register(...)` and `Sequence.register(...)`, respectively, and possibly implementing

Re: [I] [C++][Compute] Incorrect document for MemAllocation::PREALLOCATE for string types [arrow]

2025-04-30 Thread via GitHub
pitrou commented on issue #46177: URL: https://github.com/apache/arrow/issues/46177#issuecomment-2842274797 > By the way, what should we do about View Types and Union Types? I think this can be a separate issue and PR, unless you really feel comfortable tackling all types in one go.

Re: [I] [C++][Parquet] Misleading `GeoStatistics::dimension_empty` docstring [arrow]

2025-04-30 Thread via GitHub
pitrou commented on issue #46270: URL: https://github.com/apache/arrow/issues/46270#issuecomment-2842271819 > I think this is more user-friendly as is...if you have a query rectangle, you can do `if (dimension_empty(0) || dimension_empty(1)) skipThisRowGroup()` without checking validity.

Re: [PR] chore(arrow): remove most lock copies [arrow-go]

2025-04-30 Thread via GitHub
zeroshade merged PR #362: URL: https://github.com/apache/arrow-go/pull/362

Re: [PR] GH-45908: [C++][Docs] Expose basic {Array,...}FromJSON helpers as public APIs [arrow]

2025-04-30 Thread via GitHub
pitrou commented on PR #46180: URL: https://github.com/apache/arrow/pull/46180#issuecomment-2842263203 Yes, that sounds good to me too!

Re: [PR] GH-46193: [Flight][Format] Extend Flight Location URI Semantics [arrow]

2025-04-30 Thread via GitHub
zeroshade commented on PR #46194: URL: https://github.com/apache/arrow/pull/46194#issuecomment-2842236985 Ok, with the approvals I've gotten here I'll draft up and send an email to the mailing list for a vote. Thanks everyone!

Re: [I] [Release][Docs] ChangeLog for arrow 20 is empty [arrow]

2025-04-30 Thread via GitHub
assignUser commented on issue #46263: URL: https://github.com/apache/arrow/issues/46263#issuecomment-2842264307 Fixed! https://arrow.apache.org/release/20.0.0.html

Re: [PR] GH-46157: [C++] Move test utility RunEndEncodeTableColumns that uses REE to test_util_internal on acero instead of common gtest_util [arrow]

2025-04-30 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #46161: URL: https://github.com/apache/arrow/pull/46161#issuecomment-2842257638 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit baf97fd12ac0254b3aef8e8329a3373050bcb8f1. There were no

Re: [I] [C++][Packaging] Remove pin for grpc-cpp in conda_env_cpp.txt [arrow]

2025-04-30 Thread via GitHub
yyossy5 commented on issue #46137: URL: https://github.com/apache/arrow/issues/46137#issuecomment-2842209515 Thank you very much, I understand well. Insert debug prints into FindZSTD.cmake and investigate further.

Re: [PR] Fix out of bounds crash in RleValueDecoder [arrow-rs]

2025-04-30 Thread via GitHub
crepererum merged PR #7441: URL: https://github.com/apache/arrow-rs/pull/7441

Re: [PR] Improve comments for avro [arrow-rs]

2025-04-30 Thread via GitHub
crepererum merged PR #7449: URL: https://github.com/apache/arrow-rs/pull/7449

Re: [PR] GH-43660: [C++] Add a `CastingGenerator` to Parquet Reader that applies required casts before slicing [arrow]

2025-04-30 Thread via GitHub
srilman commented on PR #43661: URL: https://github.com/apache/arrow/pull/43661#issuecomment-2842188178 @scott-routledge2 @IsaacWarren

Re: [PR] feat(c): Use C++ visibility support in Meson configuration [arrow-adbc]

2025-04-30 Thread via GitHub
WillAyd commented on code in PR #2740: URL: https://github.com/apache/arrow-adbc/pull/2740#discussion_r2068770636 ## c/driver/framework/CMakeLists.txt: ## @@ -35,6 +36,7 @@ if(ADBC_BUILD_TESTS) base_driver_test.cc EXTRA_LINK_LIBS

Re: [PR] feat(c): Use C++ visibility support in Meson configuration [arrow-adbc]

2025-04-30 Thread via GitHub
WillAyd commented on code in PR #2740: URL: https://github.com/apache/arrow-adbc/pull/2740#discussion_r2068762886 ## c/driver/framework/CMakeLists.txt: ## @@ -35,6 +36,7 @@ if(ADBC_BUILD_TESTS) base_driver_test.cc EXTRA_LINK_LIBS