Re: [PR] Update prost and tonic dependencies [arrow-rs]

2024-08-31 Thread via GitHub
Xuanwo commented on PR #6337: URL: https://github.com/apache/arrow-rs/pull/6337#issuecomment-2323184864 > DataFusion needs to use prost version matching arrow-flight, or there can be difficult to understand errors. > This limited ability to update prost there [apache/datafusion#12237](ht

Re: [PR] Parquet: Verify 32-bit CRC checksum when decoding pages [arrow-rs]

2024-08-31 Thread via GitHub
xmakro commented on PR #6290: URL: https://github.com/apache/arrow-rs/pull/6290#issuecomment-2323162957 Thanks for the pointers. I added the tests and documented the feature flag. Please take a look -- This is an automated message from the Apache Git Service. To respond to the message, pl

Re: [PR] GH-43809: [Docs] Update extension type examples to not use UUID [arrow]

2024-08-31 Thread via GitHub
khwilson commented on PR #43849: URL: https://github.com/apache/arrow/pull/43849#issuecomment-2323129552 Apologies for my delay, @ianmcook . Ended up just taking a nap on that flight. :-) I've taken a stab at disambiguating "parameterization." To do so, I decided to explicitly call t

Re: [PR] GH-43746: [C++] Add support for Boost 1.86 [arrow]

2024-08-31 Thread via GitHub
kou commented on code in PR #43766: URL: https://github.com/apache/arrow/pull/43766#discussion_r1739921453 ## cpp/src/arrow/testing/process.cc: ## @@ -0,0 +1,300 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See th

Re: [PR] GH-43797: [C++] Attach `arrow::ArrayStatistics` to `arrow::ArrayData` [arrow]

2024-08-31 Thread via GitHub
kou commented on code in PR #43801: URL: https://github.com/apache/arrow/pull/43801#discussion_r1739920758 ## cpp/src/arrow/array/data.cc: ## @@ -195,6 +197,7 @@ std::shared_ptr ArrayData::Slice(int64_t off, int64_t len) const { } else { copy->null_count = null_count !=

Re: [PR] Fix `MutableBuffer::into_buffer` leaking its extra capacity into the final buffer [arrow-rs]

2024-08-31 Thread via GitHub
teh-cmc commented on PR #6300: URL: https://github.com/apache/arrow-rs/pull/6300#issuecomment-2323010953 Uh-oh, I missed one! > Adding another API (if it doesn't already exist) to "shrink_to_fit" for Arrays in general That could be an alternative solution if this one doesn't cu

Re: [PR] GH-43894: [R] format_aggregation() should print options too [arrow]

2024-08-31 Thread via GitHub
github-actions[bot] commented on PR #43896: URL: https://github.com/apache/arrow/pull/43896#issuecomment-2323009842 :warning: GitHub issue #43894 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-43894: [R] format_aggregation() should print options too [arrow]

2024-08-31 Thread via GitHub
nealrichardson opened a new pull request, #43896: URL: https://github.com/apache/arrow/pull/43896 ### Rationale for this change If you printed the inner query after summarize, it would show what function was being called but not the function options. ### What changes are includ

Re: [PR] Fix `MutableBuffer::into_buffer` leaking its extra capacity into the final buffer [arrow-rs]

2024-08-31 Thread via GitHub
teh-cmc commented on PR #6300: URL: https://github.com/apache/arrow-rs/pull/6300#issuecomment-2323002688 > I am concerned that this change will decrease performance by forcing an extra copy in all situations, though I may not understand the implications I wouldn't expect this to have

Re: [I] Release arrow-rs / parquet major version `53.0.0` (September 2024) [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6016: URL: https://github.com/apache/arrow-rs/issues/6016#issuecomment-2323000760 Release thread: https://lists.apache.org/thread/91fckqxwpvcov26ty4s2wn11czkcxxbf

Re: [PR] Prepare arrow/parquet `53.0.0` release [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on PR #6338: URL: https://github.com/apache/arrow-rs/pull/6338#issuecomment-2322998654 > In the future is it possible to get any heads up when the release cut will actually happen? Is there an issue I should be following? The README only lists "September" and not a specific

Re: [PR] Workaround new bad file in parquet-testing [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on code in PR #6344: URL: https://github.com/apache/arrow-rs/pull/6344#discussion_r1739881037 ## parquet/tests/arrow_reader/bad_data.rs: ## @@ -27,6 +27,7 @@ static KNOWN_FILES: &[&str] = &[ "PARQUET-1481.parquet", "ARROW-GH-41317.parquet", "ARROW-

Re: [PR] Workaround new bad file in parquet-testing [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on PR #6344: URL: https://github.com/apache/arrow-rs/pull/6344#issuecomment-2322994327 Merging this in as it is a test only change

Re: [PR] Workaround new bad file in parquet-testing [arrow-rs]

2024-08-31 Thread via GitHub
alamb merged PR #6344: URL: https://github.com/apache/arrow-rs/pull/6344

Re: [PR] Update prost and tonic dependencies [arrow-rs]

2024-08-31 Thread via GitHub
findepi commented on PR #6337: URL: https://github.com/apache/arrow-rs/pull/6337#issuecomment-2322985485 thank you @Xuanwo @alamb for your feedback! > The benefit of using `0.12` instead of `0.12.2` is that it allows users to continue using `0.12.1` without forcing them to upgrade to

[PR] GH-43748: [R] Handle package_version in safe_r_metadata [arrow]

2024-08-31 Thread via GitHub
nealrichardson opened a new pull request, #43895: URL: https://github.com/apache/arrow/pull/43895 ### Rationale for this change See #43748. There is what appears to be a bug in R's `[[.numeric_version` implementation that leads to infinite recursion. ### What changes are inclu

Re: [PR] GH-43693: [C++][Acero] Support AVX2 swiss join decoding [arrow]

2024-08-31 Thread via GitHub
zanmato1984 commented on PR #43832: URL: https://github.com/apache/arrow/pull/43832#issuecomment-2322961505 This is on my other desktop (Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz, Coffee Lake), similar symptom (possibly because it is also Coffee Lake as my MBP). The scalar version:

Re: [PR] Manually run fmt on all files under parquet [arrow-rs]

2024-08-31 Thread via GitHub
etseidl commented on PR #6328: URL: https://github.com/apache/arrow-rs/pull/6328#issuecomment-2322942129 > The only other thing I think is important here is to provide instructions (as comments) about how to fix the CI check if it fails. E.g. "if this test fails, run `cargo fmt .` and c

Re: [I] [R] How to pass Arrow objects like Table between C++ and R? [arrow]

2024-08-31 Thread via GitHub
ajinkya-k commented on issue #43675: URL: https://github.com/apache/arrow/issues/43675#issuecomment-2322941895 So is this an inherent limitation of cpp11?

Re: [PR] GH-43693: [C++][Acero] Support AVX2 swiss join decoding [arrow]

2024-08-31 Thread via GitHub
zanmato1984 commented on PR #43832: URL: https://github.com/apache/arrow/pull/43832#issuecomment-2322939098 Thanks a lot for running it @mapleFU ! Much appreciated! I think we still see certain cases where scalar code beats the vectorized one (but not as much as mine).

Re: [PR] GH-43758: [C++] Compute: More comment in RowEncoder [arrow]

2024-08-31 Thread via GitHub
zanmato1984 commented on code in PR #43763: URL: https://github.com/apache/arrow/pull/43763#discussion_r1739755908 ## cpp/src/arrow/compute/row/row_encoder_internal.h: ## @@ -142,41 +159,33 @@ struct ARROW_EXPORT VarLengthKeyEncoder : KeyEncoder { Status Encode(const ExecVa

Re: [PR] GH-43758: [C++] Compute: More comment in RowEncoder [arrow]

2024-08-31 Thread via GitHub
zanmato1984 commented on code in PR #43763: URL: https://github.com/apache/arrow/pull/43763#discussion_r1739750515 ## cpp/src/arrow/compute/row/row_encoder_internal.h: ## @@ -336,11 +354,20 @@ class ARROW_EXPORT RowEncoder { private: ExecContext* ctx_{nullptr}; std::vect

Re: [I] question: can this library be used directly by GDAL to compile the GeoParquet driver? [arrow-nanoarrow]

2024-08-31 Thread via GitHub
jorisvandenbossche commented on issue #600: URL: https://github.com/apache/arrow-nanoarrow/issues/600#issuecomment-2322933380 To add one more potential clarification: Parquet is a file format, and so the Arrow C++ project just "happens" to have functionality to read and write that file for

Re: [I] question: can this library be used directly by GDAL to compile the GeoParquet driver? [arrow-nanoarrow]

2024-08-31 Thread via GitHub
jorisvandenbossche commented on issue #600: URL: https://github.com/apache/arrow-nanoarrow/issues/600#issuecomment-2322930986 > GDAL provides a GeoParquet driver, great. I would like to build that driver through GDAL. However the GeoParquet driver in GDAL requires the Arrow C++ API ;

Re: [I] [R] `write_parquet()` has infinite recursion error when writing `packageVersion()` attributes [arrow]

2024-08-31 Thread via GitHub
nealrichardson commented on issue #43748: URL: https://github.com/apache/arrow/issues/43748#issuecomment-2322925560 Thanks for the report. This is bizarre behavior of `packageVersion`: it seems to be an infinitely recursive object. ``` > x <- packageVersion("arrow") > typeof(x)

Re: [I] [R] Failure to install from source when cmake is found but is too old [arrow]

2024-08-31 Thread via GitHub
nealrichardson commented on issue #43612: URL: https://github.com/apache/arrow/issues/43612#issuecomment-2322921533 Thank you. I've revised the title of the issue to clarify what's failing. Would you be interested in making a PR to fix this? I believe you've identified exactly the issue.

Re: [PR] Prepare arrow/parquet `53.0.0` release [arrow-rs]

2024-08-31 Thread via GitHub
kylebarron commented on PR #6338: URL: https://github.com/apache/arrow-rs/pull/6338#issuecomment-2322921019 In the future is it possible to get any heads up when the release cut will actually happen? Is there an issue I should be following? The README only lists "September" and not a specif

Re: [PR] Update prost and tonic dependencies [arrow-rs]

2024-08-31 Thread via GitHub
Xuanwo commented on PR #6337: URL: https://github.com/apache/arrow-rs/pull/6337#issuecomment-2322912981 > It is a good question about if we should always update to the most recent minor version > > Given our CI tests (implicitly) test against the most recent versions, updating to the

Re: [I] Release arrow-rs / parquet major version `53.0.0` (September 2024) [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6016: URL: https://github.com/apache/arrow-rs/issues/6016#issuecomment-2322912205 I hit an error validating the RC1: https://github.com/apache/arrow-rs/issues/6343 I have a fix and I will make an RC2 today if no one beats me to it

Re: [I] bad_data::test_invalid_files fails during release verification [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6343: URL: https://github.com/apache/arrow-rs/issues/6343#issuecomment-2322910410 The issue appears to be that I was too clever in the tests -- they look for invalid files that are not added to the tests. They don't fail on CI as the master branch has a pinned

[I] bad_data::test_invalid_files fails during release verification [arrow-rs]

2024-08-31 Thread via GitHub
alamb opened a new issue, #6343: URL: https://github.com/apache/arrow-rs/issues/6343 **Describe the bug** While working on https://github.com/apache/arrow-rs/issues/6016 the release verification script fails like this: ``` bad_data::test_invalid_files stdout thr

[I] Release arrow-rs / parquet major version `54.0.0` (December 2024) [arrow-rs]

2024-08-31 Thread via GitHub
alamb opened a new issue, #6342: URL: https://github.com/apache/arrow-rs/issues/6342 ## Is your feature request related to a problem or challenge? Please describe what you are trying to do. https://github.com/apache/arrow-rs/issues/6341 tracks releasing `53.2.0` Our release sch

[I] Release arrow-rs / parquet minor version `53.2.0` (November 2024) [arrow-rs]

2024-08-31 Thread via GitHub
alamb opened a new issue, #6341: URL: https://github.com/apache/arrow-rs/issues/6341 ## Is your feature request related to a problem or challenge? Please describe what you are trying to do. https://github.com/apache/arrow-rs/issues/6340 tracks releasing `53.1.0` Our release sch

[I] Release arrow-rs / parquet minor version `53.1.0` (October 2024) [arrow-rs]

2024-08-31 Thread via GitHub
alamb opened a new issue, #6340: URL: https://github.com/apache/arrow-rs/issues/6340 ## Is your feature request related to a problem or challenge? Please describe what you are trying to do. https://github.com/apache/arrow-rs/issues/6016 tracks releasing `53.0.0` Our release sch

Re: [PR] Prepare arrow/parquet `53.0.0` release [arrow-rs]

2024-08-31 Thread via GitHub
alamb merged PR #6338: URL: https://github.com/apache/arrow-rs/pull/6338

[PR] Remove vestigal conbench integration [arrow-rs]

2024-08-31 Thread via GitHub
alamb opened a new pull request, #6339: URL: https://github.com/apache/arrow-rs/pull/6339 # Which issue does this PR close? N/A # Rationale for this change https://github.com/apache/arrow-rs/pull/1289 added initial support for conbench integration but was not compl

Re: [I] Parquet writer should not write any min/max data to ColumnIndex when all values are null [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6315: URL: https://github.com/apache/arrow-rs/issues/6315#issuecomment-2322899074 `label_issue.py` automatically added labels {'parquet'} from #6316

Re: [I] Support writing `UTC adjusted time` arrow array to parquet [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6277: URL: https://github.com/apache/arrow-rs/issues/6277#issuecomment-2322899010 `label_issue.py` automatically added labels {'parquet'} from #6278

Re: [I] Invalid `ColumnIndex` written in parquet [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6310: URL: https://github.com/apache/arrow-rs/issues/6310#issuecomment-2322899059 `label_issue.py` automatically added labels {'parquet'} from #6319

Re: [I] Make the bearer token visible in FlightSqlServiceClient [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6253: URL: https://github.com/apache/arrow-rs/issues/6253#issuecomment-2322898951 `label_issue.py` automatically added labels {'arrow-flight'} from #6254

Re: [I] parquet_derive: support reading selected columns from parquet file [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6268: URL: https://github.com/apache/arrow-rs/issues/6268#issuecomment-2322898974 `label_issue.py` automatically added labels {'parquet-derive'} from #6269

Re: [I] Avoid unnecessary null buffer construction when converting arrays to a different type [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6243: URL: https://github.com/apache/arrow-rs/issues/6243#issuecomment-2322898897 `label_issue.py` automatically added labels {'parquet'} from #6244

Re: [I] lexsort_to_indices should not fallback to non-lexical sort if the datatype is not supported [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6226: URL: https://github.com/apache/arrow-rs/issues/6226#issuecomment-2322898870 `label_issue.py` automatically added labels {'arrow'} from #6225

Re: [I] Look into optimizing reading FixedSizeBinary arrays from parquet [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6219: URL: https://github.com/apache/arrow-rs/issues/6219#issuecomment-2322898860 `label_issue.py` automatically added labels {'arrow'} from #6244

Re: [I] Using a take kernel on a dense union can result in reaching "unreachable" code [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6206: URL: https://github.com/apache/arrow-rs/issues/6206#issuecomment-2322898830 `label_issue.py` automatically added labels {'arrow'} from #6209

Re: [I] Add benchmarks for `BYTE_STREAM_SPLIT` encoded Parquet `FIXED_LEN_BYTE_ARRAY` data [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6203: URL: https://github.com/apache/arrow-rs/issues/6203#issuecomment-2322898814 `label_issue.py` automatically added labels {'parquet'} from #6204

Re: [I] Derive `PartialEq` and `Eq` for `parquet::arrow::ProjectionMask` [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6329: URL: https://github.com/apache/arrow-rs/issues/6329#issuecomment-2322899104 `label_issue.py` automatically added labels {'parquet'} from #6330

Re: [I] Allow converting empty `pyarrow.RecordBatch` to `arrow::RecordBatch` [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6318: URL: https://github.com/apache/arrow-rs/issues/6318#issuecomment-2322899087 `label_issue.py` automatically added labels {'arrow'} from #6320

Re: [I] Parquet: Add `union` method to `RowSelection` [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6307: URL: https://github.com/apache/arrow-rs/issues/6307#issuecomment-2322899039 `label_issue.py` automatically added labels {'parquet'} from #6308

Re: [I] A better way to resize the buffer for the snappy encode/decode [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6276: URL: https://github.com/apache/arrow-rs/issues/6276#issuecomment-2322899003 `label_issue.py` automatically added labels {'parquet'} from #6281

Re: [I] Printing schema metadata includes possibly incorrect compression level [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6270: URL: https://github.com/apache/arrow-rs/issues/6270#issuecomment-2322898979 `label_issue.py` automatically added labels {'parquet'} from #6271

Re: [I] Tests for invalid parquet files [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6261: URL: https://github.com/apache/arrow-rs/issues/6261#issuecomment-2322898966 `label_issue.py` automatically added labels {'parquet'} from #6262

Re: [I] Make the bearer token visible in FlightSqlServiceClient [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6253: URL: https://github.com/apache/arrow-rs/issues/6253#issuecomment-2322898938 `label_issue.py` automatically added labels {'arrow'} from #6254

Re: [I] Don't panic when creating `Field` from `FFI_ArrowSchema` with no name [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6251: URL: https://github.com/apache/arrow-rs/issues/6251#issuecomment-2322898927 `label_issue.py` automatically added labels {'arrow'} from #6273

Re: [I] Implement `date_part` for `Duration` [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6245: URL: https://github.com/apache/arrow-rs/issues/6245#issuecomment-2322898916 `label_issue.py` automatically added labels {'arrow'} from #6246

Re: [I] Avoid unnecessary null buffer construction when converting arrays to a different type [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6243: URL: https://github.com/apache/arrow-rs/issues/6243#issuecomment-2322898905 `label_issue.py` automatically added labels {'arrow'} from #6244

Re: [I] object_store release tarballs are missing LICENSE and notice files [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6233: URL: https://github.com/apache/arrow-rs/issues/6233#issuecomment-2322898881 `label_issue.py` automatically added labels {'object-store'} from #6234

Re: [I] speedup take_byte_view kernel [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6167: URL: https://github.com/apache/arrow-rs/issues/6167#issuecomment-2322898750 `label_issue.py` automatically added labels {'arrow'} from #6168

Re: [I] Adding sub day seconds to Date64 is ignored. [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6198: URL: https://github.com/apache/arrow-rs/issues/6198#issuecomment-2322898791 `label_issue.py` automatically added labels {'arrow'} from #6199

Re: [I] Add support for `StringView` and `BinaryView` statistics in `StatisticsConverter` [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6164: URL: https://github.com/apache/arrow-rs/issues/6164#issuecomment-2322898729 `label_issue.py` automatically added labels {'parquet'} from #6187

Re: [I] Support BinaryView Types in C Schema FFI [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6170: URL: https://github.com/apache/arrow-rs/issues/6170#issuecomment-2322898766 `label_issue.py` automatically added labels {'arrow'} from #6171

Re: [I] Implement `filter` kernel specially for `FixedSizeByteArray` [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6153: URL: https://github.com/apache/arrow-rs/issues/6153#issuecomment-2322898694 `label_issue.py` automatically added labels {'arrow'} from #6186

Re: [I] Use `LevelHistogram` throughout Parquet metadata [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6134: URL: https://github.com/apache/arrow-rs/issues/6134#issuecomment-2322898676 `label_issue.py` automatically added labels {'parquet'} from #6135

Re: [I] Support DoPutStatementIngest from Arrow Flight SQL 17.0 [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6124: URL: https://github.com/apache/arrow-rs/issues/6124#issuecomment-2322898646 `label_issue.py` automatically added labels {'arrow'} from #6133

Re: [I] Support casting `BinaryView` --> `Utf8` and `LargeUtf8` [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6162: URL: https://github.com/apache/arrow-rs/issues/6162#issuecomment-2322898714 `label_issue.py` automatically added labels {'arrow'} from #6180

Re: [I] Support DoPutStatementIngest from Arrow Flight SQL 17.0 [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6124: URL: https://github.com/apache/arrow-rs/issues/6124#issuecomment-2322898656 `label_issue.py` automatically added labels {'arrow-flight'} from #6133

Re: [I] ColumnMetaData should no longer be written inline with data [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6115: URL: https://github.com/apache/arrow-rs/issues/6115#issuecomment-2322898626 `label_issue.py` automatically added labels {'parquet'} from #6117

Re: [I] Implement date_part for `Interval` [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6113: URL: https://github.com/apache/arrow-rs/issues/6113#issuecomment-2322898613 `label_issue.py` automatically added labels {'documentation'} from #6140

Re: [I] Allow flushing or non-buffered writes from `arrow::ipc::writer::StreamWriter` [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6099: URL: https://github.com/apache/arrow-rs/issues/6099#issuecomment-2322898593 `label_issue.py` automatically added labels {'arrow'} from #6108

Re: [I] Default block_size for `StringViewArray` [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6094: URL: https://github.com/apache/arrow-rs/issues/6094#issuecomment-2322898580 `label_issue.py` automatically added labels {'arrow'} from #6154

Re: [I] Extend support for BYTE_STREAM_SPLIT to FIXED_LEN_BYTE_ARRAY, INT32, and INT64 primitive types [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6048: URL: https://github.com/apache/arrow-rs/issues/6048#issuecomment-2322898549 `label_issue.py` automatically added labels {'parquet'} from #6159

Re: [I] Release arrow-rs / parquet minor version `52.2.0` (August 2024) [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #5998: URL: https://github.com/apache/arrow-rs/issues/5998#issuecomment-2322898540 `label_issue.py` automatically added labels {'arrow'} from #6110

Re: [I] Release arrow-rs / parquet minor version `52.2.0` (August 2024) [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #5998: URL: https://github.com/apache/arrow-rs/issues/5998#issuecomment-2322898526 `label_issue.py` automatically added labels {'parquet'} from #6110

Re: [I] [DISCUSSION] Proposal move `object_store` to its own github repo? [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6183: URL: https://github.com/apache/arrow-rs/issues/6183#issuecomment-2322897411 Another annoyance while creating the arrow/parquet release candidate is that I had to filter the object_store related closed issues

Re: [PR] feat: add `Allocator` type param to `MutableBuffer` [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on code in PR #6336: URL: https://github.com/apache/arrow-rs/pull/6336#discussion_r1739715845 ## .github/workflows/arrow.yml: ## @@ -61,8 +61,8 @@ jobs: submodules: true - name: Setup Rust toolchain uses: ./.github/actions/setup-builder

Re: [PR] parquet_derive: Match fields by name, support reading selected fields rather than all [arrow-rs]

2024-08-31 Thread via GitHub
double-free commented on PR #6269: URL: https://github.com/apache/arrow-rs/pull/6269#issuecomment-2322896202 > Thanks again @double-free No problem, I really appreciate your time, and will continue to contribute once I find reasonable improvements.

Re: [I] Release arrow-rs / parquet major version `53.0.0` (September 2024) [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6016: URL: https://github.com/apache/arrow-rs/issues/6016#issuecomment-2322893847 I am preparing the RC now

Re: [I] Min row group size parquet [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6292: URL: https://github.com/apache/arrow-rs/issues/6292#issuecomment-2322893741 What is the use case of min_row_group_size? Can it be achieved with the existing settings of parquet-rs (max_rows and max_data_page_size) 🤔

Re: [PR] Specialize filter for structs and sparse unions [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on PR #6304: URL: https://github.com/apache/arrow-rs/pull/6304#issuecomment-2322892826 🚀

Re: [PR] Specialize filter for structs and sparse unions [arrow-rs]

2024-08-31 Thread via GitHub
alamb merged PR #6304: URL: https://github.com/apache/arrow-rs/pull/6304

Re: [PR] parquet_derive: Match fields by name, support reading selected fields rather than all [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on PR #6269: URL: https://github.com/apache/arrow-rs/pull/6269#issuecomment-2322892749 Thanks again @double-free

Re: [I] parquet_derive: support reading selected columns from parquet file [arrow-rs]

2024-08-31 Thread via GitHub
alamb closed issue #6268: parquet_derive: support reading selected columns from parquet file URL: https://github.com/apache/arrow-rs/issues/6268 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] parquet_derive: Match fields by name, support reading selected fields rather than all [arrow-rs]

2024-08-31 Thread via GitHub
alamb merged PR #6269: URL: https://github.com/apache/arrow-rs/pull/6269 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache

Re: [PR] GH-43746: [C++] Add support for Boost 1.86 [arrow]

2024-08-31 Thread via GitHub
pitrou commented on code in PR #43766: URL: https://github.com/apache/arrow/pull/43766#discussion_r1739713302 ## cpp/src/arrow/testing/process.cc: ## @@ -0,0 +1,300 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] Ensure IPC stream messages are contiguous [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on PR #6321: URL: https://github.com/apache/arrow-rs/pull/6321#issuecomment-2322891264 The integration CI test failure looks real to me: https://github.com/apache/arrow-rs/actions/runs/10603567022/job/29388207333?pr=6321 -- This is an automated message from the Apache Git

Re: [PR] Support zero column `RecordBatch`es in pyarrow integration (use RecordBatchOptions when converting a pyarrow RecordBatch) [arrow-rs]

2024-08-31 Thread via GitHub
alamb merged PR #6320: URL: https://github.com/apache/arrow-rs/pull/6320 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache

Re: [I] Allow converting empty `pyarrow.RecordBatch` to `arrow::RecordBatch` [arrow-rs]

2024-08-31 Thread via GitHub
alamb closed issue #6318: Allow converting empty `pyarrow.RecordBatch` to `arrow::RecordBatch` URL: https://github.com/apache/arrow-rs/issues/6318 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [I] Derive `PartialEq` and `Eq` for `parquet::arrow::ProjectionMask` [arrow-rs]

2024-08-31 Thread via GitHub
alamb closed issue #6329: Derive `PartialEq` and `Eq` for `parquet::arrow::ProjectionMask` URL: https://github.com/apache/arrow-rs/issues/6329 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] Derive PartialEq and Eq for parquet::arrow::ProjectionMask [arrow-rs]

2024-08-31 Thread via GitHub
alamb merged PR #6330: URL: https://github.com/apache/arrow-rs/pull/6330 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache

Re: [PR] Update prost and tonic dependencies [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on PR #6337: URL: https://github.com/apache/arrow-rs/pull/6337#issuecomment-2322889495 > Thank you for your contribution. However, I don't think it's necessary to update the patch versions of `prost` or `tonic` unless the old versions fail to build. Users who want the new ve

Re: [PR] parquet_derive: Match fields by name, support reading selected fields rather than all [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on PR #6269: URL: https://github.com/apache/arrow-rs/pull/6269#issuecomment-2322889047 I have updated the comments and merged up from main. I plan to merge this PR once the CI has completed -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [PR] parquet_derive: support reading selected fields from parquet file [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on PR #6269: URL: https://github.com/apache/arrow-rs/pull/6269#issuecomment-2322887953 > Btw, can you point me where to update the document of `parquet_derive`? If we choose to merge the PR, this statement will be outdated: > > > Column readers are generated in the ord

Re: [I] Invalid `ColumnIndex` written in parquet [arrow-rs]

2024-08-31 Thread via GitHub
alamb closed issue #6310: Invalid `ColumnIndex` written in parquet URL: https://github.com/apache/arrow-rs/issues/6310 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

Re: [PR] Fix writing of invalid Parquet ColumnIndex when row group contains null pages [arrow-rs]

2024-08-31 Thread via GitHub
alamb merged PR #6319: URL: https://github.com/apache/arrow-rs/pull/6319 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache

Re: [PR] Fix writing of invalid Parquet ColumnIndex when row group contains null pages [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on PR #6319: URL: https://github.com/apache/arrow-rs/pull/6319#issuecomment-2322887215 🔧 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-

Re: [PR] Minor: improve filter documentation [arrow-rs]

2024-08-31 Thread via GitHub
alamb merged PR #6317: URL: https://github.com/apache/arrow-rs/pull/6317 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache

Re: [PR] Minor: improve filter documentation [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on PR #6317: URL: https://github.com/apache/arrow-rs/pull/6317#issuecomment-2322886069 Thank you @crepererum and @gstvg -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] Write null counts in parquet files when they are present [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on PR #6257: URL: https://github.com/apache/arrow-rs/pull/6257#issuecomment-2322885746 Thank you @etseidl -- I think 1. given the potential for this PR to cause unintended consequences 2. we haven't actually had any bug reports related to this issue, 3. the `53.0.0`

Re: [PR] Manually run fmt on all files under parquet [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on PR #6328: URL: https://github.com/apache/arrow-rs/pull/6328#issuecomment-2322884943 Thanks @etseidl The check certainly seems to find the problem: https://github.com/apache/arrow-rs/actions/runs/10619078161/job/29435831298?pr=6328 The only other thing I thin

Re: [I] Invalid `ColumnIndex` written in parquet [arrow-rs]

2024-08-31 Thread via GitHub
alamb commented on issue #6310: URL: https://github.com/apache/arrow-rs/issues/6310#issuecomment-2322884321 > > @etseidl is this something you can look into? It seems like #6315 is tracking a somewhat different issue > > I worked offline with @samuelcolvin and @adriangb and identified
