Re: [I] [CI][C++] thread-sanitizer job fails on arrow-compute-internals-test [arrow]

2024-07-02 Thread via GitHub
zanmato1984 commented on issue #43116: URL: https://github.com/apache/arrow/issues/43116#issuecomment-2205225576 Thank you @raulcd . Let me see if I can confirm this is caused by insufficient RAM. -- This is an automated message from the Apache Git Service. To respond to the message, plea

Re: [PR] [C++][Compute] Fix the unnecessary extra bytes when encoding row table [arrow]

2024-07-02 Thread via GitHub
github-actions[bot] commented on PR #43125: URL: https://github.com/apache/arrow/pull/43125#issuecomment-2205224273 Revision: 8619d9ecee15dcec3eef1bdaf8badf1fce0d3a01 Submitted crossbow builds: [ursacomputing/crossbow @ actions-adf44107b5](https://github.com/ursacomputing/crossbow/bra

Re: [PR] [C++][Compute] Fix the unnecessary extra bytes when encoding row table [arrow]

2024-07-02 Thread via GitHub
zanmato1984 commented on PR #43125: URL: https://github.com/apache/arrow/pull/43125#issuecomment-2205220761 @github-actions crossbow submit -g cpp

Re: [PR] [C++][Compute] Fix the unnecessary extra bytes when encoding row table [arrow]

2024-07-02 Thread via GitHub
github-actions[bot] commented on PR #43125: URL: https://github.com/apache/arrow/pull/43125#issuecomment-2205192852 Revision: 7b6682c21ca29c304cd0d9792a7245726a8e7ada Submitted crossbow builds: [ursacomputing/crossbow @ actions-ea31ebc24b](https://github.com/ursacomputing/crossbow/bra

Re: [PR] [C++][Compute] Fix the unnecessary extra bytes when encoding row table [arrow]

2024-07-02 Thread via GitHub
zanmato1984 commented on PR #43125: URL: https://github.com/apache/arrow/pull/43125#issuecomment-2205189585 @github-actions crossbow submit -g cpp

Re: [PR] [C++][Compute] Fix the unnecessary extra bytes when encoding row table [arrow]

2024-07-02 Thread via GitHub
github-actions[bot] commented on PR #43125: URL: https://github.com/apache/arrow/pull/43125#issuecomment-2205188059 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes), could you open an issue f

[PR] [C++][Compute] Fix the unnecessary extra bytes when encoding row table [arrow]

2024-07-02 Thread via GitHub
zanmato1984 opened a new pull request, #43125: URL: https://github.com/apache/arrow/pull/43125 ### Rationale for this change ### What changes are included in this PR? ### Are these changes tested? ### Are there any user-facing changes?

Re: [PR] GH-43070: [C++][Parquet] Check for valid ciphertext length to prevent segfault [arrow]

2024-07-02 Thread via GitHub
adamreeve commented on PR #43071: URL: https://github.com/apache/arrow/pull/43071#issuecomment-2205054812 @pitrou, I think I've addressed all of your comments now thank you * I've added extra validation of the buffer sizes in various places * I've switched from raw pointers to `arrow::u

Re: [PR] GH-43124: [C++] Initialize offset vector head as 0 after memory allocated in grouper.cc [arrow]

2024-07-02 Thread via GitHub
github-actions[bot] commented on PR #43123: URL: https://github.com/apache/arrow/pull/43123#issuecomment-2205021941 :warning: GitHub issue #43124 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] Initialize offset vector head as 0 after memory allocated in grouper.cc [arrow]

2024-07-02 Thread via GitHub
flashzxi commented on PR #43123: URL: https://github.com/apache/arrow/pull/43123#issuecomment-2205014904 @mapleFU test code ```cpp #include #include #include #include #include #include #include #include #include #include #include #include

Re: [PR] Implement dictionary support for reading ByteView from parquet [arrow-rs]

2024-07-02 Thread via GitHub
XiangpengHao commented on code in PR #5973: URL: https://github.com/apache/arrow-rs/pull/5973#discussion_r1663435355 ## parquet/src/arrow/array_reader/byte_view_array.rs: ## @@ -348,6 +365,82 @@ impl ByteViewArrayDecoderPlain { } } +pub struct ByteViewArrayDecoderDiction

Re: [PR] GH-40868: [C++][Parquet] Minor: Remove "Experimental" for parquet::RecordReader [arrow]

2024-07-02 Thread via GitHub
mapleFU commented on PR #40887: URL: https://github.com/apache/arrow/pull/40887#issuecomment-2204995621 Then we may need to change this interface: ``` // EXPERIMENTAL: Construct a RecordReader for the indicated column of the row group. // Ownership is shared with the RowGrou

Re: [PR] Implement dictionary support for reading ByteView from parquet [arrow-rs]

2024-07-02 Thread via GitHub
XiangpengHao commented on PR #5973: URL: https://github.com/apache/arrow-rs/pull/5973#issuecomment-2204992557 > Or will it be tested as part of a larger end-to-end parquet reading exercise later? The code to enable and test dictionary encoding is here: https://github.com/apache/a

Re: [PR] GH-40868: [C++][Parquet] Minor: Remove "Experimental" for parquet::RecordReader [arrow]

2024-07-02 Thread via GitHub
wgtmac commented on PR #40887: URL: https://github.com/apache/arrow/pull/40887#issuecomment-2204971115 IMHO, it is weird to remove `experimental` before removing the `internal` namespace. The expectation is that the API is stable after removing `experimental` except that we will inclu

Re: [PR] Initialize offset vector head as 0 after memory allocated in grouper.cc [arrow]

2024-07-02 Thread via GitHub
mapleFU commented on PR #43123: URL: https://github.com/apache/arrow/pull/43123#issuecomment-2204971263 Thanks for the contribution! Could you add a test that reproduces the problem?

Re: [I] [C++] Allocate memory of a random size because the passed value is an uninitialized int32_t. [arrow]

2024-07-02 Thread via GitHub
flashzxi commented on issue #43124: URL: https://github.com/apache/arrow/issues/43124#issuecomment-2204971112 lines 780-782 were added by me

Re: [PR] Initialize offset vector head as 0 after memory allocated in grouper.cc [arrow]

2024-07-02 Thread via GitHub
github-actions[bot] commented on PR #43123: URL: https://github.com/apache/arrow/pull/43123#issuecomment-2204967201 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes), could you open an issue f

[PR] Initialize offset vector head as 0 after memory allocated in grouper.cc [arrow]

2024-07-02 Thread via GitHub
flashzxi opened a new pull request, #43123: URL: https://github.com/apache/arrow/pull/43123 ## bug description In grouper.cc:780, memory is allocated for offset vector of varlen column, however the vector is initialized in `encoder_.DecodeFixedLengthBuffers`, which will never be called
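As a rough illustration of why the head of a variable-length offsets vector needs to be set to 0, here is a minimal Python sketch. It is not the code from grouper.cc: the helper names (`alloc_offsets`, `fill_offsets`), the 0x7F fill pattern standing in for uninitialized memory, and the assumption that later offsets are accumulated from the head are all hypothetical, since the description above is cut off.

```python
import struct

def alloc_offsets(num_rows, garbage=0x7F):
    # Simulate an allocation that does not zero memory: every byte holds `garbage`.
    # One int32 slot per row, plus one for the head.
    return bytearray([garbage]) * (4 * (num_rows + 1))

def read_i32(buf, i):
    # Read the i-th little-endian int32 slot.
    return struct.unpack_from("<i", buf, 4 * i)[0]

def write_i32(buf, i, value):
    # Write the i-th little-endian int32 slot.
    struct.pack_into("<i", buf, 4 * i, value)

def fill_offsets(buf, lengths, init_head):
    # Assumption: offsets[i + 1] = offsets[i] + length of row i,
    # so every later offset inherits whatever the head contains.
    if init_head:
        write_i32(buf, 0, 0)
    for i, n in enumerate(lengths):
        write_i32(buf, i + 1, read_i32(buf, i) + n)
    # The last offset is the number of bytes the values buffer would need.
    return read_i32(buf, len(lengths))

lengths = [3, 0, 5]
good = fill_offsets(alloc_offsets(len(lengths)), lengths, init_head=True)
bad = fill_offsets(alloc_offsets(len(lengths)), lengths, init_head=False)
print(good, bad)
```

With the head initialized, the final offset equals the sum of the row lengths; without it, the final offset is the leftover bit pattern plus that sum, which is the kind of arbitrary allocation size the linked issue title describes.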

Re: [PR] GH-40868: [C++][Parquet] Minor: Remove "Experimental" for parquet::RecordReader [arrow]

2024-07-02 Thread via GitHub
mapleFU commented on PR #40887: URL: https://github.com/apache/arrow/pull/40887#issuecomment-2204938264 @wgtmac I tried to remove the `internal` namespace, but `RowGroupReader::RecordReader` is a function, which conflicts here. In this patch I'll just remove `experimental` first.

Re: [PR] GH-43118: [JS] Add interval for unit MONTH_DAY_NANO [arrow]

2024-07-02 Thread via GitHub
handstuyennn commented on PR #43117: URL: https://github.com/apache/arrow/pull/43117#issuecomment-2204910523 > Could you fix lint failures? > > https://github.com/apache/arrow/actions/runs/9760275633/job/26955306166?pr=43117#step:5:150 > > ``` > $ eslint src test > >

Re: [PR] GH-43118: [JS] Add interval for unit MONTH_DAY_NANO [arrow]

2024-07-02 Thread via GitHub
handstuyennn commented on code in PR #43117: URL: https://github.com/apache/arrow/pull/43117#discussion_r1663385849 ## js/test/unit/vector/interval-month-day-nano-tests.ts: ## Review Comment: I added the license

Re: [PR] ci(c): Enable Meson tests in CI [arrow-adbc]

2024-07-02 Thread via GitHub
lidavidm merged PR #1949: URL: https://github.com/apache/arrow-adbc/pull/1949

Re: [PR] ci(c): Enable Meson tests in CI [arrow-adbc]

2024-07-02 Thread via GitHub
lidavidm commented on code in PR #1949: URL: https://github.com/apache/arrow-adbc/pull/1949#discussion_r1663334820 ## .github/workflows/native-unix.yml: ## @@ -221,19 +221,51 @@ jobs: run: | sudo apt update sudo apt install -y libpq-dev ninja-build

Re: [PR] feat(c/driver/postgresql): Read/write support for TIME64[us] [arrow-adbc]

2024-07-02 Thread via GitHub
lidavidm merged PR #1960: URL: https://github.com/apache/arrow-adbc/pull/1960

[PR] CompatHelper: bump compat for TranscodingStreams to 0.11, (keep existing compat) [arrow-julia]

2024-07-02 Thread via GitHub
github-actions[bot] opened a new pull request, #509: URL: https://github.com/apache/arrow-julia/pull/509 This pull request changes the compat entry for the `TranscodingStreams` package from `0.9.12, 0.10` to `0.9.12, 0.10, 0.11`. This keeps the compat entries for earlier versions.

Re: [PR] feat(c/driver/postgresql): UInt(8/16/32) Writer [arrow-adbc]

2024-07-02 Thread via GitHub
lidavidm merged PR #1961: URL: https://github.com/apache/arrow-adbc/pull/1961

Re: [PR] Add Arrow Flight SQL ODBC driver [arrow]

2024-07-02 Thread via GitHub
lidavidm commented on PR #40939: URL: https://github.com/apache/arrow/pull/40939#issuecomment-2204727819 Following up here - do you want any help?

Re: [I] [CI][Packaging] Could not resolve host: mirrorlist.centos.org; Unknown error on Linux centos 7 and wheel almalinux 2014 jobs [arrow]

2024-07-02 Thread via GitHub
kou commented on issue #43119: URL: https://github.com/apache/arrow/issues/43119#issuecomment-2204683528 I've split the centos-7-amd64 job to #43122.

Re: [PR] GH-43118: [JS] Add interval for unit MONTH_DAY_NANO [arrow]

2024-07-02 Thread via GitHub
kou commented on PR #43117: URL: https://github.com/apache/arrow/pull/43117#issuecomment-2204676946 Could you fix lint failures? https://github.com/apache/arrow/actions/runs/9760275633/job/26955306166?pr=43117#step:5:150 ```text $ eslint src test /build/js/src/type.ts

Re: [PR] GH-43118: [JS] Add interval for unit MONTH_DAY_NANO [arrow]

2024-07-02 Thread via GitHub
kou commented on code in PR #43117: URL: https://github.com/apache/arrow/pull/43117#discussion_r1663277916 ## js/test/unit/vector/interval-month-day-nano-tests.ts: ## Review Comment: Could you add our license header?

Re: [PR] MINOR: [ArrowFlight] Fix UCS thread mode [arrow]

2024-07-02 Thread via GitHub
kou commented on PR #43120: URL: https://github.com/apache/arrow/pull/43120#issuecomment-2204667463 This is not a MINOR change. See https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes for our MINOR definition. If we need this change, could you open a new issue for this?

Re: [PR] MINOR: [ArrowFlight] Fix UCS thread mode [arrow]

2024-07-02 Thread via GitHub
lidavidm commented on PR #43120: URL: https://github.com/apache/arrow/pull/43120#issuecomment-2204633063 Thanks! However, this code is scheduled for deprecation/removal soon in favor of the Disassociated IPC proposal

Re: [PR] feat(go/adbc/driver): add support for Google BigQuery [arrow-adbc]

2024-07-02 Thread via GitHub
cocoa-xu commented on PR #1722: URL: https://github.com/apache/arrow-adbc/pull/1722#issuecomment-2204614424 Many thanks for the code review @joellubi! Sorry, I missed quite a few of these test flags; they should be corrected now. If there are no other apparent issues, I guess perhap

Re: [PR] Add size statistics to `ParquetMetaData` introduced in PARQUET-2261 [arrow-rs]

2024-07-02 Thread via GitHub
etseidl commented on PR #5486: URL: https://github.com/apache/arrow-rs/pull/5486#issuecomment-2204567034 @alamb I have the offset index done now well enough to test the metadata (might still need doc updates). I'm not entirely happy with it since the offset index is parsed twice now. I'm fi

Re: [PR] feat(go/adbc/driver): add support for Google BigQuery [arrow-adbc]

2024-07-02 Thread via GitHub
joellubi commented on code in PR #1722: URL: https://github.com/apache/arrow-adbc/pull/1722#discussion_r1663202041 ## go/adbc/driver/bigquery/driver_test.go: ## @@ -0,0 +1,1354 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] feat(c/driver/postgresql): Basic Support for Writing LIST types [arrow-adbc]

2024-07-02 Thread via GitHub
WillAyd commented on code in PR #1962: URL: https://github.com/apache/arrow-adbc/pull/1962#discussion_r1663201965 ## c/driver/postgresql/copy/writer.h: ## @@ -601,10 +667,46 @@ static inline ArrowErrorCode MakeCopyFieldWriter( case NANOARROW_TYPE_LARGE_BINARY:

Re: [I] [Python] ERROR: Failed building wheel for pyarrow [arrow]

2024-07-02 Thread via GitHub
edgarrmondragon commented on issue #34757: URL: https://github.com/apache/arrow/issues/34757#issuecomment-2204504120 There's a similar problem now with Python 3.13. In my opinion building from the `sdist` should be an option even if the new Python is not officially supported.

[PR] feat(c/driver/postgresql): Basic Support for Writing LIST types [arrow-adbc]

2024-07-02 Thread via GitHub
WillAyd opened a new pull request, #1962: URL: https://github.com/apache/arrow-adbc/pull/1962 This works towards https://github.com/apache/arrow-adbc/issues/1882 It hasn't fully been implemented to generically support all LIST types and all children they may contain, but figured its f

Re: [I] [FlightRPC][Java] Provide standard way to get client IP address [arrow]

2024-07-02 Thread via GitHub
aiguofer commented on issue #31924: URL: https://github.com/apache/arrow/issues/31924#issuecomment-2204419874 This is great! I would love to have something like this. Another nice thing to add to the `RequestInfo` would be the authority. We've been doing this https://github.com/apache/arro

Re: [PR] MINOR: [JS] Bump tslib from 2.6.2 to 2.6.3 in /js [arrow]

2024-07-02 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #43105: URL: https://github.com/apache/arrow/pull/43105#issuecomment-2204391099 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit fd6bf5a532245ae80908d52b90d5462f9b2f9882. There were no

Re: [PR] GH-42240: [R] Fix crash in ParquetFileWriter$WriteTable and add WriteBatch [arrow]

2024-07-02 Thread via GitHub
amoeba commented on PR #42241: URL: https://github.com/apache/arrow/pull/42241#issuecomment-2204341833 I pushed up a change addressing that comment and noticed I missed documenting the new class method so I added another commit for that. Letting CI run and then I'll merge.

Re: [PR] GH-42240: [R] Fix crash in ParquetFileWriter$WriteTable and add WriteBatch [arrow]

2024-07-02 Thread via GitHub
amoeba commented on code in PR #42241: URL: https://github.com/apache/arrow/pull/42241#discussion_r1663129893 ## r/R/parquet.R: ## @@ -428,6 +428,12 @@ ParquetFileWriter <- R6Class("ParquetFileWriter", inherit = ArrowObject, public = list( WriteTable = function(table,

Re: [PR] Update snafu [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on PR #5930: URL: https://github.com/apache/arrow-rs/pull/5930#issuecomment-2204096673 Marked as "next major release"

Re: [I] Release arrow-rs / parquet minor version `52.1.0` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5905: URL: https://github.com/apache/arrow-rs/issues/5905#issuecomment-2204095463 Started vote thread: https://lists.apache.org/thread/3q859mgx3p2ltozqxojk8mxq75vq673o

Re: [PR] MINOR: [JS] Bump gulp-esbuild from 0.12.0 to 0.12.1 in /js [arrow]

2024-07-02 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #43106: URL: https://github.com/apache/arrow/pull/43106#issuecomment-2204072968 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 56cf6883db9980f7423891fa462e52f7acbbc70f. There were no

Re: [PR] Prepare arrow `52.1.0` [arrow-rs]

2024-07-02 Thread via GitHub
alamb merged PR #5992: URL: https://github.com/apache/arrow-rs/pull/5992

Re: [PR] Prepare arrow `52.1.0` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on PR #5992: URL: https://github.com/apache/arrow-rs/pull/5992#issuecomment-2204042502 https://github.com/apache/datafusion/pull/11202 is still looking good

Re: [PR] Write Bloom filters between row groups instead of the end [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on PR #5860: URL: https://github.com/apache/arrow-rs/pull/5860#issuecomment-2204027767 This was reverted and thus will not be present in the 52.1.0 release #5905

Re: [PR] Write Bloom filters between row groups instead of the end [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on PR #5860: URL: https://github.com/apache/arrow-rs/pull/5860#issuecomment-2204028200 I will merge https://github.com/apache/arrow-rs/pull/5933 when we open for breaking API changes

Re: [I] Wrong error type in case of invalid amount in Interval components [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5986: URL: https://github.com/apache/arrow-rs/issues/5986#issuecomment-2204016550 `label_issue.py` automatically added labels {'arrow'} from #5987

Re: [I] Relax `WriteMultipart` API to support aborting after completion [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5977: URL: https://github.com/apache/arrow-rs/issues/5977#issuecomment-2204016458 `label_issue.py` automatically added labels {'object-store'} from #5974

Re: [I] Error message in ArrowNativeTypeOp::neg_checked doesn't include the operation [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5944: URL: https://github.com/apache/arrow-rs/issues/5944#issuecomment-2204016165 `label_issue.py` automatically added labels {'arrow'} from #5980

Re: [I] Implement `compare_op` for `GenericBinaryView` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5897: URL: https://github.com/apache/arrow-rs/issues/5897#issuecomment-2204015578 `label_issue.py` automatically added labels {'arrow'} from #5900

Re: [I] Implement benchmarks for `compare_op` for `GenericBinaryView` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5903: URL: https://github.com/apache/arrow-rs/issues/5903#issuecomment-2204015730 `label_issue.py` automatically added labels {'arrow'} from #5924

Re: [I] [DISCUSS] Release arrow-rs / parquet patch release `52.0.1` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5906: URL: https://github.com/apache/arrow-rs/issues/5906#issuecomment-2204015817 `label_issue.py` automatically added labels {'arrow'} from #5895

Re: [I] New null with view types are not supported [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5893: URL: https://github.com/apache/arrow-rs/issues/5893#issuecomment-2204015335 `label_issue.py` automatically added labels {'arrow'} from #5894

Re: [I] A new feature as a workaround hack to unavailable offset support in Arrow Java [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5959: URL: https://github.com/apache/arrow-rs/issues/5959#issuecomment-2204016322 `label_issue.py` automatically added labels {'arrow'} from #5964

Re: [I] Add `min_bytes` and `max_bytes` to `PageIndex` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5949: URL: https://github.com/apache/arrow-rs/issues/5949#issuecomment-2204016246 `label_issue.py` automatically added labels {'parquet'} from #5950

Re: [I] Cleanup ByteView construction [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5878: URL: https://github.com/apache/arrow-rs/issues/5878#issuecomment-2204015221 `label_issue.py` automatically added labels {'arrow'} from #5879

Re: [I] Implement arrow-row en/decoding for GenericByteView types [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5921: URL: https://github.com/apache/arrow-rs/issues/5921#issuecomment-2204016033 `label_issue.py` automatically added labels {'arrow'} from #5922

Re: [I] Make ObjectStoreScheme in the object_store crate public [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5911: URL: https://github.com/apache/arrow-rs/issues/5911#issuecomment-2204015923 `label_issue.py` automatically added labels {'object-store'} from #5912

Re: [I] FixedSizeList got out of range when the total length of the underlying values over i32::MAX [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5901: URL: https://github.com/apache/arrow-rs/issues/5901#issuecomment-2204015658 `label_issue.py` automatically added labels {'arrow'} from #5902

Re: [I] Out of range when extending on a slice of string array imported through FFI [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5896: URL: https://github.com/apache/arrow-rs/issues/5896#issuecomment-2204015455 `label_issue.py` automatically added labels {'arrow'} from #5895

Re: [I] Cleanup ByteView construction [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5878: URL: https://github.com/apache/arrow-rs/issues/5878#issuecomment-2204015117 `label_issue.py` automatically added labels {'parquet'} from #5879

Re: [I] cargo msrv test is failing on main for `object_store` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5864: URL: https://github.com/apache/arrow-rs/issues/5864#issuecomment-2204015021 `label_issue.py` automatically added labels {'parquet'} from #5863

Re: [I] cargo msrv test is failing on main for `object_store` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5864: URL: https://github.com/apache/arrow-rs/issues/5864#issuecomment-2204014950 `label_issue.py` automatically added labels {'documentation'} from #5863

Re: [I] `cast` kernel support for `StringViewArray` and `BinaryViewArray` `<--> `DictionaryArray` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5861: URL: https://github.com/apache/arrow-rs/issues/5861#issuecomment-2204014859 `label_issue.py` automatically added labels {'arrow'} from #5872

Re: [I] parquet::ArrowWriter show allow writing Bloom filters before the end of the file [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5859: URL: https://github.com/apache/arrow-rs/issues/5859#issuecomment-2204014768 `label_issue.py` automatically added labels {'parquet'} from #5860

Re: [I] Make `RowSelection::from_consecutive_ranges` public [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5846: URL: https://github.com/apache/arrow-rs/issues/5846#issuecomment-2204014603 `label_issue.py` automatically added labels {'parquet'} from #5848

Re: [I] `Schema::try_merge` should be able to merge List of any data type with List of Null data type [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5843: URL: https://github.com/apache/arrow-rs/issues/5843#issuecomment-2204014526 `label_issue.py` automatically added labels {'arrow'} from #5852

Re: [I] Add a way to move `fields` out of parquet `Row` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5841: URL: https://github.com/apache/arrow-rs/issues/5841#issuecomment-2204014452 `label_issue.py` automatically added labels {'parquet'} from #5842

Re: [I] Make `TimeUnit` and `IntervalUnit` `Copy` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5839: URL: https://github.com/apache/arrow-rs/issues/5839#issuecomment-2204014374 `label_issue.py` automatically added labels {'arrow'} from #5840

Re: [I] Add BufUploader to implement same feature upon `WriteMultipart` like `BufWriter` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5834: URL: https://github.com/apache/arrow-rs/issues/5834#issuecomment-2204014277 `label_issue.py` automatically added labels {'object-store'} from #5835

Re: [I] Limit Parquet Page Row Count By Default to reduce writer memory requirements with highly compressable columns [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5797: URL: https://github.com/apache/arrow-rs/issues/5797#issuecomment-2204014172 `label_issue.py` automatically added labels {'parquet'} from #5957

Re: [I] Report / blog on parquet metadata sizes for "large" (1000+) numbers of columns [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5770: URL: https://github.com/apache/arrow-rs/issues/5770#issuecomment-2204014066 `label_issue.py` automatically added labels {'arrow'} from #5798

Re: [I] Report / blog on parquet metadata sizes for "large" (1000+) numbers of columns [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5770: URL: https://github.com/apache/arrow-rs/issues/5770#issuecomment-2204013988 `label_issue.py` automatically added labels {'parquet'} from #5863

Re: [I] Report / blog on parquet metadata sizes for "large" (1000+) numbers of columns [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5770: URL: https://github.com/apache/arrow-rs/issues/5770#issuecomment-2204013925 `label_issue.py` automatically added labels {'documentation'} from #5863

Re: [I] Structured ByteView Access (underlying StringView/BinaryView representation) [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5736: URL: https://github.com/apache/arrow-rs/issues/5736#issuecomment-2204013716 `label_issue.py` automatically added labels {'arrow'} from #5619

Re: [I] [parquet_derive] support OPTIONAL (def_level = 1) columns by default [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5716: URL: https://github.com/apache/arrow-rs/issues/5716#issuecomment-2204013613 `label_issue.py` automatically added labels {'parquet-derive'} from #5717

Re: [I] Maps cast to other Maps with different Elements, Key and Value Names [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5702: URL: https://github.com/apache/arrow-rs/issues/5702#issuecomment-2204013542 `label_issue.py` automatically added labels {'arrow'} from #5703

Re: [I] Provide Arrow Schema Hint to Parquet Reader [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5657: URL: https://github.com/apache/arrow-rs/issues/5657#issuecomment-2204013443 `label_issue.py` automatically added labels {'parquet'} from #5939

Re: [I] Provide Arrow Schema Hint to Parquet Reader [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5657: URL: https://github.com/apache/arrow-rs/issues/5657#issuecomment-2204013362 `label_issue.py` automatically added labels {'arrow'} from #5605 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [I] Release arrow-rs / parquet minor version `52.1.0` [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on issue #5905: URL: https://github.com/apache/arrow-rs/issues/5905#issuecomment-2203985194 Features merged. Working on making a release -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] feat(5851): ArrowWriter memory usage [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on PR #5967: URL: https://github.com/apache/arrow-rs/pull/5967#issuecomment-2203981853 Thanks again @wiedld -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Add ParquetMetadata::memory_size size estimation [arrow-rs]

2024-07-02 Thread via GitHub
alamb merged PR #5965: URL: https://github.com/apache/arrow-rs/pull/5965 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache

Re: [PR] fix: error in case of invalid amount interval component [arrow-rs]

2024-07-02 Thread via GitHub
alamb commented on PR #5987: URL: https://github.com/apache/arrow-rs/pull/5987#issuecomment-2203981110 Thanks again @DDtKey -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] feat(5851): ArrowWriter memory usage [arrow-rs]

2024-07-02 Thread via GitHub
alamb merged PR #5967: URL: https://github.com/apache/arrow-rs/pull/5967 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache

Re: [I] API to get memory usage for parquet ArrowWriter [arrow-rs]

2024-07-02 Thread via GitHub
alamb closed issue #5851: API to get memory usage for parquet ArrowWriter URL: https://github.com/apache/arrow-rs/issues/5851 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [I] Add memory size estimation for `ParquetMetadata` [arrow-rs]

2024-07-02 Thread via GitHub
alamb closed issue #1729: Add memory size estimation for `ParquetMetadata` URL: https://github.com/apache/arrow-rs/issues/1729 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [I] Wrong error type in case of invalid amount in Interval components [arrow-rs]

2024-07-02 Thread via GitHub
alamb closed issue #5986: Wrong error type in case of invalid amount in Interval components URL: https://github.com/apache/arrow-rs/issues/5986 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] fix: error in case of invalid amount interval component [arrow-rs]

2024-07-02 Thread via GitHub
alamb merged PR #5987: URL: https://github.com/apache/arrow-rs/pull/5987 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache

Re: [PR] GH-42014: [Python] Let StructArray.from_array accept a type in addition to names or fields [arrow]

2024-07-02 Thread via GitHub
shinespiked commented on PR #43047: URL: https://github.com/apache/arrow/pull/43047#issuecomment-2203976143 @jorisvandenbossche I chose to move the argument to the end since, as you said, keyword-only arguments are a larger/breaking change. I also switched the argument to `type`. Let me know what

Re: [PR] GH-43119: [CI][Packaging] Update Manylinux 2014 Centos repos that have been deprecated [arrow]

2024-07-02 Thread via GitHub
ianmcook commented on PR #43121: URL: https://github.com/apache/arrow/pull/43121#issuecomment-2203975457 @raulcd looks like the aarch64 packages are in a different place: https://vault.centos.org/altarch/ -- This is an automated message from the Apache Git Service. To respond to the messa

Re: [PR] feat(go/adbc/driver): add support for Google BigQuery [arrow-adbc]

2024-07-02 Thread via GitHub
cocoa-xu commented on PR #1722: URL: https://github.com/apache/arrow-adbc/pull/1722#issuecomment-2203971921 I've updated to use `TableMetadata` for `getTableSchemaWithFilter`! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [PR] feat(go/adbc/driver): add support for Google BigQuery [arrow-adbc]

2024-07-02 Thread via GitHub
cocoa-xu commented on code in PR #1722: URL: https://github.com/apache/arrow-adbc/pull/1722#discussion_r1662948519 ## go/adbc/driver/bigquery/connection.go: ## @@ -0,0 +1,730 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

Re: [I] [Python][Parquet] Parquet deserialization speeds slower on Linux [arrow]

2024-07-02 Thread via GitHub
alippai commented on issue #38389: URL: https://github.com/apache/arrow/issues/38389#issuecomment-2203362676 @pitrou the different compression benchmarks were running using hot file cache, right? I'd be surprised if uncompressed would be faster than lz4 without the linux file cache (even if

Re: [PR] GH-34785: [C++][Parquet] Parquet Bloom Filter Writer Implementation [arrow]

2024-07-02 Thread via GitHub
mapleFU commented on PR #37400: URL: https://github.com/apache/arrow/pull/37400#issuecomment-2203342544 I'm quite busy these few days, but I promise I will try my best to check this within this month -- This is an automated message from the Apache Git Service. To respond to the message, pleas

Re: [PR] GH-34785: [C++][Parquet] Parquet Bloom Filter Writer Implementation [arrow]

2024-07-02 Thread via GitHub
alippai commented on PR #37400: URL: https://github.com/apache/arrow/pull/37400#issuecomment-2203324603 I believe this missed the feature freeze deadline and won’t be included in 17.0.0; it’ll likely be part of 18.0.0. @raulcd will know better. Since this is the largest PR the past mo

Re: [I] [FlightRPC][Java] Provide standard way to get client IP address [arrow]

2024-07-02 Thread via GitHub
scruz-denodo commented on issue #31924: URL: https://github.com/apache/arrow/issues/31924#issuecomment-220333 Hi, I have made some changes for this issue. Before creating a pull request, I would like to discuss them here. This is the change: https://github.com/apache/arrow/co

Re: [PR] GH-41640: [Go] Implement BYTE_STREAM_SPLIT Parquet Encoding [arrow]

2024-07-02 Thread via GitHub
mapleFU commented on code in PR #43066: URL: https://github.com/apache/arrow/pull/43066#discussion_r1661082892 ## go/parquet/internal/encoding/encoding_test.go: ## @@ -406,6 +406,17 @@ func (b *BaseEncodingTestSuite) TestDeltaByteArrayRoundTrip() { } } +func (b *Base
