Re: [PR] GH-37242: [Python][Parquet] Parquet Support write and validate Page CRC (Take 2) [arrow]

2023-11-06 Thread via GitHub
mapleFU commented on PR #38360: URL: https://github.com/apache/arrow/pull/38360#issuecomment-1797975677 The CI failure is unrelated; I'll rerun it and wait for other committers' review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-06 Thread via GitHub
mapleFU commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1797969438 I think we can provide a MockInputStream (with readAsync and read counting) and hardcode an IO count here. Any change that alters the IO count would then be reported here. Also cc @
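
The read-counting idea in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Arrow test that was proposed: `CountingStream`, `read_footer_then_body`, and `EXPECTED_READS` are hypothetical names, and the stream wraps Python's standard `io.BytesIO` rather than any Arrow input stream class.

```python
import io


class CountingStream(io.BytesIO):
    """Hypothetical stand-in for a mock input stream that counts read calls."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.read_calls = 0

    def read(self, size: int = -1) -> bytes:
        # Count every read so a test can pin the number of IO operations.
        self.read_calls += 1
        return super().read(size)


def read_footer_then_body(stream: CountingStream, footer_len: int = 8):
    """Toy reader (an assumption): one read for the footer, one for the body."""
    total = len(stream.getvalue())
    stream.seek(-footer_len, io.SEEK_END)
    footer = stream.read(footer_len)
    stream.seek(0)
    body = stream.read(total - footer_len)
    return body, footer


# Hardcoded IO count: a change that adds or removes a read fails the assertion.
EXPECTED_READS = 2

stream = CountingStream(b"x" * 100 + b"FOOTER00")
body, footer = read_footer_then_body(stream)
assert stream.read_calls == EXPECTED_READS
```

Pinning the count to a constant makes the test deliberately brittle: any change to the reader that issues an extra read has to update `EXPECTED_READS`, which is the point of the proposal.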

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-06 Thread via GitHub
eeroel commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1797966320 > Yeah. Maybe we can add a test for counting IO-ops to prevent regressions later... This is so awkward :-( Do you have an idea how to do it? I currently just tested by setti

Re: [PR] WIP: [Release] Verify release-14.0.1-rc0 [arrow]

2023-11-06 Thread via GitHub
github-actions[bot] commented on PR #38620: URL: https://github.com/apache/arrow/pull/38620#issuecomment-1797961274 Revision: ba537483618196f50c67a90a473039e4d5dc35e0 Submitted crossbow builds: [ursacomputing/crossbow @ actions-afc997d628](https://github.com/ursacomputing/crossbow/bra

Re: [PR] WIP: [Release] Verify release-14.0.1-rc0 [arrow]

2023-11-06 Thread via GitHub
kou commented on PR #38620: URL: https://github.com/apache/arrow/pull/38620#issuecomment-1797956189 @github-actions crossbow submit --group verify-rc-binaries --group verify-rc-wheels --param release=14.0.1 --param rc=0

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-06 Thread via GitHub
mapleFU commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1797957661 Yeah. Maybe we can add a test for counting IO-ops to prevent regressions later... This is so awkward :-(

Re: [PR] GH-37242: [Python][Parquet] Parquet Support write and validate Page CRC (Take 2) [arrow]

2023-11-06 Thread via GitHub
frazar commented on PR #38360: URL: https://github.com/apache/arrow/pull/38360#issuecomment-1797957032 @mapleFU there was a CI failure, but it could be spurious. Could you retrigger the job?

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-06 Thread via GitHub
eeroel commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1797950087 > Very nice catch! It seems that this commit ignores your change and re-creates the file. This might have been done before your change was checked in 😅 And after your change and rebase, this didn'

Re: [I] Add support for Union arrays in Parquet [arrow-rs]

2023-11-06 Thread via GitHub
Jefffrey commented on issue #73: URL: https://github.com/apache/arrow-rs/issues/73#issuecomment-1797950937 I'm assuming this depends on https://github.com/apache/parquet-format/pull/44

[PR] Use FairSpillPool for TaskContext with spillable config [arrow-datafusion]

2023-11-06 Thread via GitHub
viirya opened a new pull request, #8072: URL: https://github.com/apache/arrow-datafusion/pull/8072 ## Which issue does this PR close? Closes #8069. ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-06 Thread via GitHub
mapleFU commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1797948698 Very nice catch! It seems that this commit ignores your change and re-creates the file. This might have been done before your change was checked in 😅 And after your change and rebase, this didn't

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-06 Thread via GitHub
eeroel commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1797946286 This is probably where the extra requests come from, I wonder if these `source.Open()` and `parquet::ParquetFileReader::OpenAsync` calls are necessary? https://github.com/apache/arro

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-06 Thread via GitHub
mapleFU commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1797946394 > This is the commit after which there is an extra HEAD request: https://github.com/apache/arrow/commit/0793432ad0ef5cb598b7b1e61071cd4991bd1b8b Did you find it causing the

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-06 Thread via GitHub
eeroel commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1797939879 > #36765 > > The patch for that issue changes the default prefetch mode from `Default` to `LazyDefault`, but it may not increase the request count. This is the commit afte

Re: [I] Parquet: writing zero to statistics should follow spec [arrow-rs]

2023-11-06 Thread via GitHub
Jefffrey commented on issue #5047: URL: https://github.com/apache/arrow-rs/issues/5047#issuecomment-1797939935 I plan to take this on, maybe after https://github.com/apache/arrow-rs/pull/5003 is merged, so I can do f16, f32, and f64 all at once

[I] Parquet: writing zero to statistics should follow spec [arrow-rs]

2023-11-06 Thread via GitHub
Jefffrey opened a new issue, #5047: URL: https://github.com/apache/arrow-rs/issues/5047 **Describe the bug** https://github.com/apache/parquet-format/blob/46cc3a0647d301bb9579ca8dd2cc356caf2a72d2/README.md?plain=1#L162-L178 ``` * FLOAT, DOUBLE - Signed comparison with sp

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-06 Thread via GitHub
mapleFU commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1797936408 https://github.com/apache/arrow/issues/36765 The patch for that issue changes the default prefetch mode from `Default` to `LazyDefault`, but it may not increase the request count

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-06 Thread via GitHub
eeroel commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1797924186 > > Using [02de3c1](https://github.com/apache/arrow/commit/02de3c1789460304e958936b78d60f824921c250), one HEAD request and two GET requests are made for each file. Also the requests

Re: [PR] feat: emitting partial join results in `HashJoinStream` [arrow-datafusion]

2023-11-06 Thread via GitHub
metesynnada commented on code in PR #8020: URL: https://github.com/apache/arrow-datafusion/pull/8020#discussion_r1384452396 ## datafusion/sqllogictest/test_files/join_disable_repartition_joins.slt: ## @@ -72,11 +72,11 @@ SELECT t1.a, t1.b, t1.c, t2.a as a2 ON t1.d = t2.d ORDER

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-06 Thread via GitHub
mapleFU commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1797921057 > Using https://github.com/apache/arrow/commit/02de3c1789460304e958936b78d60f824921c250, one HEAD request and two GET requests are made for each file. Also the requests are made con

Re: [PR] feat: emitting partial join results in `HashJoinStream` [arrow-datafusion]

2023-11-06 Thread via GitHub
metesynnada commented on code in PR #8020: URL: https://github.com/apache/arrow-datafusion/pull/8020#discussion_r1384453471 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -747,6 +753,97 @@ where Ok(()) } +// State for storing left/right side indices used for p

Re: [I] Include NaN in Parquet stats (again) [arrow-rs]

2023-11-06 Thread via GitHub
Jefffrey commented on issue #264: URL: https://github.com/apache/arrow-rs/issues/264#issuecomment-1797919977 I think this should be good to close now? PARQUET-1222 was merged (https://github.com/apache/parquet-format/pull/185) a while back, and it looks like the `parquet` crate is alrea

Re: [I] [EPIC] Unify Function Interface (remove `BuiltInScalarFunction`) [arrow-datafusion]

2023-11-06 Thread via GitHub
2010YOUY01 commented on issue #8045: URL: https://github.com/apache/arrow-datafusion/issues/8045#issuecomment-1797907140 According to the previous prototyping in https://github.com/apache/arrow-datafusion/pull/7978, we might need to do several cleanups towards this issue. ### Fo

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-06 Thread via GitHub
eeroel commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1797900711 Here's a reproducible example that doesn't use FileSystemDataset but `parquet.read_table`: ```import pyarrow._s3fs pyarrow._s3fs.initialize_s3(pyarrow._s3fs.S3LogLevel.Trace)

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-06 Thread via GitHub
vibhatha commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1384438762 ## java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowFileWriter.java: ## @@ -130,14 +130,24 @@ protected void endInternal(WriteChannel out) throws IOExceptio

Re: [I] Proposal: Change `Accumulator` trait to accept `RecordBatch` / `num_rows` to allow faster `Count` [arrow-datafusion]

2023-11-06 Thread via GitHub
2010YOUY01 commented on issue #8067: URL: https://github.com/apache/arrow-datafusion/issues/8067#issuecomment-1797870588 Is that because this counting operation can be done during scanning? It looks like a case of aggregate pushdown. For `min()/max()/count()` aggregate

Re: [I] [Python][FlightRPC] Segmentation Fault when invoking authenticate concurrently over a same FlightClient [arrow]

2023-11-06 Thread via GitHub
kou commented on issue #38565: URL: https://github.com/apache/arrow/issues/38565#issuecomment-1797865170 Ah, `arrow::flight::FlightClient::Authenticate()` isn't thread safe: https://github.com/apache/arrow/blob/25c18d8cd6a299f3bb6b72966f2dca357db26399/cpp/src/arrow/flight/transport/grpc/grpc

Re: [PR] GH-38460: [Java][FlightRPC] Add mTLS support for Flight SQL JDBC driver [arrow]

2023-11-06 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38461: URL: https://github.com/apache/arrow/pull/38461#issuecomment-1797852616 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 4ff1a29aae561cad6851a13666e4375a7645c6ef. There were no

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-06 Thread via GitHub
vibhatha commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1384395189 ## java/memory/memory-core/src/test/java/org/apache/arrow/memory/util/TestArrowBufPointer.java: ## @@ -204,6 +204,11 @@ public int hashCode(ArrowBuf buf, long offset,

Re: [I] [Epic] A new Scalar Function interface [arrow-datafusion]

2023-11-06 Thread via GitHub
2010YOUY01 closed issue #7977: [Epic] A new Scalar Function interface URL: https://github.com/apache/arrow-datafusion/issues/7977

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-06 Thread via GitHub
vibhatha commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1384386091 ## java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowReader.java: ## @@ -243,6 +248,8 @@ protected void loadDictionary(ArrowDictionaryBatch dictionaryBatch)

Re: [PR] Parquet: read/write f16 for Arrow [arrow-rs]

2023-11-06 Thread via GitHub
Jefffrey commented on code in PR #5003: URL: https://github.com/apache/arrow-rs/pull/5003#discussion_r1384372963 ## parquet/src/column/writer/mod.rs: ## @@ -967,18 +969,23 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> { } fn update_min(descr: &ColumnDescript

Re: [PR] fix(go/driver/snowflake): handling of integer values sent for NUMBER columns [arrow-adbc]

2023-11-06 Thread via GitHub
CurtHagenlocher commented on code in PR #1267: URL: https://github.com/apache/arrow-adbc/pull/1267#discussion_r1384331835 ## go/adbc/driver/snowflake/driver_test.go: ## @@ -679,6 +681,96 @@ func (suite *SnowflakeTests) TestUseHighPrecision() { suite.Equal(9876543210.99,

Re: [PR] Add example to ci [arrow-datafusion]

2023-11-06 Thread via GitHub
smallzhongfeng commented on PR #8060: URL: https://github.com/apache/arrow-datafusion/pull/8060#issuecomment-1797588502 > Thank you @smallzhongfeng -- this is great > > Can you also please update the example README document with the new paths: https://github.com/apache/arrow-datafusi

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-06 Thread via GitHub
vibhatha commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1384294051 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/BatchedDictionary.java: ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] Parquet: read/write f16 for Arrow [arrow-rs]

2023-11-06 Thread via GitHub
Jefffrey commented on PR #5003: URL: https://github.com/apache/arrow-rs/pull/5003#issuecomment-1797385065 Just realized I also need to fix the statistics handling; otherwise I think it might write incorrect stats for files

Re: [PR] Fix handling of integer values sent by Snowflake for NUMBER columns [arrow-adbc]

2023-11-06 Thread via GitHub
github-actions[bot] commented on PR #1267: URL: https://github.com/apache/arrow-adbc/pull/1267#issuecomment-1797368703 :warning: Please follow the [Conventional Commits format in CONTRIBUTING.md](https://github.com/apache/arrow-adbc/blob/main/CONTRIBUTING.md) for PR titles.

[PR] Fix handling of integer values sent by Snowflake for NUMBER columns [arrow-adbc]

2023-11-06 Thread via GitHub
CurtHagenlocher opened a new pull request, #1267: URL: https://github.com/apache/arrow-adbc/pull/1267 Addresses #1242.

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-06 Thread via GitHub
vibhatha commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1384289084 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/BatchedDictionary.java: ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [I] S3FileSystem delete_dir() regression in PyArrow 14 [arrow]

2023-11-06 Thread via GitHub
kou commented on issue #38618: URL: https://github.com/apache/arrow/issues/38618#issuecomment-1797339889 Could you provide a minimal script to reproduce this problem? If you can build PyArrow locally, could you try `git bisect` to find the commit that is related to this problem?

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-06 Thread via GitHub
vibhatha commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1384285436 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/BaseDictionary.java: ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[PR] fix(glib): add missing "pkg-config --modversion arrow-glib" result check [arrow-adbc]

2023-11-06 Thread via GitHub
kou opened a new pull request, #1266: URL: https://github.com/apache/arrow-adbc/pull/1266 Fixes #1265.

[PR] fix(dev/release): install missing protobuf for Python test [arrow-adbc]

2023-11-06 Thread via GitHub
kou opened a new pull request, #1264: URL: https://github.com/apache/arrow-adbc/pull/1264 Fixes #1263.

Re: [PR] GH-38578: [Java][FlightSQL] Remove joda usage from flight-sql library [arrow]

2023-11-06 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38579: URL: https://github.com/apache/arrow/pull/38579#issuecomment-1797316066 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 02d8bd26ef5a8a6f840d4cc98f669ea81c534487. There were no

Re: [I] [MATLAB] Release version 0.1 of the MATLAB interface to Arrow [arrow]

2023-11-06 Thread via GitHub
kou commented on issue #38612: URL: https://github.com/apache/arrow/issues/38612#issuecomment-1797261154 Do you want to use different version numbers from the other implementations in this repository? (We'll use "15.0.0" for the next release.)

Re: [PR] MINOR: [JS] Bump eslint from 8.42.0 to 8.52.0 in /js [arrow]

2023-11-06 Thread via GitHub
kou merged PR #38545: URL: https://github.com/apache/arrow/pull/38545

Re: [PR] MINOR: [C#] Bump BenchmarkDotNet.Diagnostics.Windows from 0.13.9 to 0.13.10 in /csharp [arrow]

2023-11-06 Thread via GitHub
kou merged PR #38605: URL: https://github.com/apache/arrow/pull/38605

Re: [I] Regression when serializing large json numbers [arrow-rs]

2023-11-06 Thread via GitHub
Blajda commented on issue #5038: URL: https://github.com/apache/arrow-rs/issues/5038#issuecomment-1797148637 Thanks for the fix @tustvold

[PR] Parquet: Fix string display for f32/f64 zero values [arrow-rs]

2023-11-06 Thread via GitHub
Jefffrey opened a new pull request, #5046: URL: https://github.com/apache/arrow-rs/pull/5046 # Which issue does this PR close? Closes #5045 # Rationale for this change # What changes are included in this PR? # Are there any user-facing chan

[I] Parquet record api: display for float/double 0.0 shows 0E0 [arrow-rs]

2023-11-06 Thread via GitHub
Jefffrey opened a new issue, #5045: URL: https://github.com/apache/arrow-rs/issues/5045 **Describe the bug** Floats and doubles that are zero (or negative zero) are displayed in format `0E0` for `Field` from record api **To Reproduce** In unit test: https://g

Re: [PR] GH-38330: [C++][Azure] Use properties for input stream metadata [arrow]

2023-11-06 Thread via GitHub
kou merged PR #38524: URL: https://github.com/apache/arrow/pull/38524

Re: [PR] GH-38330: [C++][Azure] Use properties for input stream metadata [arrow]

2023-11-06 Thread via GitHub
kou commented on PR #38524: URL: https://github.com/apache/arrow/pull/38524#issuecomment-1797129801 I'll merge this.

Re: [PR] WIP: [Release] Verify release-14.0.1-rc0 [arrow]

2023-11-06 Thread via GitHub
raulcd commented on PR #38620: URL: https://github.com/apache/arrow/pull/38620#issuecomment-1797121553 Revision: ba537483618196f50c67a90a473039e4d5dc35e0 Submitted crossbow builds: [ursacomputing/crossbow @ release-14.0.1-rc0-1](https://github.com/ursacomputing/crossbow/branches/all?q

[PR] General approach for Array repeat [arrow-datafusion]

2023-11-06 Thread via GitHub
jayzhan211 opened a new pull request, #8071: URL: https://github.com/apache/arrow-datafusion/pull/8071 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [PR] WIP: [Release] Verify release-14.0.1-rc0 [arrow]

2023-11-06 Thread via GitHub
raulcd commented on PR #38620: URL: https://github.com/apache/arrow/pull/38620#issuecomment-1797116715 Revision: ba537483618196f50c67a90a473039e4d5dc35e0 Submitted crossbow builds: [ursacomputing/crossbow @ release-14.0.1-rc0-0](https://github.com/ursacomputing/crossbow/branches/all?q

Re: [PR] WIP: [Release] Verify release-14.0.1-rc0 [arrow]

2023-11-06 Thread via GitHub
github-actions[bot] commented on PR #38620: URL: https://github.com/apache/arrow/pull/38620#issuecomment-1797113569 Revision: ba537483618196f50c67a90a473039e4d5dc35e0 Submitted crossbow builds: [ursacomputing/crossbow @ actions-563cb0723b](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-38570: [R] Ensure that test-nix-libs is warning free [arrow]

2023-11-06 Thread via GitHub
assignUser merged PR #38571: URL: https://github.com/apache/arrow/pull/38571

[PR] WIP: [Release] Verify release-14.0.1-rc0 [arrow]

2023-11-06 Thread via GitHub
raulcd opened a new pull request, #38620: URL: https://github.com/apache/arrow/pull/38620 PR to verify Release Candidate

Re: [PR] GH-38570: [R] Ensure that test-nix-libs is warning free [arrow]

2023-11-06 Thread via GitHub
assignUser commented on code in PR #38571: URL: https://github.com/apache/arrow/pull/38571#discussion_r1384223549 ## r/tools/nixlibs.R: ## @@ -828,10 +832,10 @@ quietly <- !env_is("ARROW_R_DEV", "true") not_cran <- env_is("NOT_CRAN", "true") -if (is_release) { +if (is_relea

Re: [PR] WIP: [Release] Verify release-14.0.1-rc0 [arrow]

2023-11-06 Thread via GitHub
raulcd commented on PR #38620: URL: https://github.com/apache/arrow/pull/38620#issuecomment-1797109767 @github-actions crossbow submit --group verify-rc-source --param release=14.0.1 --param rc=0

Re: [I] QUESTION: How to extract Decimal Element without knowing precision / scale [arrow-nanoarrow]

2023-11-06 Thread via GitHub
WillAyd commented on issue #314: URL: https://github.com/apache/arrow-nanoarrow/issues/314#issuecomment-1797102117 Ah OK thanks. I see I can do that in nanoarrow when paired with the schema then

Re: [PR] MINOR: [JS] Bump @types/command-line-args from 5.2.0 to 5.2.2 in /js [arrow]

2023-11-06 Thread via GitHub
kou merged PR #38546: URL: https://github.com/apache/arrow/pull/38546

Re: [I] QUESTION: How to extract Decimal Element without knowing precision / scale [arrow-nanoarrow]

2023-11-06 Thread via GitHub
lidavidm commented on issue #314: URL: https://github.com/apache/arrow-nanoarrow/issues/314#issuecomment-1797094502 For Arrow decimal arrays, all elements have the same precision and scale - it's part of the array type.

Re: [I] [Docs][Specifications] Is this intentional or a typo? [arrow]

2023-11-06 Thread via GitHub
kou commented on issue #38599: URL: https://github.com/apache/arrow/issues/38599#issuecomment-1797076653 I think that we should use level 2 for "Device Stream Interface" and the subsections in it, or use a separate page like https://arrow.apache.org/docs/format/CDataInterface.html and https://a

[I] QUESTION: How to extract Decimal Element without knowing precision / scale [arrow-nanoarrow]

2023-11-06 Thread via GitHub
WillAyd opened a new issue, #314: URL: https://github.com/apache/arrow-nanoarrow/issues/314 As far as I can tell this is a valid construct for creating a DecimalArray ```c struct ArrowArray array; struct ArrowDecimal decimal1; struct ArrowDecimal decimal2; ArrowDecimalI

Re: [I] [Python][FlightRPC] Segmentation Fault when invoking authenticate concurrently over a same FlightClient [arrow]

2023-11-06 Thread via GitHub
kou commented on issue #38565: URL: https://github.com/apache/arrow/issues/38565#issuecomment-1797066847 Thanks. Could you execute the following commands in the debugger session? ```text (lldb) f 10 (lldb) p vtable_ ```

Re: [PR] GH-38602: [R] Add missing `prod` for summarize [arrow]

2023-11-06 Thread via GitHub
paleolimbot merged PR #38601: URL: https://github.com/apache/arrow/pull/38601

Re: [PR] Parquet: read/write f16 for Arrow [arrow-rs]

2023-11-06 Thread via GitHub
Jefffrey commented on code in PR #5003: URL: https://github.com/apache/arrow-rs/pull/5003#discussion_r1384173903 ## parquet/src/file/statistics.rs: ## @@ -243,6 +243,8 @@ pub fn to_thrift(stats: Option<&Statistics>) -> Option { distinct_count: stats.distinct_count().ma

Re: [PR] GH-38381: [C++][Acero] Create a sorted merge node [arrow]

2023-11-06 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38380: URL: https://github.com/apache/arrow/pull/38380#issuecomment-1797021664 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit b55d8d5664480f63c2be75f0e18eeb006b640427. There were no

Re: [PR] GH-38255: [Go][C++] Implement Flight SQL Bulk Ingestion [arrow]

2023-11-06 Thread via GitHub
joellubi commented on code in PR #38385: URL: https://github.com/apache/arrow/pull/38385#discussion_r1384165515 ## go/arrow/flight/flightsql/client.go: ## @@ -218,6 +218,56 @@ func (c *Client) ExecuteSubstraitUpdate(ctx context.Context, plan SubstraitPlan, return update

Re: [PR] Minor: remove duplicated `array_replace` tests [arrow-datafusion]

2023-11-06 Thread via GitHub
jayzhan211 commented on PR #8066: URL: https://github.com/apache/arrow-datafusion/pull/8066#issuecomment-1797004052 > I wonder what you think of this @jayzhan211 ? If you agree that this is an improvement, I can do the same for the other tests in `array_expressions.rs`. We should mov

Re: [PR] GH-37069: [C#] Experimental tests of too large record batches [arrow]

2023-11-06 Thread via GitHub
github-actions[bot] commented on PR #38619: URL: https://github.com/apache/arrow/pull/38619#issuecomment-1796995523 :warning: GitHub issue #37069 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] Experimental tests for issue #37069 [arrow]

2023-11-06 Thread via GitHub
github-actions[bot] commented on PR #38619: URL: https://github.com/apache/arrow/pull/38619#issuecomment-1796984789 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue f

[PR] Experimental tests for issue #37069 [arrow]

2023-11-06 Thread via GitHub
voidstar69 opened a new pull request, #38619: URL: https://github.com/apache/arrow/pull/38619 Attempt (badly) to test reading and writing large batches, potentially batches 2GB+

Re: [PR] GH-38335: [C++] Implement `GetFileInfo` for a single file in Azure filesystem [arrow]

2023-11-06 Thread via GitHub
Tom-Newton commented on code in PR #38505: URL: https://github.com/apache/arrow/pull/38505#discussion_r1384123947 ## cpp/src/arrow/filesystem/azurefs_test.cc: ## @@ -216,23 +226,184 @@ class TestAzureFileSystem : public ::testing::Test { void UploadLines(const std::vector& li

Re: [PR] Cast from integer/timestamp to timestamp/integer [arrow-rs]

2023-11-06 Thread via GitHub
viirya commented on code in PR #5040: URL: https://github.com/apache/arrow-rs/pull/5040#discussion_r1384119989 ## arrow-cast/src/cast.rs: ## @@ -1621,24 +1621,104 @@ pub fn cast_with_options( .unary::<_, Time64MicrosecondType>(|x| x / (NANOSECONDS / MICROSECOND

Re: [PR] GH-38607: [Python] Disable PyExtensionType autoload [arrow]

2023-11-06 Thread via GitHub
raulcd merged PR #38608: URL: https://github.com/apache/arrow/pull/38608

Re: [PR] GH-38562: [Packaging] Add support for Ubuntu 23.10 [arrow]

2023-11-06 Thread via GitHub
kou commented on PR #38563: URL: https://github.com/apache/arrow/pull/38563#issuecomment-1796861636 You can download needed `*.deb` from https://github.com/ursacomputing/crossbow/releases/tag/actions-1df9c0570d-github-ubuntu-mantic-amd64 and install them by `apt install ./*.deb` (`./` is im

Re: [I] Wrong timestamp type read while from parquet file created by spark [arrow-datafusion]

2023-11-06 Thread via GitHub
viirya commented on issue #7958: URL: https://github.com/apache/arrow-datafusion/issues/7958#issuecomment-1796857735 The Spark community tried (https://issues.apache.org/jira/browse/SPARK-27528) to change the default Parquet timestamp type to TIMESTAMP_MICROS, but the change was reverted to INT96 for eco

Re: [PR] GH-38562: [Packaging] Add support for Ubuntu 23.10 [arrow]

2023-11-06 Thread via GitHub
raulcd commented on PR #38563: URL: https://github.com/apache/arrow/pull/38563#issuecomment-1796857025 Hi @almejo In order to reach the official repositories we require an Apache Arrow release. This is targeted for our next release, 15.0.0. We have a 3 month release cadence and the 15.0.0 i

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-11-06 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1384079401 ## cpp/src/arrow/array/concatenate.cc: ## @@ -113,33 +122,30 @@ Status ConcatenateOffsets(const BufferVector& buffers, MemoryPool* pool, values_ranges->resize(buf

Re: [PR] MINOR: [C#] Bump BenchmarkDotNet from 0.13.9 to 0.13.10 in /csharp [arrow]

2023-11-06 Thread via GitHub
kou merged PR #38606: URL: https://github.com/apache/arrow/pull/38606

Re: [PR] MINOR: [C#] Bump xunit from 2.5.3 to 2.6.1 in /csharp [arrow]

2023-11-06 Thread via GitHub
kou merged PR #38604: URL: https://github.com/apache/arrow/pull/38604

Re: [PR] MINOR: [C#] Bump Google.Protobuf from 3.24.4 to 3.25.0 in /csharp [arrow]

2023-11-06 Thread via GitHub
kou merged PR #38603: URL: https://github.com/apache/arrow/pull/38603

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-11-06 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1384084105 ## cpp/src/arrow/array/concatenate.cc: ## @@ -160,16 +166,69 @@ Status PutOffsets(const std::shared_ptr& src, Offset first_offset, Offse // Write offsets into d

Re: [PR] MINOR: [JS] Bump rollup and @rollup/stream in /js [arrow]

2023-11-06 Thread via GitHub
kou merged PR #38595: URL: https://github.com/apache/arrow/pull/38595

Re: [I] [Java] Writer helper methods need to be applied to PromotableWriters [arrow]

2023-11-06 Thread via GitHub
jduo commented on issue #38614: URL: https://github.com/apache/arrow/issues/38614#issuecomment-1796812974 take

[PR] Minor: Improve HashJoinStream docstrings [arrow-datafusion]

2023-11-06 Thread via GitHub
alamb opened a new pull request, #8070: URL: https://github.com/apache/arrow-datafusion/pull/8070 ## Which issue does this PR close? Related to https://github.com/apache/arrow-datafusion/pull/8020 ## Rationale for this change While reviewing https://github.com/apache/arrow-da

Re: [PR] GH-38562: [Packaging] Add support for Ubuntu 23.10 [arrow]

2023-11-06 Thread via GitHub
almejo commented on PR #38563: URL: https://github.com/apache/arrow/pull/38563#issuecomment-1796505854 How long will it take until it reaches the repositories?

Re: [PR] GH-38335: [C++] Implement `GetFileInfo` for a single file in Azure filesystem [arrow]

2023-11-06 Thread via GitHub
Tom-Newton commented on code in PR #38505: URL: https://github.com/apache/arrow/pull/38505#discussion_r1384001217 ## cpp/src/arrow/filesystem/azurefs.cc: ## @@ -317,27 +321,136 @@ class ObjectInputFile final : public io::RandomAccessFile { class AzureFileSystem::Impl { publi

Re: [PR] GH-36912: [Java] JDBC driver stops consuming roots if it sees an empty root [arrow]

2023-11-06 Thread via GitHub
lidavidm merged PR #38590: URL: https://github.com/apache/arrow/pull/38590

Re: [I] TPCH, Query 18 and 17 very slow [arrow-datafusion]

2023-11-06 Thread via GitHub
alamb commented on issue #5646: URL: https://github.com/apache/arrow-datafusion/issues/5646#issuecomment-1796493899 > I wonder if it would be better / more correct to rely on the worst-case scenario for such filters, and simply propagate input statistics Perhaps the filter can simply s

Re: [I] panicked at 'index out of bounds: the len is 0 but the index is 0' in `datafusion::physical_plan::projection::validate_output_ordering` [arrow-datafusion]

2023-11-06 Thread via GitHub
alamb commented on issue #7482: URL: https://github.com/apache/arrow-datafusion/issues/7482#issuecomment-1796469349 Closing this issue as it was fixed in 33.0.0. Thanks @sergiimk

Re: [I] panicked at 'index out of bounds: the len is 0 but the index is 0' in `datafusion::physical_plan::projection::validate_output_ordering` [arrow-datafusion]

2023-11-06 Thread via GitHub
alamb closed issue #7482: panicked at 'index out of bounds: the len is 0 but the index is 0' in `datafusion::physical_plan::projection::validate_output_ordering` URL: https://github.com/apache/arrow-datafusion/issues/7482

Re: [PR] feat(go/adbc/driver/snowflake): add support for ExecuteSchema [arrow-adbc]

2023-11-06 Thread via GitHub
lidavidm merged PR #1262: URL: https://github.com/apache/arrow-adbc/pull/1262

Re: [PR] GH-38576: [Java] Change JDBC driver to optionally preserve cookies and auth tokens when getting streams [arrow]

2023-11-06 Thread via GitHub
jduo commented on code in PR #38580: URL: https://github.com/apache/arrow/pull/38580#discussion_r1383974001 ## java/flight/flight-sql-jdbc-core/src/main/java/org/apache/arrow/driver/jdbc/client/ArrowFlightSqlClientHandler.java: ## @@ -672,10 +733,17 @@ public ArrowFlightSqlClien

Re: [I] Allow projection of schemas/structs [arrow]

2023-11-06 Thread via GitHub
Fokko commented on issue #38615: URL: https://github.com/apache/arrow/issues/38615#issuecomment-1796400301 Thanks again for the quick response @lidavidm The current operation that does this is called `.select()` which accepts names and indices. However, this would require traversing

Re: [PR] feat(go/adbc/driver/snowflake): add support for ExecuteSchema [arrow-adbc]

2023-11-06 Thread via GitHub
CurtHagenlocher commented on code in PR #1262: URL: https://github.com/apache/arrow-adbc/pull/1262#discussion_r1383953840 ## go/adbc/driver/snowflake/statement.go: ## @@ -584,6 +584,41 @@ func (st *statement) ExecuteUpdate(ctx context.Context) (int64, error) { return n,
