Re: [I] [Python] Rewrite pyarrow.jvm using the C data interface [arrow]

2023-12-13 Thread via GitHub
jorisvandenbossche commented on issue #29891: URL: https://github.com/apache/arrow/issues/29891#issuecomment-1855344039 Certainly! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] refactor: `HashJoinStream` state machine [arrow-datafusion]

2023-12-13 Thread via GitHub
korowa commented on PR #8538: URL: https://github.com/apache/arrow-datafusion/pull/8538#issuecomment-1855336622 Benchmark results for tpch_mem are ``` Benchmark tpch_mem.json ┏━━┳━━┳━┳━━━

[PR] refactor: `HashJoinStream` state machine [arrow-datafusion]

2023-12-13 Thread via GitHub
korowa opened a new pull request, #8538: URL: https://github.com/apache/arrow-datafusion/pull/8538 ## Which issue does this PR close? Part of #8130. ## Rationale for this change Structuring HashJoinStream processing logic based on @alamb [design suggestio

[PR] Remove order_bys from AggregateExec state [arrow-datafusion]

2023-12-13 Thread via GitHub
mustafasrepo opened a new pull request, #8537: URL: https://github.com/apache/arrow-datafusion/pull/8537 ## Which issue does this PR close? Closes #. ## Rationale for this change While working on another PR, I realized that `Arc` has `order_by` method, and keeps its

[PR] MINOR: [C++] Use Cast() instead of CastTo() for Timestamp Scalar in test [arrow]

2023-12-13 Thread via GitHub
llama90 opened a new pull request, #39226: URL: https://github.com/apache/arrow/pull/39226 ### Rationale for this change Remove legacy code This is a sub-PR of the PR mentioned below. * #39060 ### What changes are included in this PR? * Rep

Re: [I] [R] read_parquet from s3 is slow and often flakey [arrow]

2023-12-13 Thread via GitHub
tcash21 commented on issue #36007: URL: https://github.com/apache/arrow/issues/36007#issuecomment-1855272885 Adding `?region` did not help as I'm still experiencing timeouts. I'm trying to read a large directory of parquet files but only bring back the one file that matches a particular ID

Re: [PR] Add test for DataFrame::write_table [arrow-datafusion]

2023-12-13 Thread via GitHub
metesynnada commented on PR #8531: URL: https://github.com/apache/arrow-datafusion/pull/8531#issuecomment-1855245278 Can we add a simpler version into Datafusion examples?

Re: [PR] GH-39182: [C++] Add Deprecate Macro to CastTo function [arrow]

2023-12-13 Thread via GitHub
kou commented on code in PR #39192: URL: https://github.com/apache/arrow/pull/39192#discussion_r1426243159 ## c_glib/arrow-glib/scalar.cpp: ## @@ -385,7 +385,9 @@ garrow_scalar_cast(GArrowScalar *scalar, { const auto arrow_scalar = garrow_scalar_get_raw(scalar); const aut

Re: [PR] GH-39182: [C++] Add Deprecate Macro to CastTo function [arrow]

2023-12-13 Thread via GitHub
llama90 commented on PR #39192: URL: https://github.com/apache/arrow/pull/39192#issuecomment-1855224698 @bkietz I got it.

Re: [PR] GH-39182: [C++] Add Deprecate Macro to CastTo function [arrow]

2023-12-13 Thread via GitHub
llama90 commented on code in PR #39192: URL: https://github.com/apache/arrow/pull/39192#discussion_r1426230318 ## c_glib/arrow-glib/scalar.cpp: ## @@ -385,7 +385,9 @@ garrow_scalar_cast(GArrowScalar *scalar, { const auto arrow_scalar = garrow_scalar_get_raw(scalar); const

Re: [PR] GH-38157: [C++][Parquet] Decode: PlainBooleanDecoder batch decode [arrow]

2023-12-13 Thread via GitHub
mapleFU commented on PR #38158: URL: https://github.com/apache/arrow/pull/38158#issuecomment-1855212935 I've gone through the code; the main reason it's slow is that we don't have a fast enough `unpack8`...

Re: [PR] [MINOR]: Make some slt tests deterministic [arrow-datafusion]

2023-12-13 Thread via GitHub
mustafasrepo merged PR #8525: URL: https://github.com/apache/arrow-datafusion/pull/8525

Re: [PR] [MINOR]: Make some slt tests deterministic [arrow-datafusion]

2023-12-13 Thread via GitHub
mustafasrepo commented on code in PR #8525: URL: https://github.com/apache/arrow-datafusion/pull/8525#discussion_r1426215967 ## datafusion/sqllogictest/test_files/distinct_on.slt: ## @@ -38,17 +38,17 @@ LOCATION '../../testing/data/csv/aggregate_test_100.csv' # Basic example: d

Re: [PR] [MINOR]: Make some slt tests deterministic [arrow-datafusion]

2023-12-13 Thread via GitHub
mustafasrepo commented on PR #8525: URL: https://github.com/apache/arrow-datafusion/pull/8525#issuecomment-1855200510 > Quite curious, if the test is not deterministic, why we did not fail on this test before? > > Oh, I think probably because of `set datafusion.execution.target_part

Re: [PR] GH-37126: [C++][Parquet] Encoding: Unify the style of handling num_values in PlainBooleanDecoder [arrow]

2023-12-13 Thread via GitHub
mapleFU closed pull request #37127: GH-37126: [C++][Parquet] Encoding: Unify the style of handling num_values in PlainBooleanDecoder URL: https://github.com/apache/arrow/pull/37127

Re: [PR] GH-38157: [C++][Parquet] Decode: PlainBooleanDecoder batch decode [arrow]

2023-12-13 Thread via GitHub
mapleFU closed pull request #38158: GH-38157: [C++][Parquet] Decode: PlainBooleanDecoder batch decode URL: https://github.com/apache/arrow/pull/38158

Re: [PR] GH-37606: [C++][Parquet] Support user memory pool in Dataset Parquet writer [arrow]

2023-12-13 Thread via GitHub
mapleFU commented on PR #37607: URL: https://github.com/apache/arrow/pull/37607#issuecomment-1855143964 I'll draft a new patch for this later.

Re: [PR] GH-37606: [C++][Parquet] Support user memory pool in Dataset Parquet writer [arrow]

2023-12-13 Thread via GitHub
mapleFU closed pull request #37607: GH-37606: [C++][Parquet] Support user memory pool in Dataset Parquet writer URL: https://github.com/apache/arrow/pull/37607

[PR] Minor: Add LakeSoul to the list of Known Users [arrow-datafusion]

2023-12-13 Thread via GitHub
xuchen-plus opened a new pull request, #8536: URL: https://github.com/apache/arrow-datafusion/pull/8536 ## Which issue does this PR close? NA ## Rationale for this change [LakeSoul](https://github.com/lakesoul-io/LakeSoul) is an open source LakeHouse framewor

[PR] GH-39223: [C#] Support IReadOnlyList on remaining scalar types [arrow]

2023-12-13 Thread via GitHub
CurtHagenlocher opened a new pull request, #39224: URL: https://github.com/apache/arrow/pull/39224 ### What changes are included in this PR? Decimal128Array implements IReadOnlyList and IReadOnlyList. Decimal256Array implements IReadOnlyList, IReadOnlyList and IReadOnlyList. Fi

Re: [PR] Reduced SQL calls in GetObjects to two, added prefixing DbName f… [arrow-adbc]

2023-12-13 Thread via GitHub
github-actions[bot] commented on PR #1352: URL: https://github.com/apache/arrow-adbc/pull/1352#issuecomment-1855109689 :warning: Please follow the [Conventional Commits format in CONTRIBUTING.md](https://github.com/apache/arrow-adbc/blob/main/CONTRIBUTING.md) for PR titles.

Re: [PR] Reduced SQL calls in GetObjects to one and added prefixing DbNam… [arrow-adbc]

2023-12-13 Thread via GitHub
ryan-syed commented on PR #1351: URL: https://github.com/apache/arrow-adbc/pull/1351#issuecomment-1855109318 Similar to PR: https://github.com/apache/arrow-adbc/pull/1352, except for `populateMetadata`, `prepareDbSchemasSQL`, `prepareTablesSQL`, and `prepareColumnsSQL`. The implement

Re: [PR] fix: Reduced SQL calls in GetObjects to two, added prefixing DbName f… [arrow-adbc]

2023-12-13 Thread via GitHub
ryan-syed commented on PR #1352: URL: https://github.com/apache/arrow-adbc/pull/1352#issuecomment-1855109132 Similar to PR: https://github.com/apache/arrow-adbc/pull/1351, except for `populateMetadata`, `prepareDbSchemasSQL`, `prepareTablesSQL`, and `prepareColumnsSQL`. The implement

[PR] feat: improve string statistics display [arrow-datafusion]

2023-12-13 Thread via GitHub
asimsedhain opened a new pull request, #8535: URL: https://github.com/apache/arrow-datafusion/pull/8535 ## Which issue does this PR close? Closes #8464 ## Rationale for this change ## What changes are included in this PR? Output for the `data_index_

Re: [PR] GH-39214: [Java] Support reproducible build [arrow]

2023-12-13 Thread via GitHub
jbonofre commented on code in PR #39215: URL: https://github.com/apache/arrow/pull/39215#discussion_r1426152169 ## java/format/pom.xml: ## @@ -23,6 +23,10 @@ Arrow Format Generated Java files from the IPC Flatbuffer definitions. + + 2023-12-13T00:00:00Z + Review Co

Re: [PR] GH-39214: [Java] Support reproducible build [arrow]

2023-12-13 Thread via GitHub
jbonofre commented on code in PR #39215: URL: https://github.com/apache/arrow/pull/39215#discussion_r1426151802 ## java/format/pom.xml: ## @@ -23,6 +23,10 @@ Arrow Format Generated Java files from the IPC Flatbuffer definitions. + + 2023-12-13T00:00:00Z + Review Co

Re: [PR] GH-39214: [Java] Support reproducible build [arrow]

2023-12-13 Thread via GitHub
jbonofre commented on code in PR #39215: URL: https://github.com/apache/arrow/pull/39215#discussion_r1426150367 ## java/pom.xml: ## @@ -28,6 +28,7 @@ https://arrow.apache.org/ + 2023-12-13T00:00:00Z Review Comment: It's possible but not recommended: if you do ar

[PR] feat: implement Unary Expr in substrait [arrow-datafusion]

2023-12-13 Thread via GitHub
waynexia opened a new pull request, #8534: URL: https://github.com/apache/arrow-datafusion/pull/8534 ## Which issue does this PR close? Part of #8149 ## Rationale for this change This PR implements several unary-like `Expr`s: - Not - Negative - Is

Re: [PR] GH-39189: [Java] Bump com.h2database:h2 from 1.4.196 to 2.2.224 in /java [arrow]

2023-12-13 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39188: URL: https://github.com/apache/arrow/pull/39188#issuecomment-1855072983 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit dbed728f840bdb84880708dda865ba4c985e95f9. There were no

Re: [PR] GH-38884: [C++] DatasetWriter release rows_in_flight_throttle when allocate writing failed [arrow]

2023-12-13 Thread via GitHub
mapleFU commented on PR #38885: URL: https://github.com/apache/arrow/pull/38885#issuecomment-1855047632 ``` /Users/fuxuwei/workspace/CMakeLibs/arrow/cpp/src/arrow/dataset/dataset_writer_test.cc:115: Failure Value of: _fut.Wait(::arrow::kDefaultAssertFinishesWaitSeconds) Actual: f

Re: [I] [C++] Implement ODBC driver "wrapper" using FlightSQL [arrow]

2023-12-13 Thread via GitHub
kou commented on issue #30622: URL: https://github.com/apache/arrow/issues/30622#issuecomment-1855046812 > We could create a branch on apache/arrow so that PRs do not have to go into the main branch (in case things are unstable)? We can do it but we may not need to do it. Because we w

Re: [PR] GH-37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader [arrow]

2023-12-13 Thread via GitHub
vibhatha commented on code in PR #38371: URL: https://github.com/apache/arrow/pull/38371#discussion_r1426112980 ## java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowWriter.java: ## @@ -174,6 +178,12 @@ public long bytesWritten() { return out.getCurrentPosition();

Re: [PR] GH-37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader [arrow]

2023-12-13 Thread via GitHub
vibhatha commented on PR #38371: URL: https://github.com/apache/arrow/pull/38371#issuecomment-1855040062 @lidavidm I updated the PR, appreciate another round of reviews.

Re: [I] [C++] Implement ODBC driver "wrapper" using FlightSQL [arrow]

2023-12-13 Thread via GitHub
kou commented on issue #30622: URL: https://github.com/apache/arrow/issues/30622#issuecomment-1855037844 It seems that you already have many changes. Can we break it down into small pieces and proceed step-by-step like we did to implement Google Cloud Storage file system and Azure Blob Stor

Re: [I] [C++] Implement ODBC driver "wrapper" using FlightSQL [arrow]

2023-12-13 Thread via GitHub
wesm commented on issue #30622: URL: https://github.com/apache/arrow/issues/30622#issuecomment-1855036850 We could create a branch on apache/arrow so that PRs do not have to go into the main branch (in case things are unstable)?

Re: [I] [C++] Implement ODBC driver "wrapper" using FlightSQL [arrow]

2023-12-13 Thread via GitHub
kou commented on issue #30622: URL: https://github.com/apache/arrow/issues/30622#issuecomment-1855026489 Could you open a PR to apache/arrow instead of your fork?

Re: [PR] refactor: use ExprBuilder to consume substrait expr and use macro to generate error [arrow-datafusion]

2023-12-13 Thread via GitHub
waynexia merged PR #8515: URL: https://github.com/apache/arrow-datafusion/pull/8515

Re: [PR] GH-37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader [arrow]

2023-12-13 Thread via GitHub
vibhatha commented on code in PR #38371: URL: https://github.com/apache/arrow/pull/38371#discussion_r1426107235 ## java/compression/src/test/java/org/apache/arrow/compression/TestArrowReaderWriterWithCompression.java: ## @@ -18,31 +18,71 @@ package org.apache.arrow.compression;

Re: [PR] GH-37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader [arrow]

2023-12-13 Thread via GitHub
vibhatha commented on code in PR #38371: URL: https://github.com/apache/arrow/pull/38371#discussion_r1426106903 ## java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowFileWriter.java: ## @@ -76,15 +74,8 @@ public ArrowFileWriter(VectorSchemaRoot root, DictionaryProvider

Re: [I] [C++] Implement ODBC driver "wrapper" using FlightSQL [arrow]

2023-12-13 Thread via GitHub
alinaliBQ commented on issue #30622: URL: https://github.com/apache/arrow/issues/30622#issuecomment-1855017966 Hi @lidavidm, currently our team's implementation is being done inside our own [arrow fork](https://github.com/Bit-Quill/arrow/pulls). I was wondering if you know any Arrow communi

Re: [PR] GH-36924: [Java] support offset/length and filter in scan option [arrow]

2023-12-13 Thread via GitHub
vibhatha commented on code in PR #36967: URL: https://github.com/apache/arrow/pull/36967#discussion_r1426103189 ## cpp/src/arrow/dataset/file_parquet_test.cc: ## @@ -607,6 +607,29 @@ TEST_P(TestParquetFileFormatScan, PredicatePushdown) { kNumRowGroup

Re: [I] [Python] Rewrite pyarrow.jvm using the C data interface [arrow]

2023-12-13 Thread via GitHub
vibhatha commented on issue #29891: URL: https://github.com/apache/arrow/issues/29891#issuecomment-1855016609 Would it be okay if I work on this?

Re: [PR] GH-39037: [Java] Remove (Contrib/Experimental) mention in Flight SQL [arrow]

2023-12-13 Thread via GitHub
vibhatha commented on PR #39040: URL: https://github.com/apache/arrow/pull/39040#issuecomment-1855013438 A CI is failing, but it seems unrelated.

Re: [PR] GH-39189: [Java] Bump com.h2database:h2 from 1.4.196 to 2.2.224 in /java [arrow]

2023-12-13 Thread via GitHub
vibhatha commented on code in PR #39188: URL: https://github.com/apache/arrow/pull/39188#discussion_r1426098675 ## java/adapter/jdbc/src/test/resources/h2/test1_all_datatypes_h2.yml: ## @@ -13,59 +13,59 @@ name: 'test1_all_datatypes_h2' create: 'CREATE TABLE table1 (int_field

Re: [PR] GH-38998: [Java] Build memory-core and memory-unsafe as JPMS modules [arrow]

2023-12-13 Thread via GitHub
vibhatha commented on code in PR #39011: URL: https://github.com/apache/arrow/pull/39011#discussion_r1426097065 ## java/memory/memory-core/src/main/java/org/apache/arrow/memory/util/MemoryUtil.java: ## @@ -142,7 +142,7 @@ public Object run() { // the static fields above g

Re: [PR] GH-38998: [Java] Build memory-core and memory-unsafe as JPMS modules [arrow]

2023-12-13 Thread via GitHub
vibhatha commented on code in PR #39011: URL: https://github.com/apache/arrow/pull/39011#discussion_r1426097297 ## java/maven/module-info-compiler-maven-plugin/src/main/java/org/apache/arrow/maven/plugins/BaseModuleInfoCompilerPlugin.java: ## @@ -0,0 +1,92 @@ +/* + * Licensed to

Re: [PR] GH-37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader [arrow]

2023-12-13 Thread via GitHub
vibhatha commented on code in PR #38371: URL: https://github.com/apache/arrow/pull/38371#discussion_r1426095953 ## java/compression/src/test/java/org/apache/arrow/compression/TestArrowReaderWriterWithCompression.java: ## @@ -18,31 +18,71 @@ package org.apache.arrow.compression;

Re: [PR] reduce clones of LogicalPlan in planner [arrow-datafusion]

2023-12-13 Thread via GitHub
doki23 closed pull request #7775: reduce clones of LogicalPlan in planner URL: https://github.com/apache/arrow-datafusion/pull/7775

Re: [I] go/adbc/driver/snowflake: Without the DB name the GetObjects call fails [arrow-adbc]

2023-12-13 Thread via GitHub
ryan-syed commented on issue #1332: URL: https://github.com/apache/arrow-adbc/issues/1332#issuecomment-1854991716 > So long as the maintenance burden is not excessive. We should also default to the 'correct' method and only use the 'fast' method if desired by the caller. Yeah it makes

Re: [I] go/adbc/driver/snowflake: Without the DB name the GetObjects call fails [arrow-adbc]

2023-12-13 Thread via GitHub
ryan-syed commented on issue #1332: URL: https://github.com/apache/arrow-adbc/issues/1332#issuecomment-1854990820 Created a draft PR with reduced SQL calls (though this makes 2 and doesn't use cursor): [#1352](https://github.com/apache/arrow-adbc/pull/1352)

[PR] fix: Reduced SQL calls in GetObjects to two, added prefixing DbName f… [arrow-adbc]

2023-12-13 Thread via GitHub
ryan-syed opened a new pull request, #1352: URL: https://github.com/apache/arrow-adbc/pull/1352 fix: Reduced SQL calls in GetObjects to two, added prefixing DbName for INFORMATION_SCHEMA calls, and removed SQL cursor code Additional changes: * Reduced SQL calls by making only 1 - 2

Re: [PR] GH-37848: [C++][Gandiva] Migrate LLVM JIT engine from MCJIT to ORC v2/LLJIT [arrow]

2023-12-13 Thread via GitHub
niyue commented on PR #39098: URL: https://github.com/apache/arrow/pull/39098#issuecomment-1854951944 > The remained ASAN error is https://github.com/apache/arrow/pull/39098#issuecomment-1852177839 , right? Correct. It seems to me an ASAN issue for certain LLVM versions since it goes

Re: [PR] GH-39163: [C++] Add missing data copy in StreamDecoder::Consume(data) [arrow]

2023-12-13 Thread via GitHub
kou commented on PR #39164: URL: https://github.com/apache/arrow/pull/39164#issuecomment-1854938008 @pitrou @felipecrv Do you have any opinion on this approach?

Re: [PR] GH-39212: [Release] Retry download binary subprocess in case of OpenSSL error [arrow]

2023-12-13 Thread via GitHub
kou commented on code in PR #39213: URL: https://github.com/apache/arrow/pull/39213#discussion_r1426048355 ## dev/release/download_rc_binaries.py: ## @@ -121,17 +121,28 @@ def _download_url(self, url, dest_path, *, extra_args=None): dest_path, url,

Re: [PR] GH-39212: [Release] Retry download binary subprocess in case of OpenSSL error [arrow]

2023-12-13 Thread via GitHub
kou commented on code in PR #39213: URL: https://github.com/apache/arrow/pull/39213#discussion_r1426048110 ## dev/release/download_rc_binaries.py: ## @@ -121,17 +121,28 @@ def _download_url(self, url, dest_path, *, extra_args=None): dest_path, url,

Re: [PR] GH-37055: [C++] Optimize hash kernels for Dictionary ChunkedArrays [arrow]

2023-12-13 Thread via GitHub
felipecrv commented on PR #38394: URL: https://github.com/apache/arrow/pull/38394#issuecomment-1854905549 > @felipecrv Do you mind having a look at this? Soon!

Re: [PR] GH-39210: [C++][Parquet] Avoid WriteRecordBatch from produce zero-sized RowGroup [arrow]

2023-12-13 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39211: URL: https://github.com/apache/arrow/pull/39211#issuecomment-1854904104 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 4142607f61a2e52fddaaee6e82a9e1be1d462cd9. There were no

Re: [I] Optimizer adds CoalesceBatchesExec after Hash Repartition [arrow-datafusion]

2023-12-13 Thread via GitHub
Blajda commented on issue #8523: URL: https://github.com/apache/arrow-datafusion/issues/8523#issuecomment-1854884264 Looking further into it, it's an issue with the new operator I'm implementing, and coalescing just exposes the issue.

Re: [I] Optimizer adds CoalesceBatchesExec after Hash Repartition [arrow-datafusion]

2023-12-13 Thread via GitHub
Blajda closed issue #8523: Optimizer adds CoalesceBatchesExec after Hash Repartition URL: https://github.com/apache/arrow-datafusion/issues/8523

Re: [I] [R] Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...): [arrow]

2023-12-13 Thread via GitHub
assignUser commented on issue #39206: URL: https://github.com/apache/arrow/issues/39206#issuecomment-1854873491 I can repro on github actions: https://github.com/assignUser/test-repo-a/actions/runs/7202352843/job/19620310001#step:3:96

Re: [PR] Add test for DataFrame::write_table [arrow-datafusion]

2023-12-13 Thread via GitHub
comphead commented on code in PR #8531: URL: https://github.com/apache/arrow-datafusion/pull/8531#discussion_r1425998143 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -1938,6 +1938,97 @@ mod tests { Ok(schema) } +#[tokio::test] +as

Re: [PR] GH-39001: Modularize remaining modules [arrow]

2023-12-13 Thread via GitHub
jduo commented on PR #39221: URL: https://github.com/apache/arrow/pull/39221#issuecomment-1854862068 dataset is having issues because the directory holding native libs in JARs (the arch, e.g. x86_64) is being treated as a package name. flight-core and other Flight modules require additional co

Re: [PR] GH-39001: Modularize remaining modules [arrow]

2023-12-13 Thread via GitHub
github-actions[bot] commented on PR #39221: URL: https://github.com/apache/arrow/pull/39221#issuecomment-1854851287 :warning: GitHub issue #39001 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-39001: Modularize remaining modules [arrow]

2023-12-13 Thread via GitHub
jduo opened a new pull request, #39221: URL: https://github.com/apache/arrow/pull/39221 ### Rationale for this change Modularize remaining modules outside of memory modules, vector, and format. ### What changes are included in this PR? ### Are these changes tested? Yes, ex

Re: [PR] GH-38998: [Java] Build memory-core and memory-unsafe as JPMS modules [arrow]

2023-12-13 Thread via GitHub
jduo commented on code in PR #39011: URL: https://github.com/apache/arrow/pull/39011#discussion_r1425986472 ## java/memory/memory-core/pom.xml: ## @@ -54,6 +53,30 @@ + + error-prone-jdk11+ Review Comment: This profile is really the "default" build profile

Re: [PR] GH-38998: [Java] Build memory-core and memory-unsafe as JPMS modules [arrow]

2023-12-13 Thread via GitHub
jduo commented on code in PR #39011: URL: https://github.com/apache/arrow/pull/39011#discussion_r1425985991 ## java/pom.xml: ## @@ -870,7 +897,7 @@ org.apache.maven.plugins maven-surefire-plugin - --add-opens=java.base/java.ni

Re: [I] go/adbc/driver/snowflake: Without the DB name the GetObjects call fails [arrow-adbc]

2023-12-13 Thread via GitHub
lidavidm commented on issue #1332: URL: https://github.com/apache/arrow-adbc/issues/1332#issuecomment-1854815946 So long as the maintenance burden is not excessive. We should also default to the 'correct' method and only use the 'fast' method if desired by the caller.

Re: [PR] Fix regression with Incorrect results when reading parquet files with different schemas and statistics [arrow-datafusion]

2023-12-13 Thread via GitHub
viirya commented on code in PR #8533: URL: https://github.com/apache/arrow-datafusion/pull/8533#discussion_r1425954546 ## datafusion/sqllogictest/test_files/schema_evolution.slt: ## @@ -0,0 +1,122 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contri

Re: [I] [Python][Parquet] Parquet deserialization speeds slower on Linux [arrow]

2023-12-13 Thread via GitHub
pitrou commented on issue #38389: URL: https://github.com/apache/arrow/issues/38389#issuecomment-1854798749 That said, ~1 GB/s for uncompressed PLAIN-encoded fixed-width data is still very mediocre. I think this has to do with the fact that `pq.read_table` concatenates the row groups together

Re: [PR] Fix regression with Incorrect results when reading parquet files with different schemas and statistics [arrow-datafusion]

2023-12-13 Thread via GitHub
viirya commented on code in PR #8533: URL: https://github.com/apache/arrow-datafusion/pull/8533#discussion_r1425951299 ## datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs: ## @@ -923,6 +944,7 @@ mod tests { let metrics = parquet_file_metrics();

Re: [PR] GH-39134: [Java] Create module info compiler plugin [arrow]

2023-12-13 Thread via GitHub
danepitkin commented on PR #39135: URL: https://github.com/apache/arrow/pull/39135#issuecomment-1854796465 dependabot is trying to update some of the dependencies added here. Should we ignore them? How do we determine the right support matrix for dependencies? https://github.com/apach

Re: [PR] GH-39087 [Go]: Update unsafe slice idioms to avoid header struct puns [arrow]

2023-12-13 Thread via GitHub
zeroshade commented on PR #39187: URL: https://github.com/apache/arrow/pull/39187#issuecomment-1854786383 You can just keep pushing commits to the branch on your fork, that'll update here automatically. You shouldn't have to force-push unless you're doing a rebase from main. When thi

Re: [PR] GH-39119: [C++] Refactor the Azure FS tests [arrow]

2023-12-13 Thread via GitHub
felipecrv commented on code in PR #39207: URL: https://github.com/apache/arrow/pull/39207#discussion_r1425940404 ## cpp/src/arrow/filesystem/azurefs_test.cc: ## @@ -377,199 +487,337 @@ class AzureFileSystemTest : public ::testing::Test { strlen(kSubData));

Re: [PR] GH-39119: [C++] Refactor the Azure FS tests [arrow]

2023-12-13 Thread via GitHub
felipecrv commented on code in PR #39207: URL: https://github.com/apache/arrow/pull/39207#discussion_r1425877985 ## cpp/src/arrow/filesystem/azurefs_test.cc: ## @@ -377,199 +487,337 @@ class AzureFileSystemTest : public ::testing::Test { strlen(kSubData));

Re: [PR] GH-39119: [C++] Refactor the Azure FS tests [arrow]

2023-12-13 Thread via GitHub
felipecrv commented on code in PR #39207: URL: https://github.com/apache/arrow/pull/39207#discussion_r1425936535 ## cpp/src/arrow/filesystem/azurefs.cc: ## @@ -33,42 +33,85 @@ #include "arrow/util/logging.h" #include "arrow/util/string.h" -namespace arrow { -namespace fs { +

Re: [I] [C++][Python] `RecordBatch.filter()` segfaults if passed a `ChunkedArray` [arrow]

2023-12-13 Thread via GitHub
nph commented on issue #38770: URL: https://github.com/apache/arrow/issues/38770#issuecomment-1854773351 Thanks @jorisvandenbossche for the suggestion - I'll open a separate issue for this and will work on the PR.

Re: [PR] GH-39119: [C++] Refactor the Azure FS tests [arrow]

2023-12-13 Thread via GitHub
felipecrv commented on code in PR #39207: URL: https://github.com/apache/arrow/pull/39207#discussion_r1425933200 ## cpp/src/arrow/filesystem/azurefs_test.cc: ## @@ -193,51 +265,123 @@ TEST(AzureFileSystem, OptionsCompare) { EXPECT_TRUE(options.Equals(options)); } -class Az

Re: [I] [Python] Table / RecordBatch repr displays the wrong timezone for non-UTC timestamps [arrow]

2023-12-13 Thread via GitHub
nph commented on issue #38629: URL: https://github.com/apache/arrow/issues/38629#issuecomment-1854770596 Thanks @jorisvandenbossche, @AlenkaF.

Re: [PR] fix: volatile expressions should not be target of common subexpt elimination [arrow-datafusion]

2023-12-13 Thread via GitHub
alamb commented on code in PR #8520: URL: https://github.com/apache/arrow-datafusion/pull/8520#discussion_r1425928916 ## datafusion/expr/src/expr.rs: ## @@ -373,6 +373,24 @@ impl ScalarFunctionDefinition { ScalarFunctionDefinition::Name(func_name) => func_name.as_re

Re: [PR] GH-39214: [Java] Support reproducible build [arrow]

2023-12-13 Thread via GitHub
davisusanibar commented on code in PR #39215: URL: https://github.com/apache/arrow/pull/39215#discussion_r1425927853 ## java/format/pom.xml: ## @@ -23,6 +23,10 @@ Arrow Format Generated Java files from the IPC Flatbuffer definitions. + + 2023-12-13T00:00:00Z + Revi

Re: [I] Regression: Incorrect results when reading parquet files with different schemas and statistics [arrow-datafusion]

2023-12-13 Thread via GitHub
alamb commented on issue #8532: URL: https://github.com/apache/arrow-datafusion/issues/8532#issuecomment-1854764511 I have a PR with a proposed fix ready for review: https://github.com/apache/arrow-datafusion/pull/8533 -- This is an automated message from the Apache Git Service. To respo

Re: [PR] Extract parquet statistics to its own module, add tests [arrow-datafusion]

2023-12-13 Thread via GitHub
alamb commented on PR #8294: URL: https://github.com/apache/arrow-datafusion/pull/8294#issuecomment-1854761469 This PR introduced a regression it turns out: https://github.com/apache/arrow-datafusion/pull/8533 -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] GH-39214: [Java] Support reproducible build [arrow]

2023-12-13 Thread via GitHub
davisusanibar commented on code in PR #39215: URL: https://github.com/apache/arrow/pull/39215#discussion_r1425925099 ## java/format/pom.xml: ## @@ -23,6 +23,10 @@ Arrow Format Generated Java files from the IPC Flatbuffer definitions. + + 2023-12-13T00:00:00Z + Revi

Re: [PR] Fix regression with Incorrect results when reading parquet files with different schemas and statistics [arrow-datafusion]

2023-12-13 Thread via GitHub
alamb commented on code in PR #8533: URL: https://github.com/apache/arrow-datafusion/pull/8533#discussion_r1425915211 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -468,8 +468,10 @@ impl FileOpener for ParquetOpener { ParquetRecordBatchSt

Re: [PR] GH-39214: [Java] Support reproducible build [arrow]

2023-12-13 Thread via GitHub
davisusanibar commented on code in PR #39215: URL: https://github.com/apache/arrow/pull/39215#discussion_r1425923128 ## java/pom.xml: ## @@ -28,6 +28,7 @@ https://arrow.apache.org/ + 2023-12-13T00:00:00Z Review Comment: The parameter 'project.build.outputTimesta

Re: [I] [Python][Parquet] Parquet deserialization speeds slower on Linux [arrow]

2023-12-13 Thread via GitHub
pitrou commented on issue #38389: URL: https://github.com/apache/arrow/issues/38389#issuecomment-1854756092 I have also tried to regenerate the given file using different compressions and then compared reading performance: ```python >>> !ls -la lineitem-* -rw-rw-r-- 1 antoine antoin

Re: [PR] GH-37848: [C++][Gandiva] Migrate LLVM JIT engine from MCJIT to ORC v2/LLJIT [arrow]

2023-12-13 Thread via GitHub
kou commented on PR #39098: URL: https://github.com/apache/arrow/pull/39098#issuecomment-1854753925 The remaining ASAN error is https://github.com/apache/arrow/pull/39098#issuecomment-1852177839 , right? I'll try it too. BTW, how did you generate the benchmark result graph!? It's ve

Re: [PR] GH-39119: [C++] Refactor the Azure FS tests [arrow]

2023-12-13 Thread via GitHub
felipecrv commented on code in PR #39207: URL: https://github.com/apache/arrow/pull/39207#discussion_r1425918013 ## cpp/src/arrow/filesystem/azurefs_test.cc: ## @@ -71,56 +72,113 @@ using ::testing::Not; using ::testing::NotNull; namespace Blobs = Azure::Storage::Blobs; -nam

Re: [PR] GH-39119: [C++] Refactor the Azure FS tests [arrow]

2023-12-13 Thread via GitHub
felipecrv commented on code in PR #39207: URL: https://github.com/apache/arrow/pull/39207#discussion_r1425917099 ## cpp/src/arrow/filesystem/azurefs.cc: ## @@ -33,42 +33,85 @@ #include "arrow/util/logging.h" #include "arrow/util/string.h" -namespace arrow { -namespace fs { +

Re: [PR] GH-39087 [Go]: Update unsafe slice idioms to avoid header struct puns [arrow]

2023-12-13 Thread via GitHub
dr2chase commented on PR #39187: URL: https://github.com/apache/arrow/pull/39187#issuecomment-1854747995 For purposes of not screwing up process, do I revise my commit and force-push it back to the branch on my fork, or is there some other better way? -- This is an automated message f

Re: [PR] WIP: [Release] Verify release-14.0.2-rc3 [arrow]

2023-12-13 Thread via GitHub
kou commented on PR #39193: URL: https://github.com/apache/arrow/pull/39193#issuecomment-1854746669 +1 > verify-rc-binaries-apt-linux-amd64 --> I think we have not uploaded ubuntu lunar which is indeed an issue This was my fault. I've uploaded it. Sorry. -- This is an automa

Re: [PR] GH-39212: [Release] Retry download binary subprocess in case of OpenSSL error [arrow]

2023-12-13 Thread via GitHub
raulcd commented on code in PR #39213: URL: https://github.com/apache/arrow/pull/39213#discussion_r1425915565 ## dev/release/download_rc_binaries.py: ## @@ -121,17 +121,28 @@ def _download_url(self, url, dest_path, *, extra_args=None): dest_path, url,

Re: [PR] GH-39119: [C++] Refactor the Azure FS tests [arrow]

2023-12-13 Thread via GitHub
felipecrv commented on code in PR #39207: URL: https://github.com/apache/arrow/pull/39207#discussion_r1425915169 ## cpp/src/arrow/filesystem/azurefs.cc: ## @@ -33,42 +33,85 @@ #include "arrow/util/logging.h" #include "arrow/util/string.h" -namespace arrow { -namespace fs { +

Re: [PR] GH-39119: [C++] Refactor the Azure FS tests [arrow]

2023-12-13 Thread via GitHub
felipecrv commented on code in PR #39207: URL: https://github.com/apache/arrow/pull/39207#discussion_r1425914623 ## cpp/src/arrow/filesystem/azurefs_test.cc: ## @@ -377,199 +487,337 @@ class AzureFileSystemTest : public ::testing::Test { strlen(kSubData));

[PR] Fix regression with Incorrect results when reading parquet files with different schemas and statistics [arrow-datafusion]

2023-12-13 Thread via GitHub
alamb opened a new pull request, #8533: URL: https://github.com/apache/arrow-datafusion/pull/8533 ## Which issue does this PR close? Fixes https://github.com/apache/arrow-datafusion/issues/8532 ## Rationale for this change https://github.com/apache/arrow-datafusion/pull/8
