[GitHub] [arrow-rs] Samrose-Ahmed opened a new pull request, #4878: parquet: Read field IDs from Parquet Schema

2023-09-28 Thread via GitHub
Samrose-Ahmed opened a new pull request, #4878: URL: https://github.com/apache/arrow-rs/pull/4878 Currently, field ids are only read from the serialized arrow schema and not the actual parquet file. This PR adds reading the field ids from a Parquet file that doesn't contain the serialized ar

[GitHub] [arrow-rs] Samrose-Ahmed opened a new issue, #4877: parquet: Field Ids are not read from a Parquet file without serialized arrow schema

2023-09-28 Thread via GitHub
Samrose-Ahmed opened a new issue, #4877: URL: https://github.com/apache/arrow-rs/issues/4877 **Describe the bug** If a parquet file that does not have the serialized arrow schema in the metadata (e.g. written by parquet-mr) is read by the parquet crate and written back to parquet, th

[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #37935: GH-37934: [Doc][Integration] Document C Data Interface testing

2023-09-28 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37935: URL: https://github.com/apache/arrow/pull/37935#issuecomment-1740283062 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit e9730f5971480b942c7394846162c4dfa9145aa9. There were no

[GitHub] [arrow] github-actions[bot] commented on pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
github-actions[bot] commented on PR #37684: URL: https://github.com/apache/arrow/pull/37684#issuecomment-1740265055 Revision: 4c19ac57a04e5733de28819aec74e9558778a10d Submitted crossbow builds: [ursacomputing/crossbow @ actions-aad3415ee8](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] assignUser opened a new pull request, #37946: MINOR: [CI] Fix crossbow badge url

2023-09-28 Thread via GitHub
assignUser opened a new pull request, #37946: URL: https://github.com/apache/arrow/pull/37946 GitHub has changed the badge URL, so all crossbow badges were showing up as grey. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

[GitHub] [arrow-datafusion] Weijun-H commented on issue #7689: Add Decimal128 support to Ceil and Floor

2023-09-28 Thread via GitHub
Weijun-H commented on issue #7689: URL: https://github.com/apache/arrow-datafusion/issues/7689#issuecomment-1740248126 I think @viirya is correct. ```bash ❯ select ceil(arrow_cast(10, 'Decimal128(38,1)')); +-+ | ceil(Int64(10)) | +--

[GitHub] [arrow-datafusion] maruschin opened a new pull request, #7695: [WIP] Add Union optimization

2023-09-28 Thread via GitHub
maruschin opened a new pull request, #7695: URL: https://github.com/apache/arrow-datafusion/pull/7695 ## Which issue does this PR close? Closes (#7481) Closes #. ## Rationale for this change ## What changes are included in this PR? ##

[GitHub] [arrow] github-actions[bot] commented on pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
github-actions[bot] commented on PR #37684: URL: https://github.com/apache/arrow/pull/37684#issuecomment-1740198663 Revision: 7d696731df3b6b2fb7cbf68210199ded25b7ec24 Submitted crossbow builds: [ursacomputing/crossbow @ actions-8fc5cdd86d](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow-datafusion] jayzhan211 commented on a diff in pull request #6815: Fix timestamp_add_interval_months to pass any date

2023-09-28 Thread via GitHub
jayzhan211 commented on code in PR #6815: URL: https://github.com/apache/arrow-datafusion/pull/6815#discussion_r1340799239 ## datafusion/core/tests/sql/timestamp.rs: ## @@ -579,8 +579,8 @@ async fn timestamp_add_interval_months() -> Result<()> { let t1_naive = chrono::Naive

[GitHub] [arrow] assignUser commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340789459 ## r/tools/nixlibs.R: ## @@ -231,11 +259,15 @@ determine_binary_from_stderr <- function(errs) { return(NULL) # Else, determine which other binary will work

[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36574: GH-34950: [C++][Parquet] Support encryption for page index

2023-09-28 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #36574: URL: https://github.com/apache/arrow/pull/36574#issuecomment-1740188599 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 853d8491addff3a10fc40950823a2942bb9fbf98. There were no

[GitHub] [arrow] assignUser commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340788200 ## r/tools/nixlibs.R: ## @@ -168,18 +176,21 @@ select_binary <- function(os = tolower(Sys.info()[["sysname"]]), } else { # No binary available for arch

[GitHub] [arrow] assignUser commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340783765 ## r/tools/nixlibs.R: ## @@ -168,18 +176,21 @@ select_binary <- function(os = tolower(Sys.info()[["sysname"]]), } else { # No binary available for arch

[GitHub] [arrow] assignUser commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340782873 ## r/tools/nixlibs-allowlist.txt: ## @@ -2,3 +2,4 @@ ubuntu centos redhat rhel +darwin Review Comment: (I will leave darwin in the allowlist for now, we can a

[GitHub] [arrow] assignUser commented on issue #37685: [R] R package arm64 build fails due to missing brotlienc-static

2023-09-28 Thread via GitHub
assignUser commented on issue #37685: URL: https://github.com/apache/arrow/issues/37685#issuecomment-1740177930 #37923 does not have this problem so I am closing this as not planned.

[GitHub] [arrow] assignUser commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340781606 ## r/configure: ## @@ -65,8 +65,6 @@ PKG_TEST_HEADER="" # Some env vars that control the build (all logical, case insensitive) Review Comment: I opened #37945 f

[GitHub] [arrow-datafusion] matthewgapp commented on issue #7636: CREATE TABLE DDL does not save correct schema, resulting in mismatched plan vs execution (record batch) schema

2023-09-28 Thread via GitHub
matthewgapp commented on issue #7636: URL: https://github.com/apache/arrow-datafusion/issues/7636#issuecomment-1740171545 @alamb, more of a meta thought, but with #4815, I'm concerned that all of these "bugs" will go unnoticed (unless they're caught in the DF application logic like

[GitHub] [arrow] felipecrv commented on a diff in pull request #35345: GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats

2023-09-28 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1337605861 ## cpp/src/arrow/integration/json_internal.cc: ## @@ -1492,6 +1506,14 @@ class ArrayReader { return CreateList(type_); } + Status Visit(const ListViewType&

[GitHub] [arrow] bkietz commented on a diff in pull request #37877: GH-37876: [Format] Add list-view specification to arrow format

2023-09-28 Thread via GitHub
bkietz commented on code in PR #37877: URL: https://github.com/apache/arrow/pull/37877#discussion_r1340758467 ## docs/source/format/Columnar.rst: ## @@ -401,11 +406,17 @@ This layout is adapted from TU Munich's `UmbraDB`_. .. _variable-size-list-layout: -Variable-size List

[GitHub] [arrow-datafusion] matthewgapp opened a new pull request, #7694: correctly build nullability information in values exec

2023-09-28 Thread via GitHub
matthewgapp opened a new pull request, #7694: URL: https://github.com/apache/arrow-datafusion/pull/7694 ## Which issue does this PR close? Closes #7693. ## Rationale for this change ## What changes are included in this PR? ## Are these chang

[GitHub] [arrow-datafusion] matthewgapp opened a new issue, #7693: Insert DML results in incorrect record batch schema (doesn't correctly identify values as non-nullable), leading to errors in future

2023-09-28 Thread via GitHub
matthewgapp opened a new issue, #7693: URL: https://github.com/apache/arrow-datafusion/issues/7693 ### Describe the bug When inserting values into a table (mem table) that has a schema where each field is non-nullable and then performing a window function with a partition by clau

[GitHub] [arrow-datafusion] devinjdangelo opened a new pull request, #7692: Update Default Parquet Write Compression

2023-09-28 Thread via GitHub
devinjdangelo opened a new pull request, #7692: URL: https://github.com/apache/arrow-datafusion/pull/7692 ## Which issue does this PR close? Closes #7691 ## Rationale for this change See issue for discussion ## What changes are included in this PR? Set defa

[GitHub] [arrow-datafusion] devinjdangelo opened a new issue, #7691: Update Default Parquet Write Compression to Sensible Default

2023-09-28 Thread via GitHub
devinjdangelo opened a new issue, #7691: URL: https://github.com/apache/arrow-datafusion/issues/7691 ### Is your feature request related to a problem or challenge? While working on benchmarking for https://github.com/apache/arrow-datafusion/pull/7655 and starting to compare to serial

[GitHub] [arrow-rs] devinjdangelo commented on a diff in pull request #4871: Support Encoding Parquet Columns in Parallel

2023-09-28 Thread via GitHub
devinjdangelo commented on code in PR #4871: URL: https://github.com/apache/arrow-rs/pull/4871#discussion_r1340734930 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -347,92 +349,213 @@ impl PageWriter for ArrowPageWriter { } } -/// Encodes a leaf column to [`ArrowPageWr

[GitHub] [arrow-datafusion] Samrose-Ahmed opened a new issue, #7690: avro_to_arrow: Support in memory apache_avro Value's

2023-09-28 Thread via GitHub
Samrose-Ahmed opened a new issue, #7690: URL: https://github.com/apache/arrow-datafusion/issues/7690 ### Is your feature request related to a problem or challenge? I have Avro values in memory parsed using the apache_avro crate. I would like to convert these and get a RecordBatch.

[GitHub] [arrow-datafusion] Samrose-Ahmed commented on pull request #7663: fix: avro_to_arrow: Handle avro nested nullable struct (union)

2023-09-28 Thread via GitHub
Samrose-Ahmed commented on PR #7663: URL: https://github.com/apache/arrow-datafusion/pull/7663#issuecomment-1740100214 I have added a test to verify this behavior.

[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #37894: GH-37893: [Java] Move Types.proto in a subfolder

2023-09-28 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37894: URL: https://github.com/apache/arrow/pull/37894#issuecomment-1740089366 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 019d06df56ba3215148465554948a4d93fd9c707. There were no

[GitHub] [arrow] cwegener commented on pull request #37819: GH-37803: [CI][Dev][Python] Release and merge script errors

2023-09-28 Thread via GitHub
cwegener commented on PR #37819: URL: https://github.com/apache/arrow/pull/37819#issuecomment-1740064917 @jorisvandenbossche Thanks so much for the detailed explanation! I was in a rush to fix the breakage and the huge amount of code in setup.py, plus the distractions of there

[GitHub] [arrow] thisisnic commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
thisisnic commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340707525 ## r/tools/nixlibs-allowlist.txt: ## @@ -2,3 +2,4 @@ ubuntu centos redhat rhel +darwin Review Comment: > I guess this is up to @thisisnic as the maintainer, bu

[GitHub] [arrow-adbc] joellubi commented on issue #1107: go/adbc/driver/flightsql: support generic ingest

2023-09-28 Thread via GitHub
joellubi commented on issue #1107: URL: https://github.com/apache/arrow-adbc/issues/1107#issuecomment-1740061763 > just a bytes field, or possibly a bytes field + `map` @lidavidm Could you please elaborate on what you mean by the latter option? Do you mean a bytes field encoding a `ma

[GitHub] [arrow-datafusion] kazuyukitanimura opened a new issue, #7689: Add Decimal128 support Ceil and Floor

2023-09-28 Thread via GitHub
kazuyukitanimura opened a new issue, #7689: URL: https://github.com/apache/arrow-datafusion/issues/7689 ### Is your feature request related to a problem or challenge? `Ceil` and `Floor` in `math_expressions` support only `Float32` and `Float64` currently ### Describe the soluti
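
For context on the request above, here is a minimal sketch of how ceil and floor can be computed on a decimal value without going through floating point. It relies on the fact that an Arrow Decimal128 value is an unscaled integer paired with a scale; the function names `decimal_floor` and `decimal_ceil` are hypothetical, and this is not the DataFusion implementation.

```python
# Sketch only: a decimal value is modelled as (unscaled integer, scale),
# so the logical value is unscaled / 10**scale. Names are illustrative.

def decimal_floor(unscaled: int, scale: int) -> int:
    """Return the unscaled form of floor(value), keeping the same scale."""
    factor = 10 ** scale
    # Python's // rounds toward negative infinity, which is floor.
    return (unscaled // factor) * factor


def decimal_ceil(unscaled: int, scale: int) -> int:
    """Return the unscaled form of ceil(value), keeping the same scale."""
    factor = 10 ** scale
    # ceil(x) == -floor(-x), again using integer arithmetic only.
    return -((-unscaled) // factor) * factor


if __name__ == "__main__":
    # 10.5 at scale 1 is stored as 105.
    print(decimal_ceil(105, 1), decimal_floor(105, 1))
```

Staying in integer arithmetic avoids the precision loss that a round trip through Float64 would introduce for values with up to 38 digits.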

[GitHub] [arrow] domoritz merged pull request #37341: GH-21815: [JS] Add support for Duration type

2023-09-28 Thread via GitHub
domoritz merged PR #37341: URL: https://github.com/apache/arrow/pull/37341

[GitHub] [arrow-adbc] WillAyd opened a new pull request, #1130: feat(c/driver/postgresql): Integral COPY writers

2023-09-28 Thread via GitHub
WillAyd opened a new pull request, #1130: URL: https://github.com/apache/arrow-adbc/pull/1130 (no comment)

[GitHub] [arrow] kou commented on a diff in pull request #37872: GH-37851: [C++] IPC: ArrayLoader style enhancement

2023-09-28 Thread via GitHub
kou commented on code in PR #37872: URL: https://github.com/apache/arrow/pull/37872#discussion_r1340678049 ## cpp/src/arrow/ipc/reader.cc: ## @@ -243,6 +247,7 @@ class ArrayLoader { } Status GetFieldMetadata(int field_index, ArrayData* out) { +DCHECK_NE(nullptr, out_

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4871: Support Encoding Parquet Columns in Parallel

2023-09-28 Thread via GitHub
tustvold commented on code in PR #4871: URL: https://github.com/apache/arrow-rs/pull/4871#discussion_r1340672619 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -347,92 +349,213 @@ impl PageWriter for ArrowPageWriter { } } -/// Encodes a leaf column to [`ArrowPageWriter`

[GitHub] [arrow] kou merged pull request #37930: GH-37803: [Python][CI] Pin setuptools_scm to fix release verification scripts

2023-09-28 Thread via GitHub
kou merged PR #37930: URL: https://github.com/apache/arrow/pull/37930

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4871: Support Encoding Parquet Columns in Parallel

2023-09-28 Thread via GitHub
tustvold commented on code in PR #4871: URL: https://github.com/apache/arrow-rs/pull/4871#discussion_r1340670691 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -347,92 +349,213 @@ impl PageWriter for ArrowPageWriter { } } -/// Encodes a leaf column to [`ArrowPageWriter`

[GitHub] [arrow] thisisnic commented on a diff in pull request #37828: GH-37813: [R] add quoted_na argument to open_delim_dataset()

2023-09-28 Thread via GitHub
thisisnic commented on code in PR #37828: URL: https://github.com/apache/arrow/pull/37828#discussion_r1340668101 ## r/tests/testthat/test-dataset-csv.R: ## @@ -561,6 +561,18 @@ test_that("open_delim_dataset params passed through to open_dataset", { expect_named(ds, c("int"

[GitHub] [arrow] thisisnic commented on a diff in pull request #37828: GH-37813: [R] add quoted_na argument to open_delim_dataset()

2023-09-28 Thread via GitHub
thisisnic commented on code in PR #37828: URL: https://github.com/apache/arrow/pull/37828#discussion_r1340661250 ## r/tests/testthat/test-dataset-csv.R: ## @@ -253,7 +253,7 @@ test_that("readr parse options", { tsv_dir, partitioning = "part", format = "text"

[GitHub] [arrow] pitrou commented on pull request #37933: GH-37936: [CI] Fix integration testing in rc-verify nightly builds

2023-09-28 Thread via GitHub
pitrou commented on PR #37933: URL: https://github.com/apache/arrow/pull/37933#issuecomment-1739991801 > In general, these changes look good. But there are some failures. For example: > > verify-rc-source-integration-linux-almalinux-8-amd64 I noticed that one, but I have no ide

[GitHub] [arrow-datafusion] Dandandan opened a new issue, #7688: Eliminate filter when `pushdown_filters` is enabled

2023-09-28 Thread via GitHub
Dandandan opened a new issue, #7688: URL: https://github.com/apache/arrow-datafusion/issues/7688 ### Is your feature request related to a problem or challenge? When `pushdown_filters` is enabled, DF should be able to eliminate the subsequent filter. When enabling the option for tpc

[GitHub] [arrow-rs] alamb commented on pull request #4871: Support Encoding Parquet Columns in Parallel

2023-09-28 Thread via GitHub
alamb commented on PR #4871: URL: https://github.com/apache/arrow-rs/pull/4871#issuecomment-1739980167 Can we also implement `Debug` for `ArrowColumnWriter`? ``` 93 | #[derive(Debug, Clone)] | - in this derive macro expansion ... 98 | col_writers: Vec

[GitHub] [arrow] kou commented on pull request #37933: GH-37936: [CI] Fix integration testing in rc-verify nightly builds

2023-09-28 Thread via GitHub
kou commented on PR #37933: URL: https://github.com/apache/arrow/pull/37933#issuecomment-1739976908 In general, these changes look good. But there are some failures. For example: verify-rc-source-integration-linux-almalinux-8-amd64 https://github.com/ursacomputing/crossbow/act

[GitHub] [arrow] thisisnic merged pull request #37658: GH-34640: [R] Can't read in partitioning column in CSV datasets when both (non-hive) partition and schema supplied

2023-09-28 Thread via GitHub
thisisnic merged PR #37658: URL: https://github.com/apache/arrow/pull/37658

[GitHub] [arrow-rs] alamb commented on a diff in pull request #4871: Support Encoding Parquet Columns in Parallel

2023-09-28 Thread via GitHub
alamb commented on code in PR #4871: URL: https://github.com/apache/arrow-rs/pull/4871#discussion_r1340602639 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -347,92 +349,213 @@ impl PageWriter for ArrowPageWriter { } } -/// Encodes a leaf column to [`ArrowPageWriter`] -

[GitHub] [arrow] thisisnic merged pull request #37843: GH-37842: [R] Implement infer_schema.data.frame()

2023-09-28 Thread via GitHub
thisisnic merged PR #37843: URL: https://github.com/apache/arrow/pull/37843

[GitHub] [arrow] jduo commented on a diff in pull request #37942: GH-37702: [Java] Add vector validation consistent with C++

2023-09-28 Thread via GitHub
jduo commented on code in PR #37942: URL: https://github.com/apache/arrow/pull/37942#discussion_r1340643710 ## java/vector/src/main/java/org/apache/arrow/vector/util/DecimalUtility.java: ## @@ -89,13 +89,29 @@ public static byte[] getByteArrayFromArrowBuf(ArrowBuf bytebuf, int

[GitHub] [arrow-rs] carols10cents opened a new pull request, #4876: WIP: Adding AWS presigned URL support

2023-09-28 Thread via GitHub
carols10cents opened a new pull request, #4876: URL: https://github.com/apache/arrow-rs/pull/4876 NOTE: This is currently a work in progress, and I'd love general feedback at this point to know that I'm going in the right direction! I think there's more refactoring I could do and tests I co

[GitHub] [arrow-datafusion] westonpace opened a new issue, #7687: Substrait: Support expression serialization

2023-09-28 Thread via GitHub
westonpace opened a new issue, #7687: URL: https://github.com/apache/arrow-datafusion/issues/7687 ### Is your feature request related to a problem or challenge? The goal is to allow expressions (not plans and, to start with, scalar expressions) to be passed between different libraries

[GitHub] [arrow] lidavidm commented on pull request #37915: GH-36994: [Java][CI] Enable support for JDK21

2023-09-28 Thread via GitHub
lidavidm commented on PR #37915: URL: https://github.com/apache/arrow/pull/37915#issuecomment-1739938247 I kicked the Java jobs

[GitHub] [arrow] davisusanibar commented on pull request #37915: GH-36994: [Java][CI] Enable support for JDK21

2023-09-28 Thread via GitHub
davisusanibar commented on PR #37915: URL: https://github.com/apache/arrow/pull/37915#issuecomment-1739933281 > It seems minus that flaky test, things work on Java 21? I tested it locally without problems and also on my local PR at https://github.com/davisusanibar/arrow/pull/8. The cu

[GitHub] [arrow-datafusion] theelderbeever opened a new issue, #7686: NDJsonExec doesn't properly apply predicates on partitioned tables.

2023-09-28 Thread via GitHub
theelderbeever opened a new issue, #7686: URL: https://github.com/apache/arrow-datafusion/issues/7686 ### Describe the bug Performing a SQL query against an NDJson with partition columns will fail when filtering on any of the partition columns with the following error. In this case my

[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #37913: GH-37864: [Java] Remove unnecessary throws from OrcReader

2023-09-28 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37913: URL: https://github.com/apache/arrow/pull/37913#issuecomment-1739911160 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 26667340f2e72c84107c9be28e68aa88dcb064ff. There were no

[GitHub] [arrow] emarx commented on issue #37729: Inner joins are incorrect

2023-09-28 Thread via GitHub
emarx commented on issue #37729: URL: https://github.com/apache/arrow/issues/37729#issuecomment-1739910151 Hi all! Following up here -- this is a very significant bug. We've stopped using inner joins (using left joins and dropping nulls instead) as inner joins return incorrect rows. Recommend a

[GitHub] [arrow-datafusion] Dandandan merged pull request #7670: Don't add filters to projection in TableScan

2023-09-28 Thread via GitHub
Dandandan merged PR #7670: URL: https://github.com/apache/arrow-datafusion/pull/7670

[GitHub] [arrow-datafusion] Dandandan closed issue #7683: Filters of TableScan are added to projection when not needed

2023-09-28 Thread via GitHub
Dandandan closed issue #7683: Filters of TableScan are added to projection when not needed URL: https://github.com/apache/arrow-datafusion/issues/7683

[GitHub] [arrow] etseidl commented on a diff in pull request #37940: GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED

2023-09-28 Thread via GitHub
etseidl commented on code in PR #37940: URL: https://github.com/apache/arrow/pull/37940#discussion_r1340536156 ## cpp/src/parquet/encoding.cc: ## @@ -2183,15 +2185,21 @@ class DeltaBitPackEncoder : public EncoderImpl, virtual public TypedEncoder(static_cast(static_cast(-1))); +

[GitHub] [arrow] danepitkin commented on issue #37914: [Java][CI]: Enable support for JDK21

2023-09-28 Thread via GitHub
danepitkin commented on issue #37914: URL: https://github.com/apache/arrow/issues/37914#issuecomment-1739908018 Thank you!

[GitHub] [arrow] lidavidm commented on pull request #37915: GH-36994: [Java][CI] Enable support for JDK21

2023-09-28 Thread via GitHub
lidavidm commented on PR #37915: URL: https://github.com/apache/arrow/pull/37915#issuecomment-1739904790 It seems minus that flaky test, things work on Java 21?

[GitHub] [arrow] davisusanibar commented on pull request #37915: GH-36994: [Java][CI] Enable support for JDK21

2023-09-28 Thread via GitHub
davisusanibar commented on PR #37915: URL: https://github.com/apache/arrow/pull/37915#issuecomment-1739903304 FYI Consider: https://adoptium.net/blog/2023/09/temurin21-delay/

[GitHub] [arrow] davisusanibar commented on issue #37914: [Java][CI]: Enable support for JDK21

2023-09-28 Thread via GitHub
davisusanibar commented on issue #37914: URL: https://github.com/apache/arrow/issues/37914#issuecomment-1739901510 Hi @danepitkin sure, let me update the title of the PR related.

[GitHub] [arrow] github-actions[bot] commented on pull request #37915: GH-36994: [Java][CI] Enable support for JDK21

2023-09-28 Thread via GitHub
github-actions[bot] commented on PR #37915: URL: https://github.com/apache/arrow/pull/37915#issuecomment-1739901344 :warning: GitHub issue #36994 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] lidavidm commented on a diff in pull request #37942: GH-37702: [Java] Add vector validation consistent with C++

2023-09-28 Thread via GitHub
lidavidm commented on code in PR #37942: URL: https://github.com/apache/arrow/pull/37942#discussion_r1340586618 ## java/vector/src/main/java/org/apache/arrow/vector/util/DecimalUtility.java: ## @@ -89,13 +89,29 @@ public static byte[] getByteArrayFromArrowBuf(ArrowBuf bytebuf,

[GitHub] [arrow-datafusion] matthewgapp commented on issue #7636: CREATE TABLE DDL does not save correct schema, resulting in mismatched plan vs execution (record batch) schema

2023-09-28 Thread via GitHub
matthewgapp commented on issue #7636: URL: https://github.com/apache/arrow-datafusion/issues/7636#issuecomment-1739864817 Thanks @xhwhis, this seems like a separate bug (one whose root cause is because the values exec sets the schema for [all of its columns as nullable here](https://github

[GitHub] [arrow] assignUser commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340560188 ## r/tools/nixlibs.R: ## @@ -168,18 +176,21 @@ select_binary <- function(os = tolower(Sys.info()[["sysname"]]), } else { # No binary available for arch

[GitHub] [arrow] nealrichardson commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
nealrichardson commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340549104 ## r/tools/test-nixlibs.R: ## @@ -21,8 +21,8 @@ # Flag so that we just load the functions and don't evaluate them like we do # when called from configure.R TE

[GitHub] [arrow] nealrichardson commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
nealrichardson commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340547561 ## r/tools/nixlibs.R: ## @@ -168,18 +176,21 @@ select_binary <- function(os = tolower(Sys.info()[["sysname"]]), } else { # No binary available for arch

[GitHub] [arrow] assignUser commented on issue #37941: [R][Release] Add checksum verification to pre-compiled binaries

2023-09-28 Thread via GitHub
assignUser commented on issue #37941: URL: https://github.com/apache/arrow/issues/37941#issuecomment-1739835682 I have marked this as a blocker as we want this in place for the CRAN release (the autobrew build also checks checksums so we want parity with that).

[GitHub] [arrow] assignUser commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340543340 ## r/tools/nixlibs-allowlist.txt: ## @@ -2,3 +2,4 @@ ubuntu centos redhat rhel +darwin Review Comment: > Correct; do you want to do that here or in a followup

[GitHub] [arrow] jduo commented on pull request #37942: GH-37702: [Java] Add vector validation consistent with C++

2023-09-28 Thread via GitHub
jduo commented on PR #37942: URL: https://github.com/apache/arrow/pull/37942#issuecomment-1739830330 The NullVector and FixedSizeBinaryVector checks may not really be valuable. It doesn't look like it's possible to get these vectors in a state where these checks can fail.

[GitHub] [arrow] jduo opened a new pull request, #37942: GH-37702: [Java] Add vector validation consistent with C++

2023-09-28 Thread via GitHub
jduo opened a new pull request, #37942: URL: https://github.com/apache/arrow/pull/37942 ### Rationale for this change Make vector validation code more consistent with C++. Add missing checks and have the entry point be the same so that the code is easier to read/write when working with

[GitHub] [arrow] nealrichardson commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
nealrichardson commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340536025 ## r/tools/nixlibs-allowlist.txt: ## @@ -2,3 +2,4 @@ ubuntu centos redhat rhel +darwin Review Comment: > I understand the concerns and would agree with @n

[GitHub] [arrow-rs] alamb commented on a diff in pull request #4859: Enable External ArrowColumnWriter Access

2023-09-28 Thread via GitHub
alamb commented on code in PR #4859: URL: https://github.com/apache/arrow-rs/pull/4859#discussion_r1340534428 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -347,13 +349,22 @@ impl PageWriter for ArrowPageWriter { } } -/// Encodes a leaf column to [`ArrowPageWriter`] -e

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #7670: Don't add filters to projection in TableScan

2023-09-28 Thread via GitHub
alamb commented on code in PR #7670: URL: https://github.com/apache/arrow-datafusion/pull/7670#discussion_r1340532103 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -2759,8 +2705,7 @@ Projection: a, b // For right anti, filter of the left side can be pushed down

[GitHub] [arrow] rok commented on a diff in pull request #37940: GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED

2023-09-28 Thread via GitHub
rok commented on code in PR #37940: URL: https://github.com/apache/arrow/pull/37940#discussion_r1340529289 ## cpp/src/parquet/encoding.cc: ## @@ -2183,15 +2185,21 @@ class DeltaBitPackEncoder : public EncoderImpl, virtual public TypedEncoder(static_cast(static_cast(-1))); +

[GitHub] [arrow-datafusion] alamb opened a new pull request, #7685: Minor: Improve `TableProviderFilterPushDown` docs

2023-09-28 Thread via GitHub
alamb opened a new pull request, #7685: URL: https://github.com/apache/arrow-datafusion/pull/7685 ## Which issue does this PR close? Related to https://github.com/apache/arrow-datafusion/pull/7680 ## Rationale for this change While reviewing https://github.com/apache/arrow-d

[GitHub] [arrow] tolleybot commented on a diff in pull request #34616: GH-29238 [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API

2023-09-28 Thread via GitHub
tolleybot commented on code in PR #34616: URL: https://github.com/apache/arrow/pull/34616#discussion_r1340529141 ## python/pyarrow/_dataset_parquet.pyx: ## @@ -78,7 +239,8 @@ cdef class ParquetFileFormat(FileFormat): CParquetFileFormat* parquet_format def __init_

[GitHub] [arrow] tolleybot commented on a diff in pull request #34616: GH-29238 [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API

2023-09-28 Thread via GitHub
tolleybot commented on code in PR #34616: URL: https://github.com/apache/arrow/pull/34616#discussion_r1340525553 ## python/pyarrow/_dataset_parquet.pyx: ## @@ -711,6 +889,20 @@ cdef class ParquetFragmentScanOptions(FragmentScanOptions): cdef ArrowReaderProperties* arrow_rea

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #7680: docs: add section on supports_filters_pushdown

2023-09-28 Thread via GitHub
alamb commented on code in PR #7680: URL: https://github.com/apache/arrow-datafusion/pull/7680#discussion_r1340522395 ## docs/source/library-user-guide/custom-table-providers.md: ## @@ -121,6 +121,22 @@ impl TableProvider for CustomDataSource { With this, and the implementati

[GitHub] [arrow] tolleybot commented on a diff in pull request #34616: GH-29238 [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API

2023-09-28 Thread via GitHub
tolleybot commented on code in PR #34616: URL: https://github.com/apache/arrow/pull/34616#discussion_r1340518972 ## python/pyarrow/_dataset_parquet.pyx: ## @@ -56,9 +63,163 @@ from pyarrow._parquet cimport ( cdef Expression _true = Expression._scalar(True) - ctypedef CParq

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #7678: Improve cache usage in CI

2023-09-28 Thread via GitHub
alamb commented on code in PR #7678: URL: https://github.com/apache/arrow-datafusion/pull/7678#discussion_r1340516995 ## .github/workflows/rust.yml: ## @@ -47,18 +47,24 @@ jobs: image: amd64/rust steps: - uses: actions/checkout@v4 - - name: Cache Cargo -

[GitHub] [arrow] assignUser commented on pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on PR #37684: URL: https://github.com/apache/arrow/pull/37684#issuecomment-1739798561 > add a checksum check Agreed: https://github.com/apache/arrow/issues/37941

[GitHub] [arrow] github-actions[bot] commented on pull request #37940: GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED

2023-09-28 Thread via GitHub
github-actions[bot] commented on PR #37940: URL: https://github.com/apache/arrow/pull/37940#issuecomment-1739796467 :warning: GitHub issue #37939 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] etseidl opened a new pull request, #37940: GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED

2023-09-28 Thread via GitHub
etseidl opened a new pull request, #37940: URL: https://github.com/apache/arrow/pull/37940 Closes #37939. ### What changes are included in this PR? This PR changes values used in the `DELTA_BINARY_PACKED` encoder to signed types. To gracefully handle overflow, arithmetic is pe
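
The snippet above is cut off, so the following is only a general illustration of the idea it names, not the code from the PR. It sketches how one block of `DELTA_BINARY_PACKED` INT64 values can be reduced to a signed minimum delta (the frame of reference) plus non-negative adjusted deltas, with overflow handled by 64-bit two's-complement wraparound. The helper names `wrap_i64`, `encode_block`, and `decode_block` are hypothetical, and bit-packing and miniblock layout are left out.

```python
# Illustrative sketch only; names and structure are assumptions, not Arrow's code.
MASK64 = (1 << 64) - 1


def wrap_i64(x):
    """Reduce an integer to the signed 64-bit range with two's-complement wraparound."""
    x &= MASK64
    return x - (1 << 64) if x >= (1 << 63) else x


def encode_block(values):
    """Return (first_value, min_delta, adjusted_deltas) for a non-empty list of INT64 values."""
    # Deltas are computed with wraparound, so min/max extremes do not raise or saturate.
    deltas = [wrap_i64(b - a) for a, b in zip(values, values[1:])]
    # The frame of reference is chosen with a signed comparison.
    min_delta = min(deltas) if deltas else 0
    # Subtracting the minimum leaves non-negative values that could then be bit-packed.
    adjusted = [(d - min_delta) & MASK64 for d in deltas]
    return values[0], min_delta, adjusted


def decode_block(first, min_delta, adjusted):
    """Invert encode_block, again wrapping every sum to signed 64 bits."""
    out = [first]
    for a in adjusted:
        out.append(wrap_i64(out[-1] + min_delta + a))
    return out


first, min_delta, adjusted = encode_block([7, 5, 3, 1, 2, 3, 4, 5])
roundtrip = decode_block(first, min_delta, adjusted)
```

With the input above the deltas are -2, -2, -2, 1, 1, 1, 1, so the minimum delta is -2 and the adjusted deltas are 0, 0, 0, 3, 3, 3, 3. An unsigned comparison would instead treat -2 as a very large number and pick 1 as the minimum, which is the kind of problem a signed frame of reference avoids.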

[GitHub] [arrow] assignUser commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340511985 ## r/tools/test-nixlibs.R: ## @@ -21,8 +21,8 @@ # Flag so that we just load the functions and don't evaluate them like we do # when called from configure.R TESTIN

[GitHub] [arrow-datafusion] alamb commented on pull request #7682: minor: revert parsing precedence between Aggr and UDAF

2023-09-28 Thread via GitHub
alamb commented on PR #7682: URL: https://github.com/apache/arrow-datafusion/pull/7682#issuecomment-1739787702 I agree it makes sense to allow UDAF to shadow built-in aggregate functions (so that users can redefine the meaning of aggregates if they wanted). Is there any way we can add
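
The shadowing behavior described in this comment can be pictured with a small name-resolution sketch. It is a generic illustration under stated assumptions, not DataFusion's parser or registry: `FunctionRegistry`, `register_udaf`, `resolve_aggregate`, and the `BUILTIN_AGGREGATES` table are all hypothetical names. The only point shown is lookup order: user-defined aggregates are consulted first, and built-ins are the fallback.

```python
# Hypothetical registry; this is not DataFusion's API.
BUILTIN_AGGREGATES = {
    "sum": sum,
    "max": max,
}


class FunctionRegistry:
    """Resolves aggregate names, letting user-defined functions shadow built-ins."""

    def __init__(self):
        self._udafs = {}

    def register_udaf(self, name, func):
        # Names are normalized to lower case so lookups are case-insensitive.
        self._udafs[name.lower()] = func

    def resolve_aggregate(self, name):
        key = name.lower()
        # User-defined aggregates take precedence over built-ins of the same name.
        if key in self._udafs:
            return self._udafs[key]
        if key in BUILTIN_AGGREGATES:
            return BUILTIN_AGGREGATES[key]
        raise KeyError(f"unknown aggregate function: {name}")


registry = FunctionRegistry()
builtin_result = registry.resolve_aggregate("SUM")([1, 2, 3])

# Redefine "sum" so that it ignores negative inputs.
registry.register_udaf("sum", lambda xs: sum(x for x in xs if x > 0))
shadowed_result = registry.resolve_aggregate("sum")([1, -2, 3])
```

Reversing the two `if` checks in `resolve_aggregate` would give built-ins priority, which is the other precedence the PR title refers to.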

[GitHub] [arrow] assignUser commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340506133 ## r/tools/nixlibs.R: ## @@ -191,33 +202,50 @@ test_for_curl_and_openssl <- " #if OPENSSL_VERSION_NUMBER >= 0x3000L #error Using OpenSSL version 3 #endif -" +

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #7671: Enhance Enforce Dist capabilities to fix, sub optimal bad plans

2023-09-28 Thread via GitHub
alamb commented on code in PR #7671: URL: https://github.com/apache/arrow-datafusion/pull/7671#discussion_r1340487999 ## datafusion/core/src/physical_optimizer/global_requirements.rs: ## @@ -0,0 +1,268 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow-datafusion] alamb commented on issue #424: Design how to respect output stream ordering

2023-09-28 Thread via GitHub
alamb commented on issue #424: URL: https://github.com/apache/arrow-datafusion/issues/424#issuecomment-1739783931 Related PR: https://github.com/apache/arrow-datafusion/pull/7671#pullrequestreview-1649409525

[GitHub] [arrow] assignUser commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340505189 ## r/tools/nixlibs.R: ## @@ -191,33 +202,50 @@ test_for_curl_and_openssl <- " #if OPENSSL_VERSION_NUMBER >= 0x3000L #error Using OpenSSL version 3 #endif -" +

[GitHub] [arrow] assignUser commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340503824 ## r/tools/nixlibs.R: ## @@ -168,18 +176,21 @@ select_binary <- function(os = tolower(Sys.info()[["sysname"]]), } else { # No binary available for arch

[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

2023-09-28 Thread via GitHub
zeroshade commented on code in PR #37785: URL: https://github.com/apache/arrow/pull/37785#discussion_r1340501129 ## go/parquet/internal/bmi/bmi_arm64.go: ## @@ -14,44 +14,51 @@ // See the License for the specific language governing permissions and // limitations under the Lice

[GitHub] [arrow] assignUser commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340502532 ## r/tools/nixlibs-allowlist.txt: ## @@ -2,3 +2,4 @@ ubuntu centos redhat rhel +darwin Review Comment: I understand the concerns and would agree with @nealric

[GitHub] [arrow-cookbook] pronzato commented on pull request #316: [Java] Document how to convert JDBC Adapter result into a Parquet file

2023-09-28 Thread via GitHub
pronzato commented on PR #316: URL: https://github.com/apache/arrow-cookbook/pull/316#issuecomment-1739766832 Hi David, When I try to run JDBCReader I get URI has empty scheme java.lang.RuntimeException: URI has empty scheme: '/tmp at org.apache.arrow.d

[GitHub] [arrow] danepitkin commented on issue #37914: [Java][CI]: Enable support for JDK21

2023-09-28 Thread via GitHub
danepitkin commented on issue #37914: URL: https://github.com/apache/arrow/issues/37914#issuecomment-1739753192 Hey @davisusanibar, is it okay if we close this issue in favor of https://github.com/apache/arrow/issues/36994?

[GitHub] [arrow] jonkeane commented on a diff in pull request #37684: GH-37923: [R] Move macOS build system to nixlibs.R

2023-09-28 Thread via GitHub
jonkeane commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1340476934 ## r/tools/nixlibs-allowlist.txt: ## @@ -2,3 +2,4 @@ ubuntu centos redhat rhel +darwin Review Comment: FWIW, I agree with Neal's lean towards (2) for similar re

[GitHub] [arrow] amoeba commented on issue #37495: [R] Passing a large dataset to duckdb and back results in memory being used and not freed

2023-09-28 Thread via GitHub
amoeba commented on issue #37495: URL: https://github.com/apache/arrow/issues/37495#issuecomment-1739743649 I know we've seen memory issues with PyArrow code like this, and I suspect the R package uses a similar code path: ```python table_ds = ds.dataset([path_to_parquet_file], filesyste
