[PR] fix(arrow-json)!: include null fields in schema inference with a type of Null [arrow-rs]

2023-10-04 Thread via GitHub
kskalski opened a new pull request, #4894: URL: https://github.com/apache/arrow-rs/pull/4894 # Which issue does this PR close? Closes #4814. # Rationale for this change Preserves fields with unknown type in inferred schema (see more discussion in the issue) # Wha

Re: [I] Arrow Flight SQL / ADBC datasource [arrow-datafusion]

2023-10-04 Thread via GitHub
backkem commented on issue #7731: URL: https://github.com/apache/arrow-datafusion/issues/7731#issuecomment-1748156188 Yes, I don't think it needs to be enabled by default. I was looking into the adbc direction but the rust port seems less fleshed out than the C and Go ones so far. I

Re: [PR] GH-37199: [C++] Expose a span converter for primitive concrete arrays [arrow]

2023-10-04 Thread via GitHub
github-actions[bot] commented on PR #38027: URL: https://github.com/apache/arrow/pull/38027#issuecomment-1748147297 :warning: GitHub issue #37199 **has been automatically assigned in GitHub** to PR creator. -- This is an automated message from the Apache Git Service. To respond to the mes

[PR] GH-37199: [C++] Expose a span converter for primitive concrete arrays [arrow]

2023-10-04 Thread via GitHub
jsjtxietian opened a new pull request, #38027: URL: https://github.com/apache/arrow/pull/38027 ### Rationale for this change Just like we have PrimitiveArray::raw_values, we could also have a span converter for easier consumption. ### What changes are included in this PR?

Re: [PR] MINOR: [C++][Parquet] Fix segfault getting compression level for a Parquet column [arrow]

2023-10-04 Thread via GitHub
mapleFU commented on PR #38025: URL: https://github.com/apache/arrow/pull/38025#issuecomment-1748080758 Would you mind create an issue for this? I guess this is not a "minor" fix for arrow github manangement

Re: [PR] [GH-37751] [C++][Gandiva] Avoid registering exported functions multiple times in gandiva [arrow]

2023-10-04 Thread via GitHub
niyue commented on code in PR #37752: URL: https://github.com/apache/arrow/pull/37752#discussion_r1346798834 ## cpp/src/gandiva/exported_funcs_registry_test.cc: ## @@ -0,0 +1,28 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
github-actions[bot] commented on PR #37684: URL: https://github.com/apache/arrow/pull/37684#issuecomment-1748045976 Revision: eca2134c73063c6ae88d6c7f7a534b6ef14d6295 Submitted crossbow builds: [ursacomputing/crossbow @ actions-09ced6cc0b](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
kou commented on PR #37684: URL: https://github.com/apache/arrow/pull/37684#issuecomment-1748044202 @github-actions crossbow submit r-binary-packages

Re: [PR] GH-37834: [Gandiva] Migrate to new LLVM PassManager API [arrow]

2023-10-04 Thread via GitHub
niyue commented on PR #37867: URL: https://github.com/apache/arrow/pull/37867#issuecomment-1748040639 @kou thanks so much for fixing the remaining issues!

Re: [PR] feat(glib): add Vala VAPI for GADBC [arrow-adbc]

2023-10-04 Thread via GitHub
kou commented on code in PR #1152: URL: https://github.com/apache/arrow-adbc/pull/1152#discussion_r1346641512 ## glib/example/README.md: ## @@ -0,0 +1,48 @@ + + +# Arrow GLib example + +There are example codes in this directory. + +C example codes exist in this directory. Langua

Re: [PR] Add operator section to user guide [arrow-datafusion]

2023-10-04 Thread via GitHub
ongchi commented on code in PR #7732: URL: https://github.com/apache/arrow-datafusion/pull/7732#discussion_r1346747930 ## docs/source/user-guide/expressions.md: ## @@ -22,60 +22,89 @@ DataFrame methods such as `select` and `filter` accept one or more logical expressions and th

Re: [PR] Add operator section to user guide [arrow-datafusion]

2023-10-04 Thread via GitHub
ongchi commented on code in PR #7732: URL: https://github.com/apache/arrow-datafusion/pull/7732#discussion_r1346742806 ## docs/source/user-guide/expressions.md: ## @@ -22,60 +22,89 @@ DataFrame methods such as `select` and `filter` accept one or more logical expressions and th

Re: [PR] Add operator section to user guide [arrow-datafusion]

2023-10-04 Thread via GitHub
ongchi commented on code in PR #7732: URL: https://github.com/apache/arrow-datafusion/pull/7732#discussion_r1346734730 ## docs/source/user-guide/expressions.md: ## @@ -213,14 +233,14 @@ Unlike to some databases the math functions in Datafusion works the same way as ## Regula

Re: [PR] Add operator section to user guide [arrow-datafusion]

2023-10-04 Thread via GitHub
ongchi commented on code in PR #7732: URL: https://github.com/apache/arrow-datafusion/pull/7732#discussion_r1346731984 ## docs/source/conf.py: ## @@ -118,4 +118,4 @@ myst_heading_anchors = 3 # enable nice rendering of checkboxes for the task lists -myst_enable_extensions = [

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
kou commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1346680441 ## dev/tasks/r/github.packages.yml: ## @@ -56,6 +56,56 @@ jobs: name: r-pkg__src__contrib path: arrow/r/arrow_*.tar.gz + macos-cpp: +name: C++

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
assignUser commented on PR #37684: URL: https://github.com/apache/arrow/pull/37684#issuecomment-1747971250 As all checks pass now it seems that it was a flakey test

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1346725275 ## cpp/cmake_modules/snappy.diff: ## @@ -0,0 +1,30 @@ +diff --git a/CMakeLists.txt b/CMakeLists.txt +index c3062e2..d946037 100644 +--- a/CMakeLists.txt b/CMake

Re: [PR] Add operator section to user guide [arrow-datafusion]

2023-10-04 Thread via GitHub
ongchi commented on code in PR #7732: URL: https://github.com/apache/arrow-datafusion/pull/7732#discussion_r1346722075 ## datafusion/expr/src/expr_fn.rs: ## @@ -103,6 +105,50 @@ pub fn or(left: Expr, right: Expr) -> Expr { )) } +// There is a solid implementation elsewhe

Re: [I] [EPIC] Streaming partitioned writes [arrow-datafusion]

2023-10-04 Thread via GitHub
devinjdangelo commented on issue #6569: URL: https://github.com/apache/arrow-datafusion/issues/6569#issuecomment-1747927404 @alamb I made some progress on inserts to sorted tables https://github.com/apache/arrow-datafusion/issues/7354 This also got me thinking about inserts to partit

[I] Allow Inserts to Partitioned Listing Table [arrow-datafusion]

2023-10-04 Thread via GitHub
devinjdangelo opened a new issue, #7744: URL: https://github.com/apache/arrow-datafusion/issues/7744 ### Is your feature request related to a problem or challenge? It is currently unsupported to run an insert into query for a listing table which is partitioned by a column. ###

[PR] fix(r/adbcdrivermanager): Use ADBC_VERSION_1_1_0 to initialize drivers internally [arrow-adbc]

2023-10-04 Thread via GitHub
paleolimbot opened a new pull request, #1163: URL: https://github.com/apache/arrow-adbc/pull/1163 The existing code should have worked (and probably still did), but I'd like to eliminate any complexity arising from managing multiple driver versions for the purposes of eliminating potential

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
kou commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1346676810 ## cpp/cmake_modules/snappy.diff: ## @@ -0,0 +1,30 @@ +diff --git a/CMakeLists.txt b/CMakeLists.txt +index c3062e2..d946037 100644 +--- a/CMakeLists.txt b/CMakeLists.t

[PR] Support InsertInto Sorted ListingTable [arrow-datafusion]

2023-10-04 Thread via GitHub
devinjdangelo opened a new pull request, #7743: URL: https://github.com/apache/arrow-datafusion/pull/7743 ## Which issue does this PR close? Closes #7354 ## Rationale for this change See issue ## What changes are included in this PR? Allows specifying a `re

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
paleolimbot commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1346662583 ## r/configure: ## @@ -175,12 +165,6 @@ find_arrow () { # 2. Use pkg-config to find arrow on the system _LIBARROW_FOUND="`${PKG_CONFIG} --variable=prefix

Re: [PR] Expand SHOW ALL stmt to show settings description [arrow-datafusion]

2023-10-04 Thread via GitHub
comphead commented on PR #7735: URL: https://github.com/apache/arrow-datafusion/pull/7735#issuecomment-1747868706 > > Having another verbose pair makes much more sense for me. > > I agree Added `SHOW ALL VERBOSE`, `SHOW param VERBOSE` support. User doc updated

Re: [PR] GH-38015: [MATLAB] Add `arrow.buffer.Buffer` class to the MATLAB Interface [arrow]

2023-10-04 Thread via GitHub
kou commented on code in PR #38020: URL: https://github.com/apache/arrow/pull/38020#discussion_r1346638994 ## matlab/test/arrow/buffer/tBuffer.m: ## @@ -0,0 +1,206 @@ +%TBUFFER Unit tests for arrow.buffer.Buffer + +% Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] GH-38015: [MATLAB] Add `arrow.buffer.Buffer` class to the MATLAB Interface [arrow]

2023-10-04 Thread via GitHub
sgilmore10 commented on code in PR #38020: URL: https://github.com/apache/arrow/pull/38020#discussion_r1346621986 ## matlab/test/arrow/buffer/tBuffer.m: ## @@ -0,0 +1,206 @@ +%TBUFFER Unit tests for arrow.buffer.Buffer + +% Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Implement basic unnest function [arrow-datafusion]

2023-10-04 Thread via GitHub
jayzhan211 commented on code in PR #6796: URL: https://github.com/apache/arrow-datafusion/pull/6796#discussion_r1346620948 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1038,6 +1042,114 @@ impl LogicalPlanBuilder { pub fn unnest_column(self, column: impl Into) -> R

Re: [PR] Parallelize Serialization of Columns within Parquet RowGroups [arrow-datafusion]

2023-10-04 Thread via GitHub
devinjdangelo commented on PR #7655: URL: https://github.com/apache/arrow-datafusion/pull/7655#issuecomment-1747826835 @alamb @tustvold I updated this branch to build against `arrow-rs` master branch while we wait for the next release. Currently getting a compile error, which I opened a PR

Re: [PR] GH-38015: [MATLAB] Add `arrow.buffer.Buffer` class to the MATLAB Interface [arrow]

2023-10-04 Thread via GitHub
sgilmore10 commented on code in PR #38020: URL: https://github.com/apache/arrow/pull/38020#discussion_r1346615131 ## matlab/src/cpp/arrow/matlab/proxy/factory.cc: ## @@ -40,12 +40,14 @@ #include "arrow/matlab/io/feather/proxy/reader.h" #include "arrow/matlab/io/csv/proxy/table

Re: [PR] Implement basic unnest function [arrow-datafusion]

2023-10-04 Thread via GitHub
jayzhan211 commented on code in PR #6796: URL: https://github.com/apache/arrow-datafusion/pull/6796#discussion_r1346614902 ## datafusion/expr/src/expr_schema.rs: ## @@ -88,6 +88,32 @@ impl ExprSchemable for Expr { .collect::>>()?; Ok((fun.re

Re: [PR] Implement basic unnest function [arrow-datafusion]

2023-10-04 Thread via GitHub
jayzhan211 commented on code in PR #6796: URL: https://github.com/apache/arrow-datafusion/pull/6796#discussion_r1346614106 ## datafusion/expr/src/expr.rs: ## @@ -970,6 +987,43 @@ impl Expr { pub fn contains_outer(&self) -> bool { !find_out_reference_exprs(self).is_

[PR] Mark OnCloseRowGroup Send [arrow-rs]

2023-10-04 Thread via GitHub
devinjdangelo opened a new pull request, #4893: URL: https://github.com/apache/arrow-rs/pull/4893 # Which issue does this PR close? Enables https://github.com/apache/arrow-datafusion/pull/7655 # Rationale for this change #4871 rolled back this change, but without it htt

Re: [PR] Implement flatten for MakeArray [arrow-datafusion]

2023-10-04 Thread via GitHub
jayzhan211 commented on PR #7461: URL: https://github.com/apache/arrow-datafusion/pull/7461#issuecomment-1747782727 > This PR appears to be stalled -- what do you think we should do with it @jayzhan211 ? Well, it depends on the review of #6796, since this PR is part of it.

Re: [PR] Add support to create Vala VAPI for GADBC [arrow-adbc]

2023-10-04 Thread via GitHub
esodan commented on PR #1152: URL: https://github.com/apache/arrow-adbc/pull/1152#issuecomment-1747773899 > ⚠️ Please follow the [Conventional Commits format in CONTRIBUTING.md](https://github.com/apache/arrow-adbc/blob/main/CONTRIBUTING.md) for PR titles. Should be done now.

Re: [PR] Add support to create Vala VAPI for GADBC [arrow-adbc]

2023-10-04 Thread via GitHub
esodan commented on PR #1152: URL: https://github.com/apache/arrow-adbc/pull/1152#issuecomment-1747774427 > Could you also add a simple example like https://github.com/apache/arrow/blob/main/c_glib/example/vala/build.vala ? Or do you want me to do it? I've added a C and Vala example

Re: [PR] Add AWS presigned URL support [arrow-rs]

2023-10-04 Thread via GitHub
tustvold commented on PR #4876: URL: https://github.com/apache/arrow-rs/pull/4876#issuecomment-1747750880 I need to sleep on it, but perhaps we just add it to the ObjectStore trait with a default not implemented impl, it wouldn't be the weirdest thing there - I'm looking at you append 😅

Re: [PR] feat(c/driver/postgresql,c/driver/sqlite): Implement FOREIGN KEY constraints [arrow-adbc]

2023-10-04 Thread via GitHub
lidavidm merged PR #1099: URL: https://github.com/apache/arrow-adbc/pull/1099

[PR] MINOR: [C++][Parquet] Fix segfault getting compression level for a Parquet column [arrow]

2023-10-04 Thread via GitHub
adamreeve opened a new pull request, #38025: URL: https://github.com/apache/arrow/pull/38025 ### Rationale for this change After the changes in #35886, getting the compression level for a Parquet column segfaults if the compression level or other options weren't previously set

Re: [PR] GH-35243: [C#] Implement MapType [arrow]

2023-10-04 Thread via GitHub
lidavidm commented on PR #37885: URL: https://github.com/apache/arrow/pull/37885#issuecomment-1747728511 Thanks for the ping - I'll look at this tomorrow!

Re: [I] Token parameter error - flight-sql-jdbc-driver-13.0.0 [arrow]

2023-10-04 Thread via GitHub
aiguofer commented on issue #37987: URL: https://github.com/apache/arrow/issues/37987#issuecomment-1747721655 Hey @jeremiahOkai . This issue is because some tools try to use an empty username/password if none are provided. This was fixed in this PR https://github.com/apache/arrow/issues/370

Re: [PR] GH-35243: [C#] Implement MapType [arrow]

2023-10-04 Thread via GitHub
davidhcoe commented on PR #37885: URL: https://github.com/apache/arrow/pull/37885#issuecomment-1747671061 @lidavidm - are you able to help at all?

Re: [PR] Optimize "ORDER BY + LIMIT" queries for speed / memory with special TopK operator [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb commented on PR #7721: URL: https://github.com/apache/arrow-datafusion/pull/7721#issuecomment-1747670007 FYI @gruuya -- it is finally happening

Re: [I] Arrow Flight SQL / ADBC datasource [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb commented on issue #7731: URL: https://github.com/apache/arrow-datafusion/issues/7731#issuecomment-1747668435 I think having an (optional) FlightSQL backed remote table sounds like a good idea to me.

Re: [I] Review use of panics in `datafusion-row` crate [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb closed issue #3317: Review use of panics in `datafusion-row` crate URL: https://github.com/apache/arrow-datafusion/issues/3317

Re: [I] Review use of panics in `datafusion-row` crate [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb commented on issue #3317: URL: https://github.com/apache/arrow-datafusion/issues/3317#issuecomment-1747666484 Indeed in fact I think datafusion-row was removed a while ago, so this ticket is no longer relevant. Thanks @qrilka

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
github-actions[bot] commented on PR #37684: URL: https://github.com/apache/arrow/pull/37684#issuecomment-1747665465 Revision: 99cfe8ae468aaa2a7183fa623e0423bf341fa6a3 Submitted crossbow builds: [ursacomputing/crossbow @ actions-a57a53186a](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
assignUser commented on PR #37684: URL: https://github.com/apache/arrow/pull/37684#issuecomment-1747662526 @github-actions crossbow submit r-binary-packages

Re: [PR] Implement flatten for MakeArray [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb commented on PR #7461: URL: https://github.com/apache/arrow-datafusion/pull/7461#issuecomment-1747659642 This PR appears to be stalled -- what do you think we should do with it @jayzhan211 ?

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1346480719 ## cpp/cmake_modules/snappy.diff: ## @@ -0,0 +1,30 @@ +diff --git a/CMakeLists.txt b/CMakeLists.txt +index c3062e2..d946037 100644 +--- a/CMakeLists.txt b/CMake

Re: [I] `bounded_order_preserving_variants` configuration setting is confusingly named [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb closed issue #7722: `bounded_order_preserving_variants` configuration setting is confusingly named URL: https://github.com/apache/arrow-datafusion/issues/7722

Re: [PR] Rename `bounded_order_preserving_variants` config to `prefer_exising_sort` and update docs [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb merged PR #7723: URL: https://github.com/apache/arrow-datafusion/pull/7723

Re: [PR] Minor: Add comment on input_schema from AggregateExec [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb commented on PR #7727: URL: https://github.com/apache/arrow-datafusion/pull/7727#issuecomment-1747657892 > I guess it is still possible to remove it, if we embed the data type info in to individual aggregate expression protobuf node. But this will involve a lot of changes just for re

Re: [I] [Python] pyarrow.dataset.dataset does not accept RecordBatchReader as source [arrow]

2023-10-04 Thread via GitHub
kou commented on issue #38012: URL: https://github.com/apache/arrow/issues/38012#issuecomment-1747643140 OK. @sugibuchi Do you want to work on this?

Re: [PR] GH-29238 [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API [arrow]

2023-10-04 Thread via GitHub
tolleybot commented on code in PR #34616: URL: https://github.com/apache/arrow/pull/34616#discussion_r1346465311 ## python/pyarrow/_dataset_parquet.pyx: ## @@ -56,9 +63,180 @@ from pyarrow._parquet cimport ( cdef Expression _true = Expression._scalar(True) - ctypedef CParq

Re: [PR] [GH-38015]: [MATLAB] Add `arrow.buffer.Buffer` class to the MATLAB Interface [arrow]

2023-10-04 Thread via GitHub
kou commented on code in PR #38020: URL: https://github.com/apache/arrow/pull/38020#discussion_r1346457271 ## matlab/src/cpp/arrow/matlab/proxy/factory.cc: ## @@ -40,12 +40,14 @@ #include "arrow/matlab/io/feather/proxy/reader.h" #include "arrow/matlab/io/csv/proxy/table_writer

Re: [PR] GH-29238 [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API [arrow]

2023-10-04 Thread via GitHub
tolleybot commented on code in PR #34616: URL: https://github.com/apache/arrow/pull/34616#discussion_r1346299229 ## python/pyarrow/_dataset_parquet.pyx: ## @@ -711,6 +889,20 @@ cdef class ParquetFragmentScanOptions(FragmentScanOptions): cdef ArrowReaderProperties* arrow_rea

Re: [PR] GH-29238 [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API [arrow]

2023-10-04 Thread via GitHub
tolleybot commented on code in PR #34616: URL: https://github.com/apache/arrow/pull/34616#discussion_r1346456237 ## cpp/src/arrow/dataset/file_parquet.h: ## @@ -226,6 +229,8 @@ class ARROW_DS_EXPORT ParquetFragmentScanOptions : public FragmentScanOptions { /// ScanOptions. A

Re: [PR] GH-29238 [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API [arrow]

2023-10-04 Thread via GitHub
tolleybot commented on code in PR #34616: URL: https://github.com/apache/arrow/pull/34616#discussion_r1346455963 ## cpp/src/arrow/dataset/file_parquet.cc: ## @@ -67,8 +72,24 @@ parquet::ReaderProperties MakeReaderProperties( properties.disable_buffered_stream(); } pr

Re: [PR] GH-29238 [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API [arrow]

2023-10-04 Thread via GitHub
tolleybot commented on code in PR #34616: URL: https://github.com/apache/arrow/pull/34616#discussion_r1346455441 ## cpp/src/arrow/dataset/file_parquet_test.cc: ## @@ -424,6 +425,34 @@ TEST_F(TestParquetFileSystemDataset, WriteWithEmptyPartitioningSchema) { TestWriteWithEmpty

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1346452562 ## dev/tasks/r/github.packages.yml: ## @@ -56,6 +56,56 @@ jobs: name: r-pkg__src__contrib path: arrow/r/arrow_*.tar.gz + macos-cpp: +nam

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
assignUser commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1346438542 ## r/configure: ## @@ -175,12 +165,6 @@ find_arrow () { # 2. Use pkg-config to find arrow on the system _LIBARROW_FOUND="`${PKG_CONFIG} --variable=prefix -

Re: [I] [Go][Parquet] Writing a Parquet file from a slice of structs [arrow]

2023-10-04 Thread via GitHub
tschaub commented on issue #37807: URL: https://github.com/apache/arrow/issues/37807#issuecomment-1747611641 Looks useful, @chelseajonesr. My only real current use case has been to create Parquet data for tests. I've written a [`test.ParquetFromJSON()` function](https://github.com/p

Re: [I] [Python] pyarrow.dataset.dataset does not accept RecordBatchReader as source [arrow]

2023-10-04 Thread via GitHub
lidavidm commented on issue #38012: URL: https://github.com/apache/arrow/issues/38012#issuecomment-1747608070 Yes, we should update the docs

Re: [PR] Minor: Add comment on input_schema from AggregateExec [arrow-datafusion]

2023-10-04 Thread via GitHub
viirya commented on PR #7727: URL: https://github.com/apache/arrow-datafusion/pull/7727#issuecomment-1747607020 No worries @alamb . Thanks for approval. 😄 I guess it is still possible to remove it, if we embed the data type info in to individual aggregate expression protobuf node. B

Re: [PR] feat(c/driver/sqlite): enable extension loading [arrow-adbc]

2023-10-04 Thread via GitHub
lidavidm commented on PR #1162: URL: https://github.com/apache/arrow-adbc/pull/1162#issuecomment-1747602608 I changed tacks on this halfway through so this is absolutely not ready, but I'm probably not going to have time to finish this until late October so this is just to remind/shame me i

[PR] feat(c/driver/sqlite): enable extension loading [arrow-adbc]

2023-10-04 Thread via GitHub
lidavidm opened a new pull request, #1162: URL: https://github.com/apache/arrow-adbc/pull/1162 Fixes #938.

Re: [I] [Python] pyarrow.dataset.dataset does not accept RecordBatchReader as source [arrow]

2023-10-04 Thread via GitHub
kou commented on issue #38012: URL: https://github.com/apache/arrow/issues/38012#issuecomment-1747592880 @lidavidm It seems that this is related to #10070 / #28047. Should we update the document?

Re: [PR] GH-37917: [Parquet] Add OpenAsync for FileSource [arrow]

2023-10-04 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37918: URL: https://github.com/apache/arrow/pull/37918#issuecomment-1747590246 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 02de3c1789460304e958936b78d60f824921c250. There were no

Re: [PR] Minor: Add comment on input_schema from AggregateExec [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb commented on PR #7727: URL: https://github.com/apache/arrow-datafusion/pull/7727#issuecomment-1747589922 > @alamb You can see my previous commits which removed the `input_schema` field. In CI, all unit tests and end-to-end tests can pass but `verify benchmark results` gets failur

Re: [PR] Remove unused `AggregateExec::input_schema` [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb commented on PR #7741: URL: https://github.com/apache/arrow-datafusion/pull/7741#issuecomment-1747588313 > I have tried to remove it at first comments, but encountered verify benchmark results error too. The reason is explained in https://github.com/apache/arrow-datafusion/pull/7727#

Re: [I] [Go][Parquet] Writing a Parquet file from a slice of structs [arrow]

2023-10-04 Thread via GitHub
chelseajonesr commented on issue #37807: URL: https://github.com/apache/arrow/issues/37807#issuecomment-1747581918 @tschaub I have an initial version of this using reflection here, in case this is helpful: https://github.com/chelseajonesr/rfarrow I'm using this for a specific use c

Re: [PR] Fix integration tests [arrow-rs]

2023-10-04 Thread via GitHub
tustvold merged PR #4889: URL: https://github.com/apache/arrow-rs/pull/4889

Re: [I] [C++] exception caused by dataset_writer.cc:587: Check failed: (largest) != (nullptr) [arrow]

2023-10-04 Thread via GitHub
kou commented on issue #38011: URL: https://github.com/apache/arrow/issues/38011#issuecomment-1747569150 Thanks for your report. Do you want to work on this?

Re: [PR] Specialize Thrift Decoding (~40% Faster) (#4891) [arrow-rs]

2023-10-04 Thread via GitHub
alamb commented on PR #4892: URL: https://github.com/apache/arrow-rs/pull/4892#issuecomment-1747566424 I think it is important to note that this is related to making Parquet reading faster - as the connection between TCompactSliceInputProtocol and reading parquet metadata may not be obvious

Re: [PR] GH-37002: [C++][Parquet] Add api to get RecordReader from RowGroupReader [arrow]

2023-10-04 Thread via GitHub
fatemehp commented on code in PR #37003: URL: https://github.com/apache/arrow/pull/37003#discussion_r1346401165 ## cpp/src/parquet/reader_test.cc: ## @@ -502,6 +502,25 @@ TEST_F(TestAllTypesPlain, ColumnSelectionOutOfRange) { ASSERT_THROW(printer2.DebugPrint(ss, columns), Par

Re: [PR] Remove unused `AggregateExec::input_schema` [arrow-datafusion]

2023-10-04 Thread via GitHub
viirya commented on PR #7741: URL: https://github.com/apache/arrow-datafusion/pull/7741#issuecomment-1747563433 I have tried to remove it at first comments, but encountered `verify benchmark results` error too. The reason is explained in https://github.com/apache/arrow-datafusion/pull/7727

Re: [PR] Minor: Add comment on input_schema from AggregateExec [arrow-datafusion]

2023-10-04 Thread via GitHub
viirya commented on PR #7727: URL: https://github.com/apache/arrow-datafusion/pull/7727#issuecomment-1747558884 @alamb You can see my previous commits which removed the `input_schema` field. In CI, all unit tests and end-to-end tests can pass but `verify benchmark results` gets failures wh

Re: [PR] GH-37876: [Format] Add list-view specification to arrow format [arrow]

2023-10-04 Thread via GitHub
zeroshade commented on PR #37877: URL: https://github.com/apache/arrow/pull/37877#issuecomment-1747552085 The vote on the mailing list is officially passed, @bkietz you have an outstanding change requested can you take a look at the updates and update your review accordingly? @pitrou

Re: [PR] Specialize thrift (#4891) [arrow-rs]

2023-10-04 Thread via GitHub
tustvold commented on code in PR #4892: URL: https://github.com/apache/arrow-rs/pull/4892#discussion_r1346383547 ## parquet/src/arrow/async_reader/metadata.rs: ## @@ -95,16 +94,14 @@ impl MetadataLoader { // Did not fetch the entire file metadata in the initial read, ne

Re: [PR] Specialize thrift (#4891) [arrow-rs]

2023-10-04 Thread via GitHub
tustvold commented on PR #4892: URL: https://github.com/apache/arrow-rs/pull/4892#issuecomment-1747533895 With the latest changes ``` open(default) time: [15.697 µs 15.705 µs 15.714 µs] change: [-40.766% -40.682% -40.602%] (p = 0.00 < 0.05)

Re: [PR] feat(c/driver/postgresql,c/driver/sqlite): Implement FOREIGN KEY constraints [arrow-adbc]

2023-10-04 Thread via GitHub
OleMussmann commented on PR #1099: URL: https://github.com/apache/arrow-adbc/pull/1099#issuecomment-1747528680 No worries about the time, I hope you are better now @ywc88 . Thank you for the review and your suggestions. I implemented them in two separate commits.

Re: [PR] Add AWS presigned URL support [arrow-rs]

2023-10-04 Thread via GitHub
carols10cents commented on PR #4876: URL: https://github.com/apache/arrow-rs/pull/4876#issuecomment-1747528081 > I think I might be missing some context on why the pre-signed URL functionality needs to be conflated with the ObjectStore trait. TL;DR IOx; I'm happy to make changes to IO

Re: [PR] Minor: Add comment on input_schema from AggregateExec [arrow-datafusion]

2023-10-04 Thread via GitHub
viirya commented on PR #7727: URL: https://github.com/apache/arrow-datafusion/pull/7727#issuecomment-1747504076 I left a comment above to explain it and where/why it fails in the CI pipeline. On Wed, Oct 4, 2023, 12:19 L. C. Hsieh wrote: > I am far from a la

Re: [PR] Minor: Add comment on input_schema from AggregateExec [arrow-datafusion]

2023-10-04 Thread via GitHub
viirya commented on PR #7727: URL: https://github.com/apache/arrow-datafusion/pull/7727#issuecomment-1747497267 I am far from a laptop but I tried to remove it at first and the schema seems necessary when deserializing from protobuf to physical aggregate plan as aggregate expressio

Re: [PR] feat(c/driver/postgresql,c/driver/sqlite): Implement FOREIGN KEY constraints [arrow-adbc]

2023-10-04 Thread via GitHub
OleMussmann commented on code in PR #1099: URL: https://github.com/apache/arrow-adbc/pull/1099#discussion_r1346350878 ## c/validation/adbc_validation.cc: ## @@ -945,6 +945,65 @@ void ConnectionTest::TestMetadataGetObjectsConstraints() { // TODO: can't be done portably (need t

Re: [PR] feat(c/driver/postgresql,c/driver/sqlite): Implement FOREIGN KEY constraints [arrow-adbc]

2023-10-04 Thread via GitHub
OleMussmann commented on code in PR #1099: URL: https://github.com/apache/arrow-adbc/pull/1099#discussion_r1346349805 ## c/validation/adbc_validation.cc: ## @@ -945,6 +945,65 @@ void ConnectionTest::TestMetadataGetObjectsConstraints() { // TODO: can't be done portably (need t

Re: [I] c/driver/common/utils.c: tables are found when they should not [arrow-adbc]

2023-10-04 Thread via GitHub
WillAyd commented on issue #1100: URL: https://github.com/apache/arrow-adbc/issues/1100#issuecomment-1747489456 Ah nice find. I think the upfront strlen check makes sense. Should likely be done for all comparisons in the module

Re: [PR] [GH-38015]: [MATLAB] Add `arrow.buffer.Buffer` class to the MATLAB Interface [arrow]

2023-10-04 Thread via GitHub
github-actions[bot] commented on PR #38020: URL: https://github.com/apache/arrow/pull/38020#issuecomment-1747482544 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue f

[PR] [GH-38015]: [MATLAB] Add `arrow.buffer.Buffer` class to the MATLAB Interface [arrow]

2023-10-04 Thread via GitHub
sgilmore10 opened a new pull request, #38020: URL: https://github.com/apache/arrow/pull/38020 ### Rationale for this change To unblock use cases that are not satisfied by the default Arrow -> MATLAB conversions (i.e. the `toMATLAB()` on `arrow.array.Array`), we would like exp

Re: [PR] fix(go/adbc/driver/snowflake): add useHighPrecision option for decimal vs int64 [arrow-adbc]

2023-10-04 Thread via GitHub
lidavidm merged PR #1160: URL: https://github.com/apache/arrow-adbc/pull/1160

Re: [PR] GH-37923: [R] Move macOS build system to nixlibs.R [arrow]

2023-10-04 Thread via GitHub
paleolimbot commented on code in PR #37684: URL: https://github.com/apache/arrow/pull/37684#discussion_r1346336236 ## r/configure: ## @@ -175,12 +165,6 @@ find_arrow () { # 2. Use pkg-config to find arrow on the system _LIBARROW_FOUND="`${PKG_CONFIG} --variable=prefix

Re: [PR] Remove unused `AggregateExec::input_schema` [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb commented on code in PR #7741: URL: https://github.com/apache/arrow-datafusion/pull/7741#discussion_r1346320857 ## datafusion/core/src/physical_planner.rs: ## @@ -2413,9 +2411,6 @@ mod tests { "SUM(aggregate_test_100.c2)", final_hash_agg.schema().

Re: [PR] Remove unused `AggregateExec::input_schema` [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb commented on code in PR #7741: URL: https://github.com/apache/arrow-datafusion/pull/7741#discussion_r1346319872 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -283,10 +283,6 @@ pub struct AggregateExec { pub input: Arc, /// Schema after the aggregate is

Re: [I] c/driver/common/utils.c: tables are found when they should not [arrow-adbc]

2023-10-04 Thread via GitHub
OleMussmann commented on issue #1100: URL: https://github.com/apache/arrow-adbc/issues/1100#issuecomment-1747452757 After some more testing, I think it works a bit differently than stated in the issue text above, my apologies. The bug is still there, the explanation might be slightly differen

Re: [PR] fix(go/adbc/driver/snowflake): add useHighPrecision option for decimal vs int64 [arrow-adbc]

2023-10-04 Thread via GitHub
zeroshade commented on code in PR #1160: URL: https://github.com/apache/arrow-adbc/pull/1160#discussion_r1346305530 ## go/adbc/driver/snowflake/driver.go: ## @@ -67,6 +67,13 @@ const ( // "300ms", "1.5s" or "1m30s". ParseDuration accepts negative values // but th

Re: [PR] Minor: Add comment on input_schema from AggregateExec [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb commented on code in PR #7727: URL: https://github.com/apache/arrow-datafusion/pull/7727#discussion_r1346300290 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -285,7 +285,9 @@ pub struct AggregateExec { schema: SchemaRef, /// Input schema before any agg

Re: [PR] Rename `SessionContext::with_config_rt` to `SessionContext::new_with_config_from_rt`, etc [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb merged PR #7631: URL: https://github.com/apache/arrow-datafusion/pull/7631

Re: [PR] Remove unused `AggregateExec::input_schema` [arrow-datafusion]

2023-10-04 Thread via GitHub
alamb commented on code in PR #7741: URL: https://github.com/apache/arrow-datafusion/pull/7741#discussion_r1346278221 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -283,10 +283,6 @@ pub struct AggregateExec { pub input: Arc, /// Schema after the aggregate is

Re: [PR] GH-29238 [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API [arrow]

2023-10-04 Thread via GitHub
tolleybot commented on code in PR #34616: URL: https://github.com/apache/arrow/pull/34616#discussion_r1346299229 ## python/pyarrow/_dataset_parquet.pyx: ## @@ -711,6 +889,20 @@ cdef class ParquetFragmentScanOptions(FragmentScanOptions): cdef ArrowReaderProperties* arrow_rea
