[GitHub] [arrow-datafusion] MichaelScofield commented on issue #5529: Failed to execute sql with subquery

2023-03-09 Thread via GitHub
MichaelScofield commented on issue #5529: URL: https://github.com/apache/arrow-datafusion/issues/5529#issuecomment-1463402334 I've created the fix in https://github.com/apache/arrow-datafusion/pull/5542, PTAL @alamb -- This is an automated message from the Apache Git Service. To respond

[GitHub] [arrow-datafusion] MichaelScofield opened a new pull request, #5542: fix: failed to execute sql with subquery

2023-03-09 Thread via GitHub
MichaelScofield opened a new pull request, #5542: URL: https://github.com/apache/arrow-datafusion/pull/5542 # Which issue does this PR close? Closes #5529 . # Rationale for this change fix: failed to run "where x in ((select ...))" # What changes ar

[GitHub] [arrow-datafusion] Jefffrey commented on issue #5538: Cannot compare two arrays of different types is thrown on a simple two-statement query

2023-03-09 Thread via GitHub
Jefffrey commented on issue #5538: URL: https://github.com/apache/arrow-datafusion/issues/5538#issuecomment-1463394118 Note that the SQL itself actually doesn't work, but the explain for it does (which is what the tpc-ds logical tests are checking). On latest main: ```sql DataFusi

[GitHub] [arrow-datafusion] yahoNanJing opened a new issue, #5541: Introduce ObjectStoreManager trait for the ObjectStoreRegistry to provide polymorphism for get_by_url

2023-03-09 Thread via GitHub
yahoNanJing opened a new issue, #5541: URL: https://github.com/apache/arrow-datafusion/issues/5541 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** **Describe the solution you'd like** **Describe alternatives you'

[GitHub] [arrow] jorisvandenbossche merged pull request #34463: GH-32619: [Python][Docs] Include options for PyArrow build explicitly

2023-03-09 Thread via GitHub
jorisvandenbossche merged PR #34463: URL: https://github.com/apache/arrow/pull/34463

[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

2023-03-09 Thread via GitHub
yevgenypats commented on code in PR #34454: URL: https://github.com/apache/arrow/pull/34454#discussion_r1132021652 ## go/arrow/internal/testing/types/extension_types.go: ## @@ -18,20 +18,171 @@ package types import ( + "bytes" "encoding/binary" "fmt"

[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

2023-03-09 Thread via GitHub
yevgenypats commented on code in PR #34454: URL: https://github.com/apache/arrow/pull/34454#discussion_r1132020015 ## go/arrow/internal/testing/types/extension_test.go: ## @@ -0,0 +1,63 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[GitHub] [arrow-datafusion] Jefffrey commented on a diff in pull request #5520: INSERT INTO support for MemTable

2023-03-09 Thread via GitHub
Jefffrey commented on code in PR #5520: URL: https://github.com/apache/arrow-datafusion/pull/5520#discussion_r1132012053 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -203,6 +203,24 @@ impl LogicalPlanBuilder { Self::scan_with_filters(table_name, table_source, p

[GitHub] [arrow-rs] eddyxu opened a new issue, #3837: Concating dictionary array leads to duplicated dict values.

2023-03-09 Thread via GitHub
eddyxu opened a new issue, #3837: URL: https://github.com/apache/arrow-rs/issues/3837 **Describe the bug** I was trying to concatenate a few DictionaryArrays using `arrow_select::concat::concat`. While the each Dictionary Array shares the same value strings. The resulted array ho

[GitHub] [arrow] ursabot commented on pull request #34254: GH-34147: [C++][Parquet] Support crc count and checking on DICTIONARY_PAGE

2023-03-09 Thread via GitHub
ursabot commented on PR #34254: URL: https://github.com/apache/arrow/pull/34254#issuecomment-1463292655 Benchmark runs are scheduled for baseline = bb74cd78cd2d56a5286a2fd2494c79d8ed85d230 and contender = 0ac0f733ff61f2db45cbff54def8768b3ceb8a9d. 0ac0f733ff61f2db45cbff54def8768b3ceb8a9d is

[GitHub] [arrow] AlenkaF commented on a diff in pull request #33925: GH-33923: [Docs] Tensor canonical extension type specification

2023-03-09 Thread via GitHub
AlenkaF commented on code in PR #33925: URL: https://github.com/apache/arrow/pull/33925#discussion_r1131946099 ## docs/source/format/CanonicalExtensions.rst: ## @@ -72,4 +72,76 @@ same rules as laid out above, and provide backwards compatibility guarantees. Official List

[GitHub] [arrow-flight-sql-postgresql] kou commented on pull request #28: Add benchmark for integer only data

2023-03-09 Thread via GitHub
kou commented on PR #28: URL: https://github.com/apache/arrow-flight-sql-postgresql/pull/28#issuecomment-1463240883 Thanks for sharing your thought! > Do you have a good idea of the current bottlenecks? I haven't profiled yet but I think that the current shared memory based dat

[GitHub] [arrow] HammadB commented on pull request #15177: GH-15174: [Go][FlightRPC] Expose Flight Server Desc and RegisterFlightService

2023-03-09 Thread via GitHub
HammadB commented on PR #15177: URL: https://github.com/apache/arrow/pull/15177#issuecomment-1463238147 Is there any plan to expose these to python so that you can use arrowflight alongside an existing python grpc server?

[GitHub] [arrow-rs] jiacai2050 commented on issue #3827: Cannot access Alibaba Cloud OSS via AmazonS3

2023-03-09 Thread via GitHub
jiacai2050 commented on issue #3827: URL: https://github.com/apache/arrow-rs/issues/3827#issuecomment-1463232232 Thanks, `with_endpoint` works as expected. ```rs AmazonS3Builder::new() .with_virtual_hosted_style_request(true) // region is not used when virtual

[GitHub] [arrow] assignUser commented on a diff in pull request #34463: GH-32619: [Python][Docs] Include options for PyArrow build explicitly

2023-03-09 Thread via GitHub
assignUser commented on code in PR #34463: URL: https://github.com/apache/arrow/pull/34463#discussion_r1131929466 ## docs/source/developers/python.rst: ## @@ -586,6 +586,83 @@ Caveats The Plasma component is not supported on Windows. +Relevant components and environment var

[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #5345: Refactor DecorrelateWhereExists and add back Distinct if needs

2023-03-09 Thread via GitHub
mingmwang commented on code in PR #5345: URL: https://github.com/apache/arrow-datafusion/pull/5345#discussion_r1131929489 ## datafusion/optimizer/src/decorrelate_where_exists.rs: ## @@ -670,4 +673,76 @@ mod tests { assert_plan_eq(&plan, expected) } + +#[test]

[GitHub] [arrow] chrisirhc commented on issue #34377: [Go] enhancement request to expose AnyValue() on Scalar

2023-03-09 Thread via GitHub
chrisirhc commented on issue #34377: URL: https://github.com/apache/arrow/issues/34377#issuecomment-1463212875 Yep, happy to do so. Thoughts on what to name the method? Does `AnyValue()` work ? I'd assume `Value()` conflicts or might be confusing since `.Value` is used to access typed scal

[GitHub] [arrow] mapleFU commented on issue #34510: Reading FixedSizeList from parquet is slower than reading values into more rows

2023-03-09 Thread via GitHub
mapleFU commented on issue #34510: URL: https://github.com/apache/arrow/issues/34510#issuecomment-1463168911 Thanks, I'll test this tonight. Currently I guess constructing FixedSizeList may use some space and consume some time.

[GitHub] [arrow] ursabot commented on pull request #34514: GH-34513: [CI][Python] Remove unused imports from _acero.pyx to fix linting failures

2023-03-09 Thread via GitHub
ursabot commented on PR #34514: URL: https://github.com/apache/arrow/pull/34514#issuecomment-1463143547 Benchmark runs are scheduled for baseline = 2ff4e3a2523bd0c58168d6ca4bcb14f45393ff2b and contender = bb74cd78cd2d56a5286a2fd2494c79d8ed85d230. bb74cd78cd2d56a5286a2fd2494c79d8ed85d230 is

[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #5345: Refactor DecorrelateWhereExists and add back Distinct if needs

2023-03-09 Thread via GitHub
mingmwang commented on code in PR #5345: URL: https://github.com/apache/arrow-datafusion/pull/5345#discussion_r1131881136 ## datafusion/optimizer/tests/integration-test.rs: ## @@ -151,8 +151,9 @@ fn where_exists_distinct() -> Result<()> { let plan = test_sql(sql)?; let

[GitHub] [arrow] kou commented on issue #34523: [R] Unable to compile R package with GCS on macOS M1

2023-03-09 Thread via GitHub
kou commented on issue #34523: URL: https://github.com/apache/arrow/issues/34523#issuecomment-1463107903 CMake has some features to control include path order. For example, `target_include_directories()` https://cmake.org/cmake/help/latest/command/target_include_directories.html provides `

[GitHub] [arrow] amoeba commented on issue #34523: [R] Unable to compile R package with GCS on macOS M1

2023-03-09 Thread via GitHub
amoeba commented on issue #34523: URL: https://github.com/apache/arrow/issues/34523#issuecomment-1463098831 Interesting @kou, thanks. This is an area I'm not very familiar with but is there a possibility for an improvement to the build system here?

[GitHub] [arrow] paleolimbot commented on a diff in pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

2023-03-09 Thread via GitHub
paleolimbot commented on code in PR #34524: URL: https://github.com/apache/arrow/pull/34524#discussion_r1131865760 ## r/R/filesystem.R: ## @@ -572,6 +572,11 @@ GcsFileSystem$create <- function(anonymous = FALSE, retry_limit_seconds = 15, .. options$retry_limit_seconds <- r

[GitHub] [arrow-datafusion] viirya commented on a diff in pull request #5540: Add necessary features to optimizer

2023-03-09 Thread via GitHub
viirya commented on code in PR #5540: URL: https://github.com/apache/arrow-datafusion/pull/5540#discussion_r1131862153 ## datafusion/optimizer/Cargo.toml: ## @@ -33,16 +33,19 @@ name = "datafusion_optimizer" path = "src/lib.rs" [features] -default = ["unicode_expressions"] -

[GitHub] [arrow-datafusion] viirya commented on pull request #5540: Add necessary features to optimizer

2023-03-09 Thread via GitHub
viirya commented on PR #5540: URL: https://github.com/apache/arrow-datafusion/pull/5540#issuecomment-1463088355 cc @sunchao

[GitHub] [arrow-datafusion] viirya commented on a diff in pull request #5540: Add necessary features to optimizer

2023-03-09 Thread via GitHub
viirya commented on code in PR #5540: URL: https://github.com/apache/arrow-datafusion/pull/5540#discussion_r1131861842 ## datafusion/optimizer/Cargo.toml: ## @@ -33,16 +33,19 @@ name = "datafusion_optimizer" path = "src/lib.rs" [features] -default = ["unicode_expressions"] -

[GitHub] [arrow-datafusion] viirya commented on a diff in pull request #5540: Add necessary features to optimizer

2023-03-09 Thread via GitHub
viirya commented on code in PR #5540: URL: https://github.com/apache/arrow-datafusion/pull/5540#discussion_r1131861166 ## datafusion/core/Cargo.toml: ## @@ -41,21 +41,21 @@ path = "src/lib.rs" # Used to enable the avro format avro = ["apache-avro", "num-traits", "datafusion-co

[GitHub] [arrow-datafusion] viirya opened a new pull request, #5540: Add necessary features to optimizer

2023-03-09 Thread via GitHub
viirya opened a new pull request, #5540: URL: https://github.com/apache/arrow-datafusion/pull/5540 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are these changes tested?

[GitHub] [arrow] kou commented on issue #34523: [R] Unable to compile R package with GCS on macOS M1

2023-03-09 Thread via GitHub
kou commented on issue #34523: URL: https://github.com/apache/arrow/issues/34523#issuecomment-1463069526 Ah, this is caused by mixing include paths. Some dependencies installed by Homebrew add `-I$(brew --prefix)/include` and it may be used before bundled Abseil. It mixes Homebrew Abseil'

[GitHub] [arrow] felipecrv commented on a diff in pull request #34408: GH-34361: [C++] Fix the handling of logical nulls for types without bitmaps like Unions and Run-End Encoded

2023-03-09 Thread via GitHub
felipecrv commented on code in PR #34408: URL: https://github.com/apache/arrow/pull/34408#discussion_r1131836891 ## cpp/src/arrow/array/data.h: ## @@ -363,14 +475,73 @@ struct ARROW_EXPORT ArraySpan { } } - /// \brief Return null count, or compute and set it if it's n

[GitHub] [arrow] amoeba commented on issue #33106: [R] Documentation for json_credentials is misleading

2023-03-09 Thread via GitHub
amoeba commented on issue #33106: URL: https://github.com/apache/arrow/issues/33106#issuecomment-1463050491 Hey @cboettig, I've written up a standalone issue for what you're seeing at https://github.com/apache/arrow/issues/34525 and think we can follow up there. Once https://github.c

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #5537: Minor: add the concise way for matching numerics

2023-03-09 Thread via GitHub
comphead commented on code in PR #5537: URL: https://github.com/apache/arrow-datafusion/pull/5537#discussion_r1131816757 ## datafusion/expr/src/type_coercion/aggregates.rs: ## @@ -314,77 +314,45 @@ pub fn sum_return_type(arg_type: &DataType) -> Result { /// function return t

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #5537: Minor: add the concise way for matching numerics

2023-03-09 Thread via GitHub
comphead commented on code in PR #5537: URL: https://github.com/apache/arrow-datafusion/pull/5537#discussion_r1131813075 ## datafusion/expr/src/type_coercion/aggregates.rs: ## @@ -314,77 +314,45 @@ pub fn sum_return_type(arg_type: &DataType) -> Result { /// function return t

[GitHub] [arrow-datafusion] andygrove opened a new pull request, #5539: Prepare for 20.0.0 release

2023-03-09 Thread via GitHub
andygrove opened a new pull request, #5539: URL: https://github.com/apache/arrow-datafusion/pull/5539 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are these changes tested?

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5537: Minor: add the concise way for matching numerics

2023-03-09 Thread via GitHub
alamb commented on code in PR #5537: URL: https://github.com/apache/arrow-datafusion/pull/5537#discussion_r1131800410 ## datafusion/expr/src/type_coercion/aggregates.rs: ## @@ -314,77 +314,45 @@ pub fn sum_return_type(arg_type: &DataType) -> Result { /// function return type

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #5537: Minor: add the concise way for matching numerics

2023-03-09 Thread via GitHub
comphead commented on code in PR #5537: URL: https://github.com/apache/arrow-datafusion/pull/5537#discussion_r1131797955 ## datafusion/expr/src/type_coercion/aggregates.rs: ## @@ -314,77 +314,45 @@ pub fn sum_return_type(arg_type: &DataType) -> Result { /// function return t

[GitHub] [arrow-adbc] wjones127 commented on pull request #478: feat(rust): define the rust adbc api

2023-03-09 Thread via GitHub
wjones127 commented on PR #478: URL: https://github.com/apache/arrow-adbc/pull/478#issuecomment-1463021509 > It may be useful to return Arrow objects and expose the converter helper? I deemed that a "battle for another day" when working on the R bindings but I suppose you've already written

[GitHub] [arrow] kou commented on a diff in pull request #34482: GH-34481: [CI] Migrate ARM jobs from Travis to self-hosted runners

2023-03-09 Thread via GitHub
kou commented on code in PR #34482: URL: https://github.com/apache/arrow/pull/34482#discussion_r1131780731 ## .github/workflows/go.yml: ## @@ -369,3 +369,41 @@ jobs: - name: Test shell: bash run: ci/scripts/go_test.sh $(pwd) + + linux-arm: +name: AR

[GitHub] [arrow] ursabot commented on pull request #34184: GH-34154: [Python] Add `is_nan` method to Array and Expression

2023-03-09 Thread via GitHub
ursabot commented on PR #34184: URL: https://github.com/apache/arrow/pull/34184#issuecomment-1462993364 Benchmark runs are scheduled for baseline = b679a96d426f4df1a2d15d452f312c968cdfc8f6 and contender = 2ff4e3a2523bd0c58168d6ca4bcb14f45393ff2b. 2ff4e3a2523bd0c58168d6ca4bcb14f45393ff2b is

[GitHub] [arrow-datafusion] l0kr commented on issue #4338: Add serialization for entire LogicalPlans to datafusion-proto

2023-03-09 Thread via GitHub
l0kr commented on issue #4338: URL: https://github.com/apache/arrow-datafusion/issues/4338#issuecomment-1462938156 Let me take a stab at this. @alamb anything changed since this task was created?

[GitHub] [arrow-rs] iajoiner commented on pull request #3836: Prep for 35.0.0

2023-03-09 Thread via GitHub
iajoiner commented on PR #3836: URL: https://github.com/apache/arrow-rs/pull/3836#issuecomment-1462924357 I'm getting weird connection issues while updating the changelog. Will try again.

[GitHub] [arrow-rs] iajoiner opened a new pull request, #3836: Prep for 35.0.0

2023-03-09 Thread via GitHub
iajoiner opened a new pull request, #3836: URL: https://github.com/apache/arrow-rs/pull/3836 # Which issue does this PR close? Closes #3830. # Rationale for this change Biweekly Release # What changes are included in this PR? Update version to 35.0.0

[GitHub] [arrow-datafusion] milevin opened a new issue, #5538: Cannot compare two arrays of different types is thrown on a simple two-statement query

2023-03-09 Thread via GitHub
milevin opened a new issue, #5538: URL: https://github.com/apache/arrow-datafusion/issues/5538 **Problem Statement** This statement works: ``` select CASE 10.5 WHEN 0 THEN null ELSE 10 END as col; ``` This list of two statements don't: ``` create table res as

[GitHub] [arrow] ablack3 commented on issue #33807: Using dplyr::tally with an Arrow FileSystemDataset crashes R

2023-03-09 Thread via GitHub
ablack3 commented on issue #33807: URL: https://github.com/apache/arrow/issues/33807#issuecomment-1462896715 This is still crashing R on my machine. I'm using arrow v11.0.0.2 ``` Sys.setenv(ARROW_USER_SIMD_LEVEL="NONE") library(dplyr) arrow::write_dataset(cars, here::here("car

[GitHub] [arrow-rs] Weijun-H commented on issue #3821: Implement `FromStr` for DataType / Parse DataType description

2023-03-09 Thread via GitHub
Weijun-H commented on issue #3821: URL: https://github.com/apache/arrow-rs/issues/3821#issuecomment-1462877642 Should we also keep the parts like `Int32`, `Int64` in arrow-rs?

[GitHub] [arrow-rs] Weijun-H commented on issue #3821: Implement `FromStr` for DataType / Parse DataType description

2023-03-09 Thread via GitHub
Weijun-H commented on issue #3821: URL: https://github.com/apache/arrow-rs/issues/3821#issuecomment-1462871785 This is a nice extension. I want to take this ticket.

[GitHub] [arrow] prashanthbdremio commented on a diff in pull request #34424: GH-15187: [Java] Made `reader` initialization lazy and added new `getTransferPair()` function that takes in a `Field` type

2023-03-09 Thread via GitHub
prashanthbdremio commented on code in PR #34424: URL: https://github.com/apache/arrow/pull/34424#discussion_r1131638059 ## java/vector/src/main/java/org/apache/arrow/vector/BigIntVector.java: ## @@ -71,7 +73,11 @@ public BigIntVector(String name, FieldType fieldType, BufferAllo

[GitHub] [arrow] lwhite1 commented on issue #15187: [Java] Allocate the FieldReader for Arrow Vectors only on demand. Also introduce getTransferPair which takes Field as parameter.

2023-03-09 Thread via GitHub
lwhite1 commented on issue #15187: URL: https://github.com/apache/arrow/issues/15187#issuecomment-1462845250 Other than my comment about trying to lift the code for a thread-safe supplier from Guava, this looks good to me. FWIW, we did something similar with org.apache.arrow.util.Preconditi

[GitHub] [arrow] github-actions[bot] commented on pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

2023-03-09 Thread via GitHub
github-actions[bot] commented on PR #34524: URL: https://github.com/apache/arrow/pull/34524#issuecomment-1462841906 * Closes: #34421

[GitHub] [arrow] amoeba opened a new pull request, #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

2023-03-09 Thread via GitHub
amoeba opened a new pull request, #34524: URL: https://github.com/apache/arrow/pull/34524 ### Rationale for this change Existing documentation for this argument was misleading. ### What changes are included in this PR? A change in functionality, matching tests, and update

[GitHub] [arrow] drin commented on issue #34451: [C++][Python] A metadata standard for sorted datasets.

2023-03-09 Thread via GitHub
drin commented on issue #34451: URL: https://github.com/apache/arrow/issues/34451#issuecomment-1462828023 wanted to mention that #32884 likely has some relevance

[GitHub] [arrow] amoeba commented on issue #34523: [R] Unable to compile R package with GCS on macOS M1

2023-03-09 Thread via GitHub
amoeba commented on issue #34523: URL: https://github.com/apache/arrow/issues/34523#issuecomment-1462794116 Removing Homebrew abseil then re-building from scratch worked. I was then able to reinstall abseil from Homebrew and build w/o `absl_SOURCE=BUNDLED` which is what I had been doing bef

[GitHub] [arrow-nanoarrow] jorisvandenbossche commented on a diff in pull request #117: feat(python): Python schema, array, and array view skeleton

2023-03-09 Thread via GitHub
jorisvandenbossche commented on code in PR #117: URL: https://github.com/apache/arrow-nanoarrow/pull/117#discussion_r1131521826 ## python/.gitignore: ## @@ -18,7 +18,8 @@ src/nanoarrow/nanoarrow.c src/nanoarrow/nanoarrow.h -src/nanoarrow/*.cpp +src/nanoarrow/nanoarrow_c.pxd

[GitHub] [arrow] lwhite1 commented on a diff in pull request #34424: GH-15187: [Java] Made `reader` initialization lazy and added new `getTransferPair()` function that takes in a `Field` type

2023-03-09 Thread via GitHub
lwhite1 commented on code in PR #34424: URL: https://github.com/apache/arrow/pull/34424#discussion_r1131574389 ## java/vector/src/main/java/org/apache/arrow/vector/BigIntVector.java: ## @@ -71,7 +73,11 @@ public BigIntVector(String name, FieldType fieldType, BufferAllocator all

[GitHub] [arrow-nanoarrow] paleolimbot merged pull request #158: chore: Fix typo in post-release instructions for announce email

2023-03-09 Thread via GitHub
paleolimbot merged PR #158: URL: https://github.com/apache/arrow-nanoarrow/pull/158

[GitHub] [arrow] alippai commented on issue #34510: Reading FixedSizeList from parquet is slower than reading values into more rows

2023-03-09 Thread via GitHub
alippai commented on issue #34510: URL: https://github.com/apache/arrow/issues/34510#issuecomment-1462753928 The same happens with not null values (I'm not sure how to define the not null list correctly, but looks like it doesn't matter): ```python import numpy as np import pyarrow

[GitHub] [arrow-datafusion] metesynnada commented on a diff in pull request #5520: INSERT INTO support for MemTable

2023-03-09 Thread via GitHub
metesynnada commented on code in PR #5520: URL: https://github.com/apache/arrow-datafusion/pull/5520#discussion_r1131561930 ## datafusion/core/src/execution/context.rs: ## @@ -318,6 +319,19 @@ impl SessionContext { let plan = self.state().create_logical_plan(sql).await?

[GitHub] [arrow] ursabot commented on pull request #34445: GH-34283 [Python] Add types_mapper support to index for to_pandas

2023-03-09 Thread via GitHub
ursabot commented on PR #34445: URL: https://github.com/apache/arrow/pull/34445#issuecomment-1462746897 Benchmark runs are scheduled for baseline = 17f416f80f0bccd58173308d8e0aa326363bd388 and contender = b679a96d426f4df1a2d15d452f312c968cdfc8f6. b679a96d426f4df1a2d15d452f312c968cdfc8f6 is

[GitHub] [arrow-datafusion] metesynnada commented on a diff in pull request #5520: INSERT INTO support for MemTable

2023-03-09 Thread via GitHub
metesynnada commented on code in PR #5520: URL: https://github.com/apache/arrow-datafusion/pull/5520#discussion_r1131560512 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -203,6 +203,24 @@ impl LogicalPlanBuilder { Self::scan_with_filters(table_name, table_source

[GitHub] [arrow] benibus commented on a diff in pull request #8510: GH-15483: [C++] Add a Fixed Shape Tensor canonical ExtensionType

2023-03-09 Thread via GitHub
benibus commented on code in PR #8510: URL: https://github.com/apache/arrow/pull/8510#discussion_r1131558605 ## cpp/src/arrow/extension/fixed_shape_tensor.cc: ## @@ -0,0 +1,299 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

[GitHub] [arrow-datafusion] metesynnada commented on a diff in pull request #5520: INSERT INTO support for MemTable

2023-03-09 Thread via GitHub
metesynnada commented on code in PR #5520: URL: https://github.com/apache/arrow-datafusion/pull/5520#discussion_r1131558526 ## datafusion/core/src/datasource/memory.rs: ## @@ -143,22 +147,95 @@ impl TableProvider for MemTable { _filters: &[Expr], _limit: Option

[GitHub] [arrow] lwhite1 commented on issue #34393: [Java] Arrow deserialization performance is so poor in Java

2023-03-09 Thread via GitHub
lwhite1 commented on issue #34393: URL: https://github.com/apache/arrow/issues/34393#issuecomment-1462732476 There's no support for filtering in Java until the substrait interface to Arrow C++ is completed. I suspect that will be in the next release or two.

[GitHub] [arrow] danepitkin commented on pull request #33862: GH-33825: [Python] Expose pyarrow.dataset.get_partition_keys publicly (get key/value from partition expression)

2023-03-09 Thread via GitHub
danepitkin commented on PR #33862: URL: https://github.com/apache/arrow/pull/33862#issuecomment-1462718968 LGTM!

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5537: Minor: add the concise way for matching numerics

2023-03-09 Thread via GitHub
alamb commented on code in PR #5537: URL: https://github.com/apache/arrow-datafusion/pull/5537#discussion_r1131544013 ## datafusion/expr/src/type_coercion/aggregates.rs: ## @@ -314,77 +314,45 @@ pub fn sum_return_type(arg_type: &DataType) -> Result { /// function return type

[GitHub] [arrow-datafusion] alamb commented on pull request #5511: Simplify simplify test cases, support `^`, `&`, `|`, `<<` and `>>` operators for building exprs

2023-03-09 Thread via GitHub
alamb commented on PR #5511: URL: https://github.com/apache/arrow-datafusion/pull/5511#issuecomment-1462706360 > > If this syntax is not deprecated, it should probably be tested as well? > I agree it should be tested. I will review our existing coverage and see if anything else is ne

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5511: Simplify simplify test cases, support `^`, `&`, `|`, `<<` and `>>` operators for building exprs

2023-03-09 Thread via GitHub
alamb commented on code in PR #5511: URL: https://github.com/apache/arrow-datafusion/pull/5511#discussion_r1131535249 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -2227,87 +2069,86 @@ mod tests { #[test] fn test_simplify_or_and_non_null()

[GitHub] [arrow-datafusion] izveigor opened a new pull request, #5537: Minor: add the concise way for matching numerics

2023-03-09 Thread via GitHub
izveigor opened a new pull request, #5537: URL: https://github.com/apache/arrow-datafusion/pull/5537 # Rationale for this change The array "NUMERICS", that are defined in the file "datafusion/expr/src/type_coercion/aggregates.rs" (see https://github.com/apache/arrow-datafusion/blob/main/

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5511: Simplify simplify test cases, support `^`, `&`, `|`, `<<` and `>>` operators for building exprs

2023-03-09 Thread via GitHub
alamb commented on code in PR #5511: URL: https://github.com/apache/arrow-datafusion/pull/5511#discussion_r1131533045 ## datafusion/expr/src/operator.rs: ## @@ -275,6 +280,60 @@ impl ops::Rem for Expr { } } +/// Support ` & ` fluent style +impl ops::BitAnd for Expr { +

[GitHub] [arrow-nanoarrow] paleolimbot merged pull request #157: chore: Add email template to release instructions

2023-03-09 Thread via GitHub
paleolimbot merged PR #157: URL: https://github.com/apache/arrow-nanoarrow/pull/157

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5509: Enforce ambiguity check whilst normalizing columns

2023-03-09 Thread via GitHub
alamb commented on code in PR #5509: URL: https://github.com/apache/arrow-datafusion/pull/5509#discussion_r1131527905 ## datafusion/expr/src/expr_rewriter.rs: ## @@ -365,6 +377,23 @@ pub fn normalize_col_with_schemas( }) } +pub fn normalize_col_with_schemas_and_ambiguity

[GitHub] [arrow] lidavidm merged pull request #34522: MINOR: [C++] Use if-constexpr to simplify visit_data_inline.h

2023-03-09 Thread via GitHub
lidavidm merged PR #34522: URL: https://github.com/apache/arrow/pull/34522

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5509: Enforce ambiguity check whilst normalizing columns

2023-03-09 Thread via GitHub
alamb commented on code in PR #5509: URL: https://github.com/apache/arrow-datafusion/pull/5509#discussion_r1131524917 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -174,7 +174,26 @@ impl LogicalPlan { } } +/// Used for normalizing columns, as the fallbac

[GitHub] [arrow-adbc] lidavidm merged pull request #503: chore(dev/release): verify using main, not tag

2023-03-09 Thread via GitHub
lidavidm merged PR #503: URL: https://github.com/apache/arrow-adbc/pull/503 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow-adbc] lidavidm merged pull request #502: ci: don't build Conda packages for RCs

2023-03-09 Thread via GitHub
lidavidm merged PR #502: URL: https://github.com/apache/arrow-adbc/pull/502 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow-adbc] lidavidm merged pull request #500: ci: don't deploy website on RC tags

2023-03-09 Thread via GitHub
lidavidm merged PR #500: URL: https://github.com/apache/arrow-adbc/pull/500 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow-adbc] lidavidm merged pull request #501: docs(dev/release): add conda-forge to post-release process

2023-03-09 Thread via GitHub
lidavidm merged PR #501: URL: https://github.com/apache/arrow-adbc/pull/501 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5520: INSERT INTO support for MemTable

2023-03-09 Thread via GitHub
alamb commented on code in PR #5520: URL: https://github.com/apache/arrow-datafusion/pull/5520#discussion_r1131521846 ## datafusion/core/src/datasource/datasource.rs: ## @@ -97,6 +97,16 @@ pub trait TableProvider: Sync + Send { fn statistics(&self) -> Option { None

[GitHub] [arrow] maartenbreddels commented on issue #33049: [C++][Python] Large strings cause ArrowInvalid: offset overflow while concatenating arrays

2023-03-09 Thread via GitHub
maartenbreddels commented on issue #33049: URL: https://github.com/apache/arrow/issues/33049#issuecomment-1462681341 I think there are 2 issues being reported here right? 1. The one that matches the original topic: no auto string->large_string casting in combine_chunks. 2. .take on c

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #5520: INSERT INTO support for MemTable

2023-03-09 Thread via GitHub
comphead commented on code in PR #5520: URL: https://github.com/apache/arrow-datafusion/pull/5520#discussion_r1131495741 ## datafusion/core/src/datasource/datasource.rs: ## @@ -97,6 +97,16 @@ pub trait TableProvider: Sync + Send { fn statistics(&self) -> Option { N

[GitHub] [arrow] icexelloss commented on pull request #34311: GH-32884: [C++] Add ordered aggregation

2023-03-09 Thread via GitHub
icexelloss commented on PR #34311: URL: https://github.com/apache/arrow/pull/34311#issuecomment-1462677016 > Linked in both PR and original GH issue to #34475 which has the list. Can you put the follow up GH issue link in the list? -- This is an automated message from the Apache Git

[GitHub] [arrow-adbc] lidavidm commented on pull request #504: docs: update README and add FAQ

2023-03-09 Thread via GitHub
lidavidm commented on PR #504: URL: https://github.com/apache/arrow-adbc/pull/504#issuecomment-1462666025 I'll leave this open for a bit for any other comments and merge over the weekend -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [arrow] paleolimbot commented on issue #34319: [R] v11.0.0.2 extremely slow with parquet files written in v9.0.0

2023-03-09 Thread via GitHub
paleolimbot commented on issue #34319: URL: https://github.com/apache/arrow/issues/34319#issuecomment-1462665299 A good workaround is probably just to use the `ParquetFileReader` directly: ``` r library(arrow, warn.conflicts = FALSE) #> Some features are not enabled in this build

[GitHub] [arrow-adbc] lidavidm merged pull request #505: docs(go): add LICENSE.txt so pkg.go.dev will display docs

2023-03-09 Thread via GitHub
lidavidm merged PR #505: URL: https://github.com/apache/arrow-adbc/pull/505 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow] rtpsw commented on pull request #34311: GH-32884: [C++] Add ordered aggregation

2023-03-09 Thread via GitHub
rtpsw commented on PR #34311: URL: https://github.com/apache/arrow/pull/34311#issuecomment-1462655518 > Can you gather all the follow up issues and put them as a list in the PR description and the origin GH issue as well? Linked in both PR and original GH issue to #34475 which has the

[GitHub] [arrow-datafusion] mslapek commented on pull request #5521: Add UserDefinedLogicalNodeCore

2023-03-09 Thread via GitHub
mslapek commented on PR #5521: URL: https://github.com/apache/arrow-datafusion/pull/5521#issuecomment-1462653261 @alamb Thanks for the review! 💫 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

[GitHub] [arrow-datafusion] ursabot commented on pull request #5521: Add UserDefinedLogicalNodeCore

2023-03-09 Thread via GitHub
ursabot commented on PR #5521: URL: https://github.com/apache/arrow-datafusion/pull/5521#issuecomment-1462653046 Benchmark runs are scheduled for baseline = 7f84503bb93f72c7be29c4d30c7f2c1ce869a1e9 and contender = 8c34ca4fa34787b137b48ce4f6ffd41b64a1a633. 8c34ca4fa34787b137b48ce4f6ffd41b6

[GitHub] [arrow] rtpsw commented on issue #32884: [C++] Add ordered aggregation

2023-03-09 Thread via GitHub
rtpsw commented on issue #32884: URL: https://github.com/apache/arrow/issues/32884#issuecomment-1462652485 Follow-ups listed in #34475 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [arrow] github-actions[bot] commented on pull request #34311: GH-32884: [C++] Add ordered aggregation

2023-03-09 Thread via GitHub
github-actions[bot] commented on PR #34311: URL: https://github.com/apache/arrow/pull/34311#issuecomment-1462651920 :warning: GitHub issue #32884 **has been automatically assigned in GitHub** to PR creator. -- This is an automated message from the Apache Git Service. To respond to the mes

[GitHub] [arrow-datafusion] alamb merged pull request #5521: Add UserDefinedLogicalNodeCore

2023-03-09 Thread via GitHub
alamb merged PR #5521: URL: https://github.com/apache/arrow-datafusion/pull/5521 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arro

[GitHub] [arrow-datafusion] alamb commented on pull request #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion

2023-03-09 Thread via GitHub
alamb commented on PR #5506: URL: https://github.com/apache/arrow-datafusion/pull/5506#issuecomment-1462639391 Here is an alternate approach: https://github.com/apache/arrow-datafusion/pull/5536 -- This is an automated message from the Apache Git Service. To respond to the message, pleas

[GitHub] [arrow-datafusion] alamb opened a new pull request, #5536: Avoid circular(ish) dependency parquet-test-utils on datafusion, try 2

2023-03-09 Thread via GitHub
alamb opened a new pull request, #5536: URL: https://github.com/apache/arrow-datafusion/pull/5536 # Which issue does this PR close? Closes https://github.com/apache/arrow-datafusion/issues/5453 (see alternate approach in https://github.com/apache/arrow-datafusion/pull/5506) # Ra

[GitHub] [arrow-nanoarrow] paleolimbot commented on issue #156: [C] Set vector length in array struct

2023-03-09 Thread via GitHub
paleolimbot commented on issue #156: URL: https://github.com/apache/arrow-nanoarrow/issues/156#issuecomment-1462634052 You'll have to set the lengths of child arrays by yourself if you're constructing the arrays "by hand" (i.e., by filling in or pointing to your own buffers). If you do thi

[GitHub] [arrow-datafusion] alamb closed pull request #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion

2023-03-09 Thread via GitHub
alamb closed pull request #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion URL: https://github.com/apache/arrow-datafusion/pull/5506 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [arrow-datafusion] alamb commented on pull request #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion

2023-03-09 Thread via GitHub
alamb commented on PR #5506: URL: https://github.com/apache/arrow-datafusion/pull/5506#issuecomment-1462628991 > I was referring to this - https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/test_util.rs 🤔 it seems like all that is in `parquet-test-util` is this

[GitHub] [arrow-adbc] lidavidm commented on pull request #505: docs(go): add LICENSE.txt so pkg.go.dev will display docs

2023-03-09 Thread via GitHub
lidavidm commented on PR #505: URL: https://github.com/apache/arrow-adbc/pull/505#issuecomment-1462627130 gah. yes, it has to be go.mod... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

[GitHub] [arrow-adbc] zeroshade commented on pull request #505: docs(go): add LICENSE.txt so pkg.go.dev will display docs

2023-03-09 Thread via GitHub
zeroshade commented on PR #505: URL: https://github.com/apache/arrow-adbc/pull/505#issuecomment-1462624194 I don't remember if it needs to go into the `adbc` dir because that's where the go.mod is or if it'll pull the license in from the parent dir or not -- This is an automated messa

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5520: INSERT INTO support for MemTable

2023-03-09 Thread via GitHub
alamb commented on code in PR #5520: URL: https://github.com/apache/arrow-datafusion/pull/5520#discussion_r1131468017 ## datafusion/core/src/datasource/memory.rs: ## @@ -388,4 +465,115 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn test_insert_into_s

[GitHub] [arrow-datafusion] ozankabak commented on issue #5535: Consolidate 3 Range Analysis / Interval implementations (cost model, pruning predicates, interval analysis)

2023-03-09 Thread via GitHub
ozankabak commented on issue #5535: URL: https://github.com/apache/arrow-datafusion/issues/5535#issuecomment-1462611417 Thank you for opening an issue to track this. We are actively working on extending interval support and will unify the existing code along the way. FYI, our first s
