[GitHub] [arrow] kou commented on pull request #33817: GH-33816: [CI][Conan] Use TARGET_FILE for portability

2023-01-20 Thread via GitHub
kou commented on PR #33817: URL: https://github.com/apache/arrow/pull/33817#issuecomment-1399202474 @github-actions crossbow submit wheel-windows-*-amd64 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [arrow] kou commented on pull request #33817: GH-33816: [CI][Conan] Use TARGET_FILE for portability

2023-01-20 Thread via GitHub
kou commented on PR #33817: URL: https://github.com/apache/arrow/pull/33817#issuecomment-1399202261 @github-actions crossbow submit conan-maximum

[GitHub] [arrow] kou opened a new pull request, #33817: GH-33816: [CI][Conan] Use TARGET_FILE for portability

2023-01-20 Thread via GitHub
kou opened a new pull request, #33817: URL: https://github.com/apache/arrow/pull/33817 ### What changes are included in this PR? Use `$<TARGET_FILE:...>` instead of manual library path resolution. ### Are these changes tested? Yes. ### Are there any user-facing changes?

[GitHub] [arrow] kou commented on issue #33814: [Python] Can't install on Raspberry Pi (Failed building wheel for pyarrow)

2023-01-20 Thread via GitHub
kou commented on issue #33814: URL: https://github.com/apache/arrow/issues/33814#issuecomment-1399201865 It seems that you didn't define `CMAKE_PREFIX_PATH`: https://arrow.apache.org/docs/dev/developers/python.html#using-system-and-bundled-dependencies ```bash export

[GitHub] [arrow] kou opened a new pull request, #33815: GH-33813: [CI][GLib] Use Ruby 3.2 to update bundled MSYS2

2023-01-20 Thread via GitHub
kou opened a new pull request, #33815: URL: https://github.com/apache/arrow/pull/33815 ### What changes are included in this PR? Use Ruby 3.2 to update bundled MSYS2. ### Are these changes tested? Yes. ### Are there any user-facing changes? No.

[GitHub] [arrow] wgtmac commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
wgtmac commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083257758 ## cpp/src/parquet/page_index.cc: ## @@ -184,8 +185,219 @@ class OffsetIndexImpl : public OffsetIndex { std::vector page_locations_; }; +class

[GitHub] [arrow] kou commented on issue #15139: [C++] arrow.pc is missing dependencies with Windows static builds

2023-01-20 Thread via GitHub
kou commented on issue #15139: URL: https://github.com/apache/arrow/issues/15139#issuecomment-1399200167 Hmm. It seems that it's related to vcpkg. Could you open an issue on vcpkg?

[GitHub] [arrow] ursabot commented on pull request #33811: GH-33786: [C++] Ignore old system xsimd

2023-01-20 Thread via GitHub
ursabot commented on PR #33811: URL: https://github.com/apache/arrow/pull/33811#issuecomment-1399199450 Benchmark runs are scheduled for baseline = 0d9d132e9140f26578369b5ef0b44d25c501e45d and contender = 2117d028699edd9f4197650890f3226cdd285c23. 2117d028699edd9f4197650890f3226cdd285c23

[GitHub] [arrow] wgtmac commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
wgtmac commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083256983 ## cpp/src/parquet/file_reader.cc: ## @@ -302,6 +303,17 @@ class SerializedFile : public ParquetFileReader::Contents { std::shared_ptr metadata() const override {

[GitHub] [arrow] Oduig commented on issue #33790: [Python] Support for reading .csv files from a zip archive

2023-01-20 Thread via GitHub
Oduig commented on issue #33790: URL: https://github.com/apache/arrow/issues/33790#issuecomment-1399198244 Thank you for the reply, I see the relevant code is in the cpp section! It already works with gz and bz2, but not with (Windows-esque) .zip files. Is there a reason why it is not

[GitHub] [arrow-adbc] kou commented on a diff in pull request #356: feat(go/adbc/driver/pkg/cmake): cmake build for Go shared library drivers

2023-01-20 Thread via GitHub
kou commented on code in PR #356: URL: https://github.com/apache/arrow-adbc/pull/356#discussion_r1083254936 ## c/driver/flightsql/adbc-driver-flightsql.pc.in: ## @@ -0,0 +1,25 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

[GitHub] [arrow-datafusion] xiaoyong-z commented on a diff in pull request #4995: [Feature] support describe file

2023-01-20 Thread via GitHub
xiaoyong-z commented on code in PR #4995: URL: https://github.com/apache/arrow-datafusion/pull/4995#discussion_r1083255049 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -1625,6 +1637,15 @@ pub struct Prepare { pub input: Arc, } +/// Describe a file Review Comment:

[GitHub] [arrow-datafusion] xiaoyong-z commented on a diff in pull request #4995: [Feature] support describe file

2023-01-20 Thread via GitHub
xiaoyong-z commented on code in PR #4995: URL: https://github.com/apache/arrow-datafusion/pull/4995#discussion_r1083254991 ## datafusion/core/tests/sqllogictests/test_files/describe.slt: ## @@ -0,0 +1,43 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or

[GitHub] [arrow-datafusion] xiaoyong-z commented on a diff in pull request #4995: [Feature] support describe file

2023-01-20 Thread via GitHub
xiaoyong-z commented on code in PR #4995: URL: https://github.com/apache/arrow-datafusion/pull/4995#discussion_r1083254920 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -67,6 +67,10 @@ pub struct ListingTableConfig { pub file_schema: Option, /// Optional

[GitHub] [arrow] wgtmac commented on a diff in pull request #33694: MINOR: [C++][Parquet] Rephrase decimal annotation

2023-01-20 Thread via GitHub
wgtmac commented on code in PR #33694: URL: https://github.com/apache/arrow/pull/33694#discussion_r1083254666 ## cpp/src/parquet/properties.h: ## @@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties { return this->disable_statistics(path->ToDotString()); }

[GitHub] [arrow] kou merged pull request #33811: GH-33786: [C++] Ignore old system xsimd

2023-01-20 Thread via GitHub
kou merged PR #33811: URL: https://github.com/apache/arrow/pull/33811

[GitHub] [arrow] kou opened a new pull request, #33812: GH-33796: [C++] Fix wrong arrow-testing.pc config with system GoogleTest

2023-01-20 Thread via GitHub
kou opened a new pull request, #33812: URL: https://github.com/apache/arrow/pull/33812 ### Rationale for this change Empty `-I` in `Cflags:` generates an invalid build command line. ### What changes are included in this PR? Add `Requires: gtest` if `gtest.pc` exists.

[GitHub] [arrow] wgtmac commented on a diff in pull request #33694: MINOR: [C++][Parquet] Rephrase decimal annotation

2023-01-20 Thread via GitHub
wgtmac commented on code in PR #33694: URL: https://github.com/apache/arrow/pull/33694#discussion_r1083253360 ## cpp/src/parquet/properties.h: ## @@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties { return this->disable_statistics(path->ToDotString()); }

[GitHub] [arrow] wgtmac commented on a diff in pull request #33694: MINOR: [C++][Parquet] Rephrase decimal annotation

2023-01-20 Thread via GitHub
wgtmac commented on code in PR #33694: URL: https://github.com/apache/arrow/pull/33694#discussion_r1083252465 ## cpp/src/parquet/properties.h: ## @@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties { return this->disable_statistics(path->ToDotString()); }

[GitHub] [arrow-julia] quinnj merged pull request #381: Tag new version dev/release/release.sh

2023-01-20 Thread via GitHub
quinnj merged PR #381: URL: https://github.com/apache/arrow-julia/pull/381

[GitHub] [arrow] kou commented on pull request #33753: GH-30891: [C++] The C++ API for writing datasets could be improved

2023-01-20 Thread via GitHub
kou commented on PR #33753: URL: https://github.com/apache/arrow/pull/33753#issuecomment-1399190351 GLib/Ruby parts are updated.

[GitHub] [arrow] kou commented on a diff in pull request #33791: GH-33782: [Release] Vote email number of issues is querying JIRA and producing a wrong number

2023-01-20 Thread via GitHub
kou commented on code in PR #33791: URL: https://github.com/apache/arrow/pull/33791#discussion_r1083240407 ## dev/release/02-source-test.rb: ## @@ -93,33 +93,26 @@ def test_python_version end def test_vote -jira_url = "https://issues.apache.org/jira; -

[GitHub] [arrow] kou commented on pull request #33811: GH-33786: [C++] Ignore old system xsimd

2023-01-20 Thread via GitHub
kou commented on PR #33811: URL: https://github.com/apache/arrow/pull/33811#issuecomment-1399162810 @github-actions crossbow submit verify-rc-source-cpp-linux-ubuntu-22.04-amd64

[GitHub] [arrow] ursabot commented on pull request #33739: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite

2023-01-20 Thread via GitHub
ursabot commented on PR #33739: URL: https://github.com/apache/arrow/pull/33739#issuecomment-1399162691 Benchmark runs are scheduled for baseline = a16c54567ada6729311fd26bdbee4b5e61901410 and contender = 0d9d132e9140f26578369b5ef0b44d25c501e45d. 0d9d132e9140f26578369b5ef0b44d25c501e45d

[GitHub] [arrow] kou opened a new pull request, #33811: GH-33786: [C++] Ignore old system xsimd

2023-01-20 Thread via GitHub
kou opened a new pull request, #33811: URL: https://github.com/apache/arrow/pull/33811 ### Rationale for this change If old xsimd is installed, CMake target for bundled xsimd is conflicted. ### What changes are included in this PR? Use `arrow::xsimd` for bundled xsimd's

[GitHub] [arrow-datafusion] ozankabak commented on pull request #4866: Support non-equijoin predicate for EliminateCrossJoin

2023-01-20 Thread via GitHub
ozankabak commented on PR #4866: URL: https://github.com/apache/arrow-datafusion/pull/4866#issuecomment-1399161710 Any thoughts on how to make progress on this?

[GitHub] [arrow] kou commented on issue #33786: [Release][C++] Release verification tasks fail with libxsimd-dev installed on ubuntu 22.04

2023-01-20 Thread via GitHub
kou commented on issue #33786: URL: https://github.com/apache/arrow/issues/33786#issuecomment-1399160270 We can use `... ARROW_CMAKE_OPTIONS="-Dxsimd_SOURCE=BUNDLED" dev/release/verify-release-candidate.sh ...` instead of changing `verify-release-candidate.sh`.

[GitHub] [arrow-julia] kou commented on pull request #381: Tag new version dev/release/release.sh

2023-01-20 Thread via GitHub
kou commented on PR #381: URL: https://github.com/apache/arrow-julia/pull/381#issuecomment-1399158739 OK. We can try TagBot in the next release again. If TagBot doesn't work, we can use this manual tagging approach.

[GitHub] [arrow] mapleFU commented on issue #15173: [Parquet][C++] ByteStreamSplitDecoder broken in presence of nulls

2023-01-20 Thread via GitHub
mapleFU commented on issue #15173: URL: https://github.com/apache/arrow/issues/15173#issuecomment-1399158204 @wjones127 @emkornfield Hi, what do you think of this problem? Should we assure there are no padding? Or just use method like

[GitHub] [arrow] kou commented on a diff in pull request #33806: GH-33723: [C++] re2::RE2::RE2() result must be checked

2023-01-20 Thread via GitHub
kou commented on code in PR #33806: URL: https://github.com/apache/arrow/pull/33806#discussion_r1083228638 ## cpp/src/arrow/compute/kernels/scalar_string_ascii.cc: ## @@ -1505,6 +1508,13 @@ struct MatchLike { static const RE2

[GitHub] [arrow] cyb70289 commented on pull request #33739: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite

2023-01-20 Thread via GitHub
cyb70289 commented on PR #33739: URL: https://github.com/apache/arrow/pull/33739#issuecomment-1399156005 Thanks @wgtmac for fixing the issue quickly !

[GitHub] [arrow] cyb70289 merged pull request #33739: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite

2023-01-20 Thread via GitHub
cyb70289 merged PR #33739: URL: https://github.com/apache/arrow/pull/33739

[GitHub] [arrow] kou commented on pull request #33781: GH-33723: [C++] re2::RE2::RE2() result must be checked

2023-01-20 Thread via GitHub
kou commented on PR #33781: URL: https://github.com/apache/arrow/pull/33781#issuecomment-1399154574 Thanks! But #33806 is better approach. I close this in favor of #33806.

[GitHub] [arrow] kou closed pull request #33781: GH-33723: [C++] re2::RE2::RE2() result must be checked

2023-01-20 Thread via GitHub
kou closed pull request #33781: GH-33723: [C++] re2::RE2::RE2() result must be checked URL: https://github.com/apache/arrow/pull/33781

[GitHub] [arrow] paleolimbot commented on issue #33702: [R] Package Arrow 11.0.0 for R/CRAN

2023-01-20 Thread via GitHub
paleolimbot commented on issue #33702: URL: https://github.com/apache/arrow/issues/33702#issuecomment-1399154374 Is there an Arrow issue for clang16 yet? I didn't get to it today but I'm sure I can get a docker image together on Monday with clang16 and R.

[GitHub] [arrow] zzzzwj commented on a diff in pull request #33703: GH-14748:[C++] Modify some comments about shared_ptr's ownership in parquet-cpp.

2023-01-20 Thread via GitHub
zzzzwj commented on code in PR #33703: URL: https://github.com/apache/arrow/pull/33703#discussion_r1083228089 ## cpp/src/parquet/arrow/test_util.h: ## @@ -463,12 +463,16 @@ Status MakeEmptyListsArray(int64_t size, std::shared_ptr* out_array) { return Status::OK(); } +//

[GitHub] [arrow] paleolimbot commented on issue #29428: [R] accept expression lists in Scanner$create() with arrow_dplyr_querys

2023-01-20 Thread via GitHub
paleolimbot commented on issue #29428: URL: https://github.com/apache/arrow/issues/29428#issuecomment-1399153279 Typically our `as_record_batch_reader()` methods just have a `schema` argument (as in, output schema...cast if you can, error if you can't). That's not *quite* what duckdb needs

[GitHub] [arrow] kou commented on pull request #33803: GH-33787: [C++] arrow/cpp/src/arrow/util/cpu_info.cc: the -Werror triggers an error: statement has no effect [-Werror=unused-value]

2023-01-20 Thread via GitHub
kou commented on PR #33803: URL: https://github.com/apache/arrow/pull/33803#issuecomment-1399152264 How about this? ```diff diff --git a/cpp/src/arrow/util/cpu_info.cc b/cpp/src/arrow/util/cpu_info.cc index 3ba8db216..08b7b8b21 100644 --- a/cpp/src/arrow/util/cpu_info.cc

[GitHub] [arrow-adbc] kou commented on issue #366: [Discuss] Is the conventional commit format working?

2023-01-20 Thread via GitHub
kou commented on issue #366: URL: https://github.com/apache/arrow-adbc/issues/366#issuecomment-1399151048 > We can ask INFRA to change the setting to merge using the PR title/description, if people agree. I like the setting!

[GitHub] [arrow-datafusion] ursabot commented on pull request #4834: (#4462) Postgres compatibility tests using sqllogictest

2023-01-20 Thread via GitHub
ursabot commented on PR #4834: URL: https://github.com/apache/arrow-datafusion/pull/4834#issuecomment-1399149236 Benchmark runs are scheduled for baseline = b71cae8aa556369bc5ee72b063ed1fc5a81192f1 and contender = 1d69f28f14acf178377ecf55a343b6e71b4dd856.

[GitHub] [arrow-datafusion] xudong963 merged pull request #4834: (#4462) Postgres compatibility tests using sqllogictest

2023-01-20 Thread via GitHub
xudong963 merged PR #4834: URL: https://github.com/apache/arrow-datafusion/pull/4834

[GitHub] [arrow-datafusion] xudong963 closed issue #4462: Replace python based integration test with sqllogictest

2023-01-20 Thread via GitHub
xudong963 closed issue #4462: Replace python based integration test with sqllogictest URL: https://github.com/apache/arrow-datafusion/issues/4462

[GitHub] [arrow-datafusion] xudong963 commented on pull request #4834: (#4462) Postgres compatibility tests using sqllogictest

2023-01-20 Thread via GitHub
xudong963 commented on PR #4834: URL: https://github.com/apache/arrow-datafusion/pull/4834#issuecomment-1399148189 merged, let's iterate!

[GitHub] [arrow] vibhatha commented on pull request #14596: ARROW-18258: [Docker] Substrait Integration Testing

2023-01-20 Thread via GitHub
vibhatha commented on PR #14596: URL: https://github.com/apache/arrow/pull/14596#issuecomment-1399110128 @github-actions crossbow submit test-conda-python-3.9-substrait

[GitHub] [arrow] westonpace commented on pull request #13487: ARROW-8991: [C++] Add hash_64 scalar compute function

2023-01-20 Thread via GitHub
westonpace commented on PR #13487: URL: https://github.com/apache/arrow/pull/13487#issuecomment-1399105512 @drin is this something you are still working on?

[GitHub] [arrow] vibhatha commented on pull request #14596: ARROW-18258: [Docker] Substrait Integration Testing

2023-01-20 Thread via GitHub
vibhatha commented on PR #14596: URL: https://github.com/apache/arrow/pull/14596#issuecomment-1399104798 @github-actions crossbow submit test-conda-python-3.9-substrait

[GitHub] [arrow] westonpace commented on issue #15098: FieldRef with clang-cl from VS 17.4.3 needs operator== with /std:c++20

2023-01-20 Thread via GitHub
westonpace commented on issue #15098: URL: https://github.com/apache/arrow/issues/15098#issuecomment-1399103925 It would seem we need a new CI environment for visual studio & C++20. It should be a fairly straightforward addition to

[GitHub] [arrow] rok commented on pull request #33810: GH-33377: [Python] Table.drop should support passing a single column

2023-01-20 Thread via GitHub
rok commented on PR #33810: URL: https://github.com/apache/arrow/pull/33810#issuecomment-1399101391 > This is my first Arrow PR. I am very open to any feedback or suggestions you might have! Welcome @danepitkin! Thanks for opening the PR! > I followed the instructions as

[GitHub] [arrow] westonpace commented on issue #15281: basic_string_view will be invalid in future libc++

2023-01-20 Thread via GitHub
westonpace commented on issue #15281: URL: https://github.com/apache/arrow/issues/15281#issuecomment-1399101379 Normally I think we'd want to setup a CI to regress this sort of thing. However, clang-18 is not yet released. Do you have any suggestion on how this could be regressed?

[GitHub] [arrow] vibhatha commented on a diff in pull request #14596: ARROW-18258: [Docker] Substrait Integration Testing

2023-01-20 Thread via GitHub
vibhatha commented on code in PR #14596: URL: https://github.com/apache/arrow/pull/14596#discussion_r1083164628 ## ci/docker/conda-python-substrait.dockerfile: ## @@ -0,0 +1,45 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

[GitHub] [arrow] westonpace commented on issue #15103: Weighted stat aggregations in arrow-compute

2023-01-20 Thread via GitHub
westonpace commented on issue #15103: URL: https://github.com/apache/arrow/issues/15103#issuecomment-1399100195 There is a proposal to support aggregate UDFs but I don't know that it is high priority for anyone. I agree this sounds like a nice feature. Would you be interested in creating

[GitHub] [arrow] vibhatha commented on a diff in pull request #14596: ARROW-18258: [Docker] Substrait Integration Testing

2023-01-20 Thread via GitHub
vibhatha commented on code in PR #14596: URL: https://github.com/apache/arrow/pull/14596#discussion_r1083164377 ## ci/scripts/integration_substrait.sh: ## @@ -0,0 +1,30 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more

[GitHub] [arrow] westonpace commented on issue #33627: [C++][HDFS] Can't get performance improve when increase the thread number of IO thread pool

2023-01-20 Thread via GitHub
westonpace commented on issue #33627: URL: https://github.com/apache/arrow/issues/33627#issuecomment-1399093386 No, I don't think you are wrong. You should be seeing more parallelism. I am not sure when I will get a chance to fully investigate this however. I don't see anything in here

[GitHub] [arrow] kou commented on a diff in pull request #33805: GH-33804: [Python] Add support for manylinux_2_28 wheel

2023-01-20 Thread via GitHub
kou commented on code in PR #33805: URL: https://github.com/apache/arrow/pull/33805#discussion_r1083143197 ## docker-compose.yml: ## @@ -967,6 +970,31 @@ services: - ${DOCKER_VOLUME_PREFIX}python-wheel-manylinux2014-ccache:/ccache:delegated command:

[GitHub] [arrow] westonpace commented on issue #33668: Reading flat dataset with `partitioning="hive"` results in partition schema equal to dataset schema

2023-01-20 Thread via GitHub
westonpace commented on issue #33668: URL: https://github.com/apache/arrow/issues/33668#issuecomment-1399087997 I can confirm. I reproduced this with the latest and agree it is a bug.

[GitHub] [arrow] nealrichardson commented on pull request #33770: GH-33760: [R][C++] Handle nested field refs in scanner

2023-01-20 Thread via GitHub
nealrichardson commented on PR #33770: URL: https://github.com/apache/arrow/pull/33770#issuecomment-1399086990 > > I thought this was just going to be deleting code from the R package: instead of finding the top-level field names in the projection and sending them in the ScanNode, I'd send

[GitHub] [arrow-rs] albel727 commented on issue #3579: `nullif` incorrectly calculates `null_count`, sometimes panics with substraction overflow error

2023-01-20 Thread via GitHub
albel727 commented on issue #3579: URL: https://github.com/apache/arrow-rs/issues/3579#issuecomment-1399086004 Just in case, here's the code that I used to validate that the quick fix works: ```rust std::panic::set_hook(Box::new(|_info| { /* silence panics */ }));

[GitHub] [arrow] westonpace commented on pull request #14799: ARROW-18417: [C++] Support emit info in Substrait extension-multi and AsOfJoin

2023-01-20 Thread via GitHub
westonpace commented on PR #14799: URL: https://github.com/apache/arrow/pull/14799#issuecomment-1399085737 @rtpsw this will need a rebase. Some of the changes here were brought in with the backpressure change. Would you like me to do this or do you want to?

[GitHub] [arrow-rs] albel727 opened a new issue, #3579: `nullif` incorrectly calculates `null_count`, sometimes panics with substraction overflow error

2023-01-20 Thread via GitHub
albel727 opened a new issue, #3579: URL: https://github.com/apache/arrow-rs/issues/3579 **Describe the bug** `nullif(left, right)` incorrectly calculates `null_count` for the result array, whenever `left` doesn't have a null_buffer and has `len % 64 == 0`. It can even panic, if there
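
For readers unfamiliar with the invariant this report is about, here is a minimal pure-Python sketch of what `nullif` is expected to produce and how its `null_count` can be cross-checked. It is not the arrow-rs implementation: arrays are modeled as plain lists with `None` for null, a null in `right` is assumed to count as false, and the helper names `nullif` and `null_count` are illustrative only. The input length of 64 with no nulls in `left` mirrors the condition named in the report.

```python
from typing import List, Optional


def nullif(left: List[Optional[int]], right: List[Optional[bool]]) -> List[Optional[int]]:
    """Copy `left`, nulling every slot where `right` is True.

    Assumption: a null (None) in `right` is treated like False.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    return [None if mask is True else value for value, mask in zip(left, right)]


def null_count(values: List[Optional[int]]) -> int:
    """Count null slots by direct inspection, independent of any cached count."""
    return sum(value is None for value in values)


# 64 values, no nulls in `left`; every even index is masked out by `right`.
left = list(range(64))
right = [i % 2 == 0 for i in range(64)]
result = nullif(left, right)
```

A regression check for the reported bug would compare the array's stored null count against a direct count like `null_count(result)` for several lengths, including multiples of 64.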

[GitHub] [arrow] vibhatha commented on a diff in pull request #14596: ARROW-18258: [Docker] Substrait Integration Testing

2023-01-20 Thread via GitHub
vibhatha commented on code in PR #14596: URL: https://github.com/apache/arrow/pull/14596#discussion_r1083145165 ## ci/scripts/integration_substrait.sh: ## @@ -0,0 +1,30 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more

[GitHub] [arrow] westonpace commented on issue #29428: [R] accept expression lists in Scanner$create() with arrow_dplyr_querys

2023-01-20 Thread via GitHub
westonpace commented on issue #29428: URL: https://github.com/apache/arrow/issues/29428#issuecomment-1399080062 +1 for some kind of `as_record_batch_reader`

[GitHub] [arrow] westonpace commented on pull request #33770: GH-33760: [R][C++] Handle nested field refs in scanner

2023-01-20 Thread via GitHub
westonpace commented on PR #33770: URL: https://github.com/apache/arrow/pull/33770#issuecomment-1399077722 > I thought this was just going to be deleting code from the R package: instead of finding the top-level field names in the projection and sending them in the ScanNode, I'd send the

[GitHub] [arrow] westonpace commented on issue #15138: [C++] Clustered By -- how?

2023-01-20 Thread via GitHub
westonpace commented on issue #15138: URL: https://github.com/apache/arrow/issues/15138#issuecomment-1399076522 Internally we do this by hashing the column. There is a PR under way (though it could use some review) to add a new hash compute function

[GitHub] [arrow] ursabot commented on pull request #33725: GH-33724: [Doc] Update the substrait conformance doc with the latest support

2023-01-20 Thread via GitHub
ursabot commented on PR #33725: URL: https://github.com/apache/arrow/pull/33725#issuecomment-1399074351 Benchmark runs are scheduled for baseline = f7aa50dbeccdcc800a0ffc695b107c1cdc688156 and contender = a16c54567ada6729311fd26bdbee4b5e61901410. a16c54567ada6729311fd26bdbee4b5e61901410

[GitHub] [arrow] westonpace commented on a diff in pull request #33775: ARROW-18425: [Substrait] Add Substrait→Acero mapping for round operationMajor:

2023-01-20 Thread via GitHub
westonpace commented on code in PR #33775: URL: https://github.com/apache/arrow/pull/33775#discussion_r1083118332 ## cpp/src/arrow/compute/api_scalar.h: ## @@ -882,6 +891,20 @@ ARROW_EXPORT Result Round(const Datum& arg, RoundOptions options = RoundOptions::Defaults(),

[GitHub] [arrow] zeroshade commented on pull request #14989: ARROW-18438: [Go][Parquet] Panic in bitmap writer

2023-01-20 Thread via GitHub
zeroshade commented on PR #14989: URL: https://github.com/apache/arrow/pull/14989#issuecomment-1399068130 @minyoung I found the issue and the solution that's not a hack: in parquet/file/column_writer_types.gen.go.tmpl lines 143 - 147 change this: ```go if

[GitHub] [arrow] danepitkin commented on issue #33377: [Python] Table.drop should support passing a single column

2023-01-20 Thread via GitHub
danepitkin commented on issue #33377: URL: https://github.com/apache/arrow/issues/33377#issuecomment-1399062211 Whoops, looks like I did not correctly link my PR to the original issue. My PR is here https://github.com/apache/arrow/pull/33810.

[GitHub] [arrow] danepitkin commented on pull request #33810: GH-33377: [Python] Table.drop should support passing a single column

2023-01-20 Thread via GitHub
danepitkin commented on PR #33810: URL: https://github.com/apache/arrow/pull/33810#issuecomment-1399047969 This is my first Arrow PR. I am very open to any feedback or suggestions you might have!

[GitHub] [arrow] westonpace merged pull request #33725: GH-33724: [Doc] Update the substrait conformance doc with the latest support

2023-01-20 Thread via GitHub
westonpace merged PR #33725: URL: https://github.com/apache/arrow/pull/33725

[GitHub] [arrow] westonpace commented on issue #33759: [Python][C++] How to limit the memory consumption of to_batches()

2023-01-20 Thread via GitHub
westonpace commented on issue #33759: URL: https://github.com/apache/arrow/issues/33759#issuecomment-1399046967 Hmm, backpressure should be applied then. Once you call `to_batches` it should start to read in the background. Eventually, at a certain point, it should stop reading because

[GitHub] [arrow] danepitkin opened a new pull request, #33810: GH-33377: [Python] Table.drop should support passing a single column

2023-01-20 Thread via GitHub
danepitkin opened a new pull request, #33810: URL: https://github.com/apache/arrow/pull/33810 ### Rationale for this change Provide a better user experience in pyarrow when working with `Table`. ### What changes are included in this PR? Allow
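
The PR title asks for `Table.drop` to accept a single column as well as a list. A common way to support that kind of argument is to normalize a lone string into a one-element list before doing the work. The sketch below shows only that pattern on a plain dict of columns; it is not pyarrow code, and `drop_columns` is a hypothetical helper name chosen for illustration.

```python
from typing import Dict, List, Sequence, Union


def drop_columns(
    table: Dict[str, list], columns: Union[str, Sequence[str]]
) -> Dict[str, list]:
    """Return a copy of `table` without the named column(s).

    `columns` may be a single name or a sequence of names (hypothetical API).
    """
    # A bare string is itself a sequence of characters, so check for it first.
    names: List[str] = [columns] if isinstance(columns, str) else list(columns)
    missing = [name for name in names if name not in table]
    if missing:
        raise KeyError(f"Column(s) not found: {missing}")
    return {name: data for name, data in table.items() if name not in names}


table = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
single = drop_columns(table, "a")
several = drop_columns(table, ["a", "b"])
```

Checking for `str` before treating the argument as a sequence matters, because otherwise a name such as `"ab"` would be read as the two columns `"a"` and `"b"`.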

[GitHub] [arrow] westonpace commented on issue #33790: [Python] Support for reading .csv files from a zip archive

2023-01-20 Thread via GitHub
westonpace commented on issue #33790: URL: https://github.com/apache/arrow/issues/33790#issuecomment-1399044202 Outside of datasets this is normally achieved by opening a compressed input stream and using the CSV stream reader. If the path ends in `.gz` or `.bz2` I think we also guess
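
The idea in this reply, opening a decompressing stream and feeding it to a CSV reader, can be sketched for `.zip` archives using only the Python standard library. This is a possible workaround, not an Arrow API: the function name `read_csv_from_zip` and the member name `data.csv` are illustrative assumptions, and the rows come back as plain dictionaries rather than Arrow data.

```python
import csv
import io
import zipfile
from typing import Dict, List


def read_csv_from_zip(archive, member: str) -> List[Dict[str, str]]:
    """Read one CSV member of a zip archive as a list of row dictionaries.

    `archive` may be a filesystem path or a seekable binary file object.
    """
    with zipfile.ZipFile(archive) as zf:
        # zf.open() yields a decompressing binary stream for the member.
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a small archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data.csv", "id,name\n1,a\n2,b\n")
buffer.seek(0)

rows = read_csv_from_zip(buffer, "data.csv")
```

Unlike `.gz` or `.bz2`, a zip archive can hold several files, so the caller has to name the member to read. That difference is one plausible reason a path-suffix guess alone is not enough for `.zip`.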

[GitHub] [arrow] westonpace commented on issue #33797: [C++] Add decimal version of Round benchmarks

2023-01-20 Thread via GitHub
westonpace commented on issue #33797: URL: https://github.com/apache/arrow/issues/33797#issuecomment-1399039367 @aayushpandey014 I have assigned this to you. In the future you can always comment `take` and our bots will assign an issue to you.

[GitHub] [arrow] wjones127 commented on a diff in pull request #14353: ARROW-17735: [C++][Parquet] Optimize parquet reading for String/Binary type

2023-01-20 Thread via GitHub
wjones127 commented on code in PR #14353: URL: https://github.com/apache/arrow/pull/14353#discussion_r1083109332 ## cpp/src/parquet/encoding.h: ## @@ -317,6 +317,13 @@ class TypedDecoder : virtual public Decoder { int64_t valid_bits_offset,

[GitHub] [arrow] ursabot commented on pull request #33809: MINOR: [R] Update BugReports field in DESCRIPTION

2023-01-20 Thread via GitHub
ursabot commented on PR #33809: URL: https://github.com/apache/arrow/pull/33809#issuecomment-1399037705 Benchmark runs are scheduled for baseline = f9ce32ebab5071b8fc48a135a730c22313aaf9b3 and contender = f7aa50dbeccdcc800a0ffc695b107c1cdc688156. f7aa50dbeccdcc800a0ffc695b107c1cdc688156

[GitHub] [arrow-datafusion-python] martin-g commented on pull request #129: test: Expand tests for built-in functions

2023-01-20 Thread via GitHub
martin-g commented on PR #129: URL: https://github.com/apache/arrow-datafusion-python/pull/129#issuecomment-1399036701 I always prefer using `git rebase` for Pull Request branches. This way the commit history is cleaner. Rebasing also makes it easier to use `Squash and merge` GitHub UI

[GitHub] [arrow] richtia commented on a diff in pull request #14596: ARROW-18258: [Docker] Substrait Integration Testing

2023-01-20 Thread via GitHub
richtia commented on code in PR #14596: URL: https://github.com/apache/arrow/pull/14596#discussion_r1083107221 ## ci/docker/conda-python-substrait.dockerfile: ## @@ -0,0 +1,45 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

[GitHub] [arrow-datafusion] alamb commented on pull request #4834: (#4462) Postgres compatibility tests using sqllogictest

2023-01-20 Thread via GitHub
alamb commented on PR #4834: URL: https://github.com/apache/arrow-datafusion/pull/4834#issuecomment-1399016790 Assuming this PR passes CI checks I plan to merge -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow-datafusion] alamb commented on pull request #4834: (#4462) Postgres compatibility tests using sqllogictest

2023-01-20 Thread via GitHub
alamb commented on PR #4834: URL: https://github.com/apache/arrow-datafusion/pull/4834#issuecomment-1399015229 > For now, I merge master into this branch, so that it is mergeable. Thanks @melgenek . I filed issues for the follow-on tasks: - [ ]

[GitHub] [arrow-datafusion] alamb opened a new issue, #5011: [sqllogictest] Remove `integration-tests` directory

2023-01-20 Thread via GitHub
alamb opened a new issue, #5011: URL: https://github.com/apache/arrow-datafusion/issues/5011 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** https://github.com/apache/arrow-datafusion/pull/4834 adds a way to run sqllogictests

[GitHub] [arrow-datafusion] alamb opened a new issue, #5010: [sqllogictest] Consolidate normalization code for the postgres and non-postgres paths

2023-01-20 Thread via GitHub
alamb opened a new issue, #5010: URL: https://github.com/apache/arrow-datafusion/issues/5010 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** https://github.com/apache/arrow-datafusion/pull/4834 adds a way to run sqllogictests

[GitHub] [arrow] ursabot commented on pull request #33795: GH-33794: [Go] Add SetRecordReader to PreparedStatement

2023-01-20 Thread via GitHub
ursabot commented on PR #33795: URL: https://github.com/apache/arrow/pull/33795#issuecomment-1399012764 Benchmark runs are scheduled for baseline = 0e4a2e19e36d70a3072ce5275129d15fdb187c64 and contender = f9ce32ebab5071b8fc48a135a730c22313aaf9b3. f9ce32ebab5071b8fc48a135a730c22313aaf9b3

[GitHub] [arrow-datafusion] alamb opened a new issue, #5009: [sqllogictest] Don't orchestrate the postgres containers with rust / docker

2023-01-20 Thread via GitHub
alamb opened a new issue, #5009: URL: https://github.com/apache/arrow-datafusion/issues/5009 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** https://github.com/apache/arrow-datafusion/pull/4834 adds a way to run sqllogictests

[GitHub] [arrow] emkornfield commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
emkornfield commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083090721 ## cpp/src/parquet/page_index.h: ## @@ -126,4 +132,94 @@ class PARQUET_EXPORT OffsetIndex { virtual const std::vector& page_locations() const = 0; }; +///

[GitHub] [arrow] emkornfield commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
emkornfield commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083090112 ## cpp/src/parquet/page_index.h: ## @@ -126,4 +132,94 @@ class PARQUET_EXPORT OffsetIndex { virtual const std::vector& page_locations() const = 0; }; +///

[GitHub] [arrow] emkornfield commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
emkornfield commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083089551 ## cpp/src/parquet/page_index.cc: ## @@ -184,8 +185,219 @@ class OffsetIndexImpl : public OffsetIndex { std::vector page_locations_; }; +class

[GitHub] [arrow] emkornfield commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
emkornfield commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083089071 ## cpp/src/parquet/page_index.cc: ## @@ -184,8 +185,219 @@ class OffsetIndexImpl : public OffsetIndex { std::vector page_locations_; }; +class

[GitHub] [arrow] emkornfield commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
emkornfield commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083088957 ## cpp/src/parquet/page_index.cc: ## @@ -184,8 +185,219 @@ class OffsetIndexImpl : public OffsetIndex { std::vector page_locations_; }; +class

[GitHub] [arrow-adbc] lidavidm merged pull request #369: fix(go/adbc/driver/flightsql): heap-allocate Go handles

2023-01-20 Thread via GitHub
lidavidm merged PR #369: URL: https://github.com/apache/arrow-adbc/pull/369 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [arrow] emkornfield commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
emkornfield commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083088746 ## cpp/src/parquet/page_index.cc: ## @@ -184,8 +185,219 @@ class OffsetIndexImpl : public OffsetIndex { std::vector page_locations_; }; +class

[GitHub] [arrow] emkornfield commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
emkornfield commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083088486 ## cpp/src/parquet/page_index.cc: ## @@ -184,8 +185,219 @@ class OffsetIndexImpl : public OffsetIndex { std::vector page_locations_; }; +class

[GitHub] [arrow] emkornfield commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
emkornfield commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083088136 ## cpp/src/parquet/page_index.cc: ## @@ -184,8 +185,219 @@ class OffsetIndexImpl : public OffsetIndex { std::vector page_locations_; }; +class

[GitHub] [arrow] emkornfield commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
emkornfield commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083087767 ## cpp/src/parquet/page_index.cc: ## @@ -184,8 +185,219 @@ class OffsetIndexImpl : public OffsetIndex { std::vector page_locations_; }; +class

[GitHub] [arrow-datafusion] ursabot commented on pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread via GitHub
ursabot commented on PR #4989: URL: https://github.com/apache/arrow-datafusion/pull/4989#issuecomment-1399003734 Benchmark runs are scheduled for baseline = 92d0a054c23e5fba91718db32ccd933ce86dd2b6 and contender = b71cae8aa556369bc5ee72b063ed1fc5a81192f1.

[GitHub] [arrow] emkornfield commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
emkornfield commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083085814 ## cpp/src/parquet/page_index.cc: ## @@ -184,8 +185,219 @@ class OffsetIndexImpl : public OffsetIndex { std::vector page_locations_; }; +class

[GitHub] [arrow] emkornfield commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
emkornfield commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083084578 ## cpp/src/parquet/file_reader.cc: ## @@ -302,6 +303,17 @@ class SerializedFile : public ParquetFileReader::Contents { std::shared_ptr metadata() const

[GitHub] [arrow] emkornfield commented on a diff in pull request #14964: GH-33596: [C++][Parquet] Parquet page index read support

2023-01-20 Thread via GitHub
emkornfield commented on code in PR #14964: URL: https://github.com/apache/arrow/pull/14964#discussion_r1083083858 ## cpp/src/parquet/page_index.cc: ## @@ -184,8 +185,241 @@ class OffsetIndexImpl : public OffsetIndex { std::vector page_locations_; }; +class

[GitHub] [arrow] thisisnic merged pull request #33809: MINOR: [R] Update BugReports field in DESCRIPTION

2023-01-20 Thread via GitHub
thisisnic merged PR #33809: URL: https://github.com/apache/arrow/pull/33809 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [arrow-datafusion] alamb closed issue #4979: Add support for linear range search

2023-01-20 Thread via GitHub
alamb closed issue #4979: Add support for linear range search URL: https://github.com/apache/arrow-datafusion/issues/4979 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [arrow-datafusion] alamb merged pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread via GitHub
alamb merged PR #4989: URL: https://github.com/apache/arrow-datafusion/pull/4989 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:
