[GitHub] [arrow-rs] tustvold merged pull request #4085: Store StructArray entries in MapArray

2023-04-13 Thread via GitHub
tustvold merged PR #4085: URL: https://github.com/apache/arrow-rs/pull/4085 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow] jorisvandenbossche commented on pull request #35123: GH-35122: [C++] Add support for scalars in run_end_encode and run_end_decode

2023-04-13 Thread via GitHub
jorisvandenbossche commented on PR #35123: URL: https://github.com/apache/arrow/pull/35123#issuecomment-1507985296 Checking some other vector functions by passing them a scalar, there are some that raise a NotImplementedError (e.g. sort_indices, rank) or that return an array (e.g. replace_with_mask, cu

[GitHub] [arrow] ursabot commented on pull request #35070: GH-35069: [Archery][Release] Remove retrieving ARROW issue from migration comment on Archery release

2023-04-13 Thread via GitHub
ursabot commented on PR #35070: URL: https://github.com/apache/arrow/pull/35070#issuecomment-1507922000 ['Python', 'R'] benchmarks have high level of regressions. [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/f405fa51a8904d89a0b77e2f62878e5a...8d5493a61a074fdfa52a0ae7e88913fa/)

[GitHub] [arrow] ursabot commented on pull request #35070: GH-35069: [Archery][Release] Remove retrieving ARROW issue from migration comment on Archery release

2023-04-13 Thread via GitHub
ursabot commented on PR #35070: URL: https://github.com/apache/arrow/pull/35070#issuecomment-1507921446 Benchmark runs are scheduled for baseline = 5e8db3156c733a31e196683011db113e76ce6a32 and contender = 11ecfd91e3c051761f34b37f1ec88335100bcd57. 11ecfd91e3c051761f34b37f1ec88335100bcd57 is

[GitHub] [arrow-datafusion] jychen7 commented on issue #5969: Clickbench q32 not working in 32GB RAM

2023-04-13 Thread via GitHub
jychen7 commented on issue #5969: URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1507915671 > Regarding the flamegraph, datafusion::physical_plan::aggregates::row_hash::slice_and_maybe_filter is expensive. It is expensive because of excessive vector allocations

[GitHub] [arrow-datafusion] comphead commented on issue #5969: Clickbench q32 not working in 32GB RAM

2023-04-13 Thread via GitHub
comphead commented on issue #5969: URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1507912263 Regarding the flamegraph, `datafusion::physical_plan::aggregates::row_hash::slice_and_maybe_filter` is expensive. It's expensive because of excessive vector allocations. I will

[GitHub] [arrow] comicfans commented on issue #35105: [R] R can't parse parquet written by pyarrow with BYTE_STREAM_SPLIT column encoding

2023-04-13 Thread via GitHub
comicfans commented on issue #35105: URL: https://github.com/apache/arrow/issues/35105#issuecomment-1507910900 Seems like this is a common problem; pyarrow also gives the same error for the generated file. I've attached the good input file for testing [sample.zip](https://github.com/apache/

[GitHub] [arrow-datafusion] WenyXu opened a new pull request, #6004: chore: make JsonOpener and CsvOpener public

2023-04-13 Thread via GitHub
WenyXu opened a new pull request, #6004: URL: https://github.com/apache/arrow-datafusion/pull/6004 # Which issue does this PR close? # Rationale for this change make `JsonOpener` and `CsvOpener` public # What changes are included in this PR?

[GitHub] [arrow-datafusion] jychen7 commented on issue #5969: Clickbench q32 not working in 32GB RAM

2023-04-13 Thread via GitHub
jychen7 commented on issue #5969: URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1507903769 When running with `RUST_LOG=debug datafusion-cli`, I found that it is slow during `do_sort`, but I'm not sure which part is slow: `insert_batch` or the final `sort`. So I add

[GitHub] [arrow-datafusion] mingmwang opened a new pull request, #6003: Row accumulator support update Scalar values

2023-04-13 Thread via GitHub
mingmwang opened a new pull request, #6003: URL: https://github.com/apache/arrow-datafusion/pull/6003 # Which issue does this PR close? Closes #6002. # Rationale for this change Improve the Aggregator performance when grouping by high-cardinality columns.

[GitHub] [arrow-datafusion] mingmwang opened a new issue, #6002: Row accumulator support update Scalar values

2023-04-13 Thread via GitHub
mingmwang opened a new issue, #6002: URL: https://github.com/apache/arrow-datafusion/issues/6002 ### Is your feature request related to a problem or challenge? Improve the Aggregator performance when grouping by high-cardinality columns. ### Describe the solution you'd like

[GitHub] [arrow] wgtmac commented on a diff in pull request #35098: GH-35097: [C++] ArrayData support for child_data slice.

2023-04-13 Thread via GitHub
wgtmac commented on code in PR #35098: URL: https://github.com/apache/arrow/pull/35098#discussion_r1166248633 ## cpp/src/arrow/array/data.cc: ## @@ -144,6 +144,8 @@ std::shared_ptr<ArrayData> ArrayData::Slice(int64_t off, int64_t len) const { } else { copy->null_count = null_count

[GitHub] [arrow-datafusion] jackwener closed issue #5762: Remove `optimize_children` and replace with `map_children`

2023-04-13 Thread via GitHub
jackwener closed issue #5762: Remove `optimize_children` and replace with `map_children` URL: https://github.com/apache/arrow-datafusion/issues/5762

[GitHub] [arrow-datafusion] jackwener merged pull request #5984: Remove optimize_children and replace with map_children

2023-04-13 Thread via GitHub
jackwener merged PR #5984: URL: https://github.com/apache/arrow-datafusion/pull/5984

[GitHub] [arrow-datafusion] yukkit opened a new issue, #6001: Incorrect column pruning in sql with window operations

2023-04-13 Thread via GitHub
yukkit opened a new issue, #6001: URL: https://github.com/apache/arrow-datafusion/issues/6001 ### Describe the bug As the title ### To Reproduce ```sql ❯ explain select sum(case when latitude < 50.0 then latitude else 0 end) over (partition by name) from readings;

[GitHub] [arrow] h-vetinari commented on issue #34805: [CI][Python] Cython test is failing in conda packaging builds

2023-04-13 Thread via GitHub
h-vetinari commented on issue #34805: URL: https://github.com/apache/arrow/issues/34805#issuecomment-1507874311 Yes, I added a compiler to the test requirements, so `gcc` will be found on linux

[GitHub] [arrow-datafusion] jychen7 commented on issue #3516: Support Top-K query optimization for `ORDER BY [ASC|DESC] LIMIT n`

2023-04-13 Thread via GitHub
jychen7 commented on issue #3516: URL: https://github.com/apache/arrow-datafusion/issues/3516#issuecomment-1507872093 > There are a couple of followups possible (will create some tickets for them and close this one): > Use limit in SortPreserveMergeExec is there an issue created f

[GitHub] [arrow-datafusion] jychen7 opened a new issue, #6000: Push down limit to SortPreservingMergeExec and SortPreservingMergeStream

2023-04-13 Thread via GitHub
jychen7 opened a new issue, #6000: URL: https://github.com/apache/arrow-datafusion/issues/6000 ### Is your feature request related to a problem or challenge? This is separated from https://github.com/apache/arrow-datafusion/issues/3516#issuecomment-1254006432. On a high level,

[GitHub] [arrow] github-actions[bot] commented on pull request #35114: GH-35124: [C++] Avoid unnecessary copy when outputting join result

2023-04-13 Thread via GitHub
github-actions[bot] commented on PR #35114: URL: https://github.com/apache/arrow/pull/35114#issuecomment-1507831954 :warning: GitHub issue #35124 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] github-actions[bot] commented on pull request #35114: GH-35124: [C++] Avoid unnecessary copy when outputting join result

2023-04-13 Thread via GitHub
github-actions[bot] commented on PR #35114: URL: https://github.com/apache/arrow/pull/35114#issuecomment-1507831913 * Closes: #35124

[GitHub] [arrow] kou commented on pull request #34069: GH-34068: [C++][Gandiva] Add extra functions

2023-04-13 Thread via GitHub
kou commented on PR #34069: URL: https://github.com/apache/arrow/pull/34069#issuecomment-1507829711 OK. I'll close this for now. We can restart this with multiple smaller PRs.

[GitHub] [arrow] kou closed pull request #34069: GH-34068: [C++][Gandiva] Add extra functions

2023-04-13 Thread via GitHub
kou closed pull request #34069: GH-34068: [C++][Gandiva] Add extra functions URL: https://github.com/apache/arrow/pull/34069

[GitHub] [arrow] ursabot commented on pull request #35047: GH-34653: [CI][C++] Fix for arrow-dataset-file-json-test segfault on alpine-linux-cpp

2023-04-13 Thread via GitHub
ursabot commented on PR #35047: URL: https://github.com/apache/arrow/pull/35047#issuecomment-1507826891 ['Python', 'R'] benchmarks have high level of regressions. [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/533dd7d23af149b096b13b771d1cfa7e...f405fa51a8904d89a0b77e2f62878e5a/)

[GitHub] [arrow] ursabot commented on pull request #35047: GH-34653: [CI][C++] Fix for arrow-dataset-file-json-test segfault on alpine-linux-cpp

2023-04-13 Thread via GitHub
ursabot commented on PR #35047: URL: https://github.com/apache/arrow/pull/35047#issuecomment-1507826309 Benchmark runs are scheduled for baseline = 61203456ed33268df0c8c164348a203c7c1be8ca and contender = 5e8db3156c733a31e196683011db113e76ce6a32. 5e8db3156c733a31e196683011db113e76ce6a32 is

[GitHub] [arrow-datafusion] jychen7 commented on issue #3747: DataFusionError(Internal("The size of the sorted batch is larger than the size of the input batch: 2120 > 2312"))

2023-04-13 Thread via GitHub
jychen7 commented on issue #3747: URL: https://github.com/apache/arrow-datafusion/issues/3747#issuecomment-1507817063 > add an issue in arrow-rs to tackle this there. Is there an issue in arrow-rs to track this? I didn't find one, so I created https://github.com/apache/arrow-rs/issues/4087. F

[GitHub] [arrow-rs] jychen7 opened a new issue, #4087: lexsort_to_indices may output larger size than input

2023-04-13 Thread via GitHub
jychen7 opened a new issue, #4087: URL: https://github.com/apache/arrow-rs/issues/4087 **Describe the bug** Not sure if it is an arrow-rs bug. This issue is created to track a problem we found in https://github.com/apache/arrow-datafusion/issues/3747#issuecomment-1271514648 **To Rep

[GitHub] [arrow] github-actions[bot] commented on pull request #34818: GH-33804: [Python] Add support for manylinux_2_28 wheel

2023-04-13 Thread via GitHub
github-actions[bot] commented on PR #34818: URL: https://github.com/apache/arrow/pull/34818#issuecomment-1507804859 Revision: 22bb08f68880baf1683b9f5d1d7bdf98c221084b Submitted crossbow builds: [ursacomputing/crossbow @ actions-dc1ed1ccb8](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] kou commented on pull request #34818: GH-33804: [Python] Add support for manylinux_2_28 wheel

2023-04-13 Thread via GitHub
kou commented on PR #34818: URL: https://github.com/apache/arrow/pull/34818#issuecomment-1507803480 @github-actions crossbow submit java-jars

[GitHub] [arrow] kou commented on pull request #35092: GH-35086: [Java][CI] Upgrade CycloneDX Maven plugin version

2023-04-13 Thread via GitHub
kou commented on PR #35092: URL: https://github.com/apache/arrow/pull/35092#issuecomment-1507803114 @dongjoon-hyun Thanks for your help!

[GitHub] [arrow] alamb commented on pull request #35108: GH-35107: [FlightSQL]: Use `uint8` to refer to 8 bit unsigned integers rather than `uint1`

2023-04-13 Thread via GitHub
alamb commented on PR #35108: URL: https://github.com/apache/arrow/pull/35108#issuecomment-1507738393 I'll plan to leave this open for a few more days in case anyone else would like time to comment

[GitHub] [arrow] alamb commented on pull request #35108: GH-35107: [FlightSQL]: Use `uint8` to refer to 8 bit unsigned integers rather than `uint1`

2023-04-13 Thread via GitHub
alamb commented on PR #35108: URL: https://github.com/apache/arrow/pull/35108#issuecomment-1507738016 > +1 > I think that we don't need a vote for this change because this change is just a notation change. I agree -- thank you -- I don't view this as a change of the spec, rathe

[GitHub] [arrow] tswast commented on pull request #34270: GH-34210: [C++] Make casting timestamp and duration zero-copy when TimeUnit matches

2023-04-13 Thread via GitHub
tswast commented on PR #34270: URL: https://github.com/apache/arrow/pull/34270#issuecomment-1507726591 > This change adds a zero-copy casting path for durations that have equal units and timestamps that have equal units and potentially different timezones. Can you clarify this change?

[GitHub] [arrow] westonpace commented on pull request #35123: GH-35122: [C++] Add support for scalars in run_end_encode and run_end_decode

2023-04-13 Thread via GitHub
westonpace commented on PR #35123: URL: https://github.com/apache/arrow/pull/35123#issuecomment-1507723741 As far as I know, vector functions aren't used anywhere internally beyond CallFunction, so if it runs, then I think it's ok.

[GitHub] [arrow] danepitkin commented on pull request #34980: GH-34979: [Python] Create a base class for Table and RecordBatch

2023-04-13 Thread via GitHub
danepitkin commented on PR #34980: URL: https://github.com/apache/arrow/pull/34980#issuecomment-1507722772 I addressed the comments and also added methods `to_string()` and `__repr__` to the base class because I was testing out docstrings for `Table` vs `RecordBatch` for similarity. Sorry,

[GitHub] [arrow] ursabot commented on pull request #35057: GH-35056: [Python][CI] Don't install gdb on Windows

2023-04-13 Thread via GitHub
ursabot commented on PR #35057: URL: https://github.com/apache/arrow/pull/35057#issuecomment-1507719515 ['Python', 'R'] benchmarks have high level of regressions. [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/0f6457eb16cd4b04891a1466e48ae177...533dd7d23af149b096b13b771d1cfa7e/)

[GitHub] [arrow] ursabot commented on pull request #35057: GH-35056: [Python][CI] Don't install gdb on Windows

2023-04-13 Thread via GitHub
ursabot commented on PR #35057: URL: https://github.com/apache/arrow/pull/35057#issuecomment-1507719167 Benchmark runs are scheduled for baseline = b8427d391f77454f7009cf7b8091037fd77f01c6 and contender = 61203456ed33268df0c8c164348a203c7c1be8ca. 61203456ed33268df0c8c164348a203c7c1be8ca is

[GitHub] [arrow-datafusion] andygrove opened a new issue, #5999: Improve DataFusion scalability as more cores are added

2023-04-13 Thread via GitHub
andygrove opened a new issue, #5999: URL: https://github.com/apache/arrow-datafusion/issues/5999 ### Is your feature request related to a problem or challenge? I ran some benchmarks in constrained Docker containers and found that DataFusion is pretty close to DuckDB speed when running

[GitHub] [arrow] felipecrv commented on pull request #35123: GH-35122: [C++] Add support for scalars in run_end_encode and run_end_decode

2023-04-13 Thread via GitHub
felipecrv commented on PR #35123: URL: https://github.com/apache/arrow/pull/35123#issuecomment-1507698653 @jorisvandenbossche @benibus @zeroshade @westonpace is it OK/expected for/from vector functions to handle scalar inputs and always return an array? Because that's what I'm doing

[GitHub] [arrow] github-actions[bot] commented on pull request #35123: GH-35122: [C++] Add support for scalars in run_end_encode and run_end_decode

2023-04-13 Thread via GitHub
github-actions[bot] commented on PR #35123: URL: https://github.com/apache/arrow/pull/35123#issuecomment-1507697399 * Closes: #35122

[GitHub] [arrow] felipecrv opened a new pull request, #35123: GH-35122: [C++] Add support for scalars in run_end_encode and run_end_decode

2023-04-13 Thread via GitHub
felipecrv opened a new pull request, #35123: URL: https://github.com/apache/arrow/pull/35123 ### Rationale for this change To make `run_end_encode` and `run_end_decode` usable in all contexts, they should gracefully handle `Scalar` inputs. ### What changes are included in this

[GitHub] [arrow] felipecrv commented on issue #35122: [C++] Handle scalars in run_end_encode and run_end_decode kernels

2023-04-13 Thread via GitHub
felipecrv commented on issue #35122: URL: https://github.com/apache/arrow/issues/35122#issuecomment-1507691283 take

[GitHub] [arrow] Anaisdg commented on issue #35121: FlightSqlClient Java Example

2023-04-13 Thread via GitHub
Anaisdg commented on issue #35121: URL: https://github.com/apache/arrow/issues/35121#issuecomment-1507685363 https://github.com/InfluxCommunity/Java_FlightSqlClient

[GitHub] [arrow-datafusion] NGA-TRAN commented on a diff in pull request #5982: feat: support month and year interval for date_bin on constant data

2023-04-13 Thread via GitHub
NGA-TRAN commented on code in PR #5982: URL: https://github.com/apache/arrow-datafusion/pull/5982#discussion_r1166063620 ## datafusion/core/tests/sqllogictests/test_files/timestamps.slt: ## @@ -500,6 +500,268 @@ FROM ( (TIMESTAMP '2021-06-10 17:19:10Z', TIMESTAMP '2001-01-0

[GitHub] [arrow] github-actions[bot] commented on pull request #35120: GH-35118: [FlightSQL] Use `int32` to refer to 32-bit integers rather than `int`

2023-04-13 Thread via GitHub
github-actions[bot] commented on PR #35120: URL: https://github.com/apache/arrow/pull/35120#issuecomment-1507653197 :warning: GitHub issue #35118 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] github-actions[bot] commented on pull request #35120: GH-35118: [FlightSQL] Use `int32` to refer to 32-bit integers rather than `int`

2023-04-13 Thread via GitHub
github-actions[bot] commented on PR #35120: URL: https://github.com/apache/arrow/pull/35120#issuecomment-1507653166 * Closes: #35118

[GitHub] [arrow] appletreeisyellow opened a new pull request, #35120: GH-35118: [FlightSQL] Use `int32` to refer to 32-bit integers rather than `int`

2023-04-13 Thread via GitHub
appletreeisyellow opened a new pull request, #35120: URL: https://github.com/apache/arrow/pull/35120 ### Rationale for this change The spec is inconsistent -- see details on #35118 ### What changes are included in this PR? Use `int32` to refer to 32-b

[GitHub] [arrow-datafusion] NGA-TRAN commented on a diff in pull request #5982: feat: support month and year interval for date_bin on constant data

2023-04-13 Thread via GitHub
NGA-TRAN commented on code in PR #5982: URL: https://github.com/apache/arrow-datafusion/pull/5982#discussion_r1166052425 ## datafusion/core/tests/sqllogictests/test_files/timestamps.slt: ## @@ -500,6 +500,118 @@ FROM ( (TIMESTAMP '2021-06-10 17:19:10Z', TIMESTAMP '2001-01-0

[GitHub] [arrow-adbc] zeroshade commented on pull request #586: WIP: feat(go/adbc/driver): Adbc Driver for Snowflake

2023-04-13 Thread via GitHub
zeroshade commented on PR #586: URL: https://github.com/apache/arrow-adbc/pull/586#issuecomment-1507644849 requires https://github.com/snowflakedb/gosnowflake/pull/769 in order to work properly

[GitHub] [arrow-adbc] github-actions[bot] commented on pull request #586: WIP: feat(go/adbc/driver): Adbc Driver for Snowflake

2023-04-13 Thread via GitHub
github-actions[bot] commented on PR #586: URL: https://github.com/apache/arrow-adbc/pull/586#issuecomment-1507644351 :warning: Please follow the [Conventional Commits format in CONTRIBUTING.md](https://github.com/apache/arrow-adbc/blob/main/CONTRIBUTING.md) for PR titles.

[GitHub] [arrow-adbc] zeroshade opened a new pull request, #586: WIP: feat(go/adbc/driver): Adbc Driver for Snowflake

2023-04-13 Thread via GitHub
zeroshade opened a new pull request, #586: URL: https://github.com/apache/arrow-adbc/pull/586 Initial work to start creating a Snowflake ADBC driver which we can eventually package up like we do for the Flight SQL driver. Currently only `GetInfo` and `GetObjects` are implemented, but it's a

[GitHub] [arrow] ekt-dar commented on issue #8732: arrow::write_feather error: Capacity error: array cannot contain more than 2147483646 bytes

2023-04-13 Thread via GitHub
ekt-dar commented on issue #8732: URL: https://github.com/apache/arrow/issues/8732#issuecomment-1507637502 Any news on this issue?

[GitHub] [arrow] jorisvandenbossche commented on issue #34906: [Python][C++] Crash when reading from closed RecordBatchReader backed by C Stream

2023-04-13 Thread via GitHub
jorisvandenbossche commented on issue #34906: URL: https://github.com/apache/arrow/issues/34906#issuecomment-1507625595 Thanks! Backtrace for this: ``` Thread 1 "python" received signal SIGSEGV, Segmentation fault. __pyx_pf_7pyarrow_3lib_17RecordBatchReader_6schema___get__ (__py

[GitHub] [arrow-datafusion] mustafasrepo merged pull request #5971: Temporal datatype support for interval arithmetic

2023-04-13 Thread via GitHub
mustafasrepo merged PR #5971: URL: https://github.com/apache/arrow-datafusion/pull/5971

[GitHub] [arrow-datafusion] mustafasrepo closed issue #5844: Temporal datatype support for interval arithmetic

2023-04-13 Thread via GitHub
mustafasrepo closed issue #5844: Temporal datatype support for interval arithmetic URL: https://github.com/apache/arrow-datafusion/issues/5844

[GitHub] [arrow-datafusion] mustafasrepo merged pull request #5937: Streaming Memory Reservation in SHJ

2023-04-13 Thread via GitHub
mustafasrepo merged PR #5937: URL: https://github.com/apache/arrow-datafusion/pull/5937

[GitHub] [arrow-datafusion] mustafasrepo closed issue #5636: Implement memory management / limiting in `SymmetricHashJoinExec`

2023-04-13 Thread via GitHub
mustafasrepo closed issue #5636: Implement memory management / limiting in `SymmetricHashJoinExec` URL: https://github.com/apache/arrow-datafusion/issues/5636

[GitHub] [arrow] davisusanibar commented on a diff in pull request #34227: GH-34223: [Java] Java Substrait Consumer JNI call to ACERO C++

2023-04-13 Thread via GitHub
davisusanibar commented on code in PR #34227: URL: https://github.com/apache/arrow/pull/34227#discussion_r1166032351 ## java/dataset/src/main/cpp/jni_wrapper.cc: ## @@ -578,3 +629,96 @@ Java_org_apache_arrow_dataset_file_JniWrapper_writeFromScannerToFile( JniAssertOkOrThrow(

[GitHub] [arrow-datafusion] tustvold closed issue #5995: datafusion-cli scanning a single large parquet file uses only a single core

2023-04-13 Thread via GitHub
tustvold closed issue #5995: datafusion-cli scanning a single large parquet file uses only a single core URL: https://github.com/apache/arrow-datafusion/issues/5995

[GitHub] [arrow-datafusion] tustvold merged pull request #5997: Don't use parquet file offset for file range pruning

2023-04-13 Thread via GitHub
tustvold merged PR #5997: URL: https://github.com/apache/arrow-datafusion/pull/5997

[GitHub] [arrow] davisusanibar commented on a diff in pull request #34227: GH-34223: [Java] Java Substrait Consumer JNI call to ACERO C++

2023-04-13 Thread via GitHub
davisusanibar commented on code in PR #34227: URL: https://github.com/apache/arrow/pull/34227#discussion_r1166032083 ## java/dataset/src/test/java/org/apache/arrow/dataset/substrait/TestAceroSubstraitConsumer.java: ## @@ -0,0 +1,223 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [arrow] github-actions[bot] commented on pull request #35117: GH-35115: [C++] Moved util_avx2.cc from acero to compute

2023-04-13 Thread via GitHub
github-actions[bot] commented on PR #35117: URL: https://github.com/apache/arrow/pull/35117#issuecomment-1507610837 * Closes: #35115

[GitHub] [arrow] github-actions[bot] commented on pull request #35117: GH-35115: [C++] Moved util_avx2.cc from acero to compute

2023-04-13 Thread via GitHub
github-actions[bot] commented on PR #35117: URL: https://github.com/apache/arrow/pull/35117#issuecomment-1507610884 :warning: GitHub issue #35115 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] westonpace opened a new pull request, #35117: GH-35115: [C++] Moved util_avx2.cc from acero to compute

2023-04-13 Thread via GitHub
westonpace opened a new pull request, #35117: URL: https://github.com/apache/arrow/pull/35117 ### Rationale for this change The file util_avx2.cc contains implementation for methods defined in `src/arrow/compute/util.cc` but it was placed in `src/arrow/acero`. This leads to undefine

[GitHub] [arrow] kylebarron commented on issue #34906: [Python][C++] Crash when reading from closed RecordBatchReader backed by C Stream

2023-04-13 Thread via GitHub
kylebarron commented on issue #34906: URL: https://github.com/apache/arrow/issues/34906#issuecomment-1507606833 From https://docs.python.org/3.11/tutorial/interactive.html#tab-completion-and-history-editing > Note that this may execute application-defined code if an object with a `__

[GitHub] [arrow-datafusion] stuartcarnie commented on issue #5970: UNION ALL with ORDER BY results are inconsistent

2023-04-13 Thread via GitHub
stuartcarnie commented on issue #5970: URL: https://github.com/apache/arrow-datafusion/issues/5970#issuecomment-1507602460 > > I would argue that UnionExec should NEVER modify its inputs but just be a plain, simple node that forwards its inputs w/o messing up sorting (or any other property

[GitHub] [arrow-datafusion] Weijun-H opened a new pull request, #5998: fix: enhance error when placeholder is empty

2023-04-13 Thread via GitHub
Weijun-H opened a new pull request, #5998: URL: https://github.com/apache/arrow-datafusion/pull/5998 # Which issue does this PR close? Closes #5856 # Rationale for this change # What changes are included in this PR? # Are these changes teste

[GitHub] [arrow] kou commented on a diff in pull request #35109: GH-35101: [C++] Update deprecated LOCATION target property in ArrowConfig.cmake.in

2023-04-13 Thread via GitHub
kou commented on code in PR #35109: URL: https://github.com/apache/arrow/pull/35109#discussion_r1166008766 ## cpp/src/arrow/ArrowConfig.cmake.in: ## @@ -96,11 +96,20 @@ include("${CMAKE_CURRENT_LIST_DIR}/ArrowTargets.cmake") if(TARGET Arrow::arrow_static AND NOT TARGET Arrow:

[GitHub] [arrow-rs] tustvold merged pull request #4079: Improve JSON decoder errors (#4076)

2023-04-13 Thread via GitHub
tustvold merged PR #4079: URL: https://github.com/apache/arrow-rs/pull/4079

[GitHub] [arrow-rs] tustvold closed issue #4076: Better JSON Reader Error Messages

2023-04-13 Thread via GitHub
tustvold closed issue #4076: Better JSON Reader Error Messages URL: https://github.com/apache/arrow-rs/issues/4076

[GitHub] [arrow] wjones127 merged pull request #35011: GH-35008: [C++] Add printers for REETestData and PageIndexReaderParam to placate Valgrind

2023-04-13 Thread via GitHub
wjones127 merged PR #35011: URL: https://github.com/apache/arrow/pull/35011

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4084: feat: Support dyn_compare_scalar for Decimal256

2023-04-13 Thread via GitHub
tustvold commented on code in PR #4084: URL: https://github.com/apache/arrow-rs/pull/4084#discussion_r1166003059 ## arrow-ord/src/comparison.rs: ## @@ -6165,6 +6171,128 @@ mod tests { assert_eq!(e, r); } +#[test] +fn test_decimal256_scalar_i128() { +

[GitHub] [arrow] wjones127 merged pull request #35091: GH-35063: [CI] Fix Python requirement in C# tests

2023-04-13 Thread via GitHub
wjones127 merged PR #35091: URL: https://github.com/apache/arrow/pull/35091

[GitHub] [arrow-rs] tustvold closed issue #4078: object_store: Incorrect parsing of https Path Style S3 url

2023-04-13 Thread via GitHub
tustvold closed issue #4078: object_store: Incorrect parsing of https Path Style S3 url URL: https://github.com/apache/arrow-rs/issues/4078

[GitHub] [arrow-rs] tustvold merged pull request #4082: object_store: fix: Incorrect parsing of https Path Style S3 url

2023-04-13 Thread via GitHub
tustvold merged PR #4082: URL: https://github.com/apache/arrow-rs/pull/4082

[GitHub] [arrow] kou commented on pull request #35110: MINOR: [Dev] Add Dane Pitkin and Felipe Oliveira Carvalho to collaborators

2023-04-13 Thread via GitHub
kou commented on PR #35110: URL: https://github.com/apache/arrow/pull/35110#issuecomment-1507579396 How about documenting how to become a collaborator so that developers who don't work at Voltron Data can become a collaborator?

[GitHub] [arrow] jorisvandenbossche commented on issue #34906: [Python][C++] Crash when reading from closed RecordBatchReader backed by C Stream

2023-04-13 Thread via GitHub
jorisvandenbossche commented on issue #34906: URL: https://github.com/apache/arrow/issues/34906#issuecomment-1507578939 That's a different issue I think, since this one was specific to a RecordBatchReader backed by a C Stream. And I can still reproduce this on the latest main. Do you

[GitHub] [arrow] danepitkin commented on pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

2023-04-13 Thread via GitHub
danepitkin commented on PR #35113: URL: https://github.com/apache/arrow/pull/35113#issuecomment-1507571938 LGTM! I'll let someone with committer rights finalize this review. I believe you can ignore the appveyor error, since there have been issues on main with pytests timing out.

[GitHub] [arrow] ursabot commented on pull request #34957: GH-34956: [Docs][Python] Add to docs the usage of the FixedShapeTensorType

2023-04-13 Thread via GitHub
ursabot commented on PR #34957: URL: https://github.com/apache/arrow/pull/34957#issuecomment-1507568036 ['Python', 'R'] benchmarks have high level of regressions. [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/439959cb62204711a0a959b2f80b4eca...0f6457eb16cd4b04891a1466e48ae177/)

[GitHub] [arrow] ursabot commented on pull request #34957: GH-34956: [Docs][Python] Add to docs the usage of the FixedShapeTensorType

2023-04-13 Thread via GitHub
ursabot commented on PR #34957: URL: https://github.com/apache/arrow/pull/34957#issuecomment-1507567250 Benchmark runs are scheduled for baseline = 07642fd5da760b2ddf922d7e7c529a260aa3f177 and contender = b8427d391f77454f7009cf7b8091037fd77f01c6. b8427d391f77454f7009cf7b8091037fd77f01c6 is

[GitHub] [arrow] kou commented on a diff in pull request #35103: GH-34907: [Docs][R] Version selector reports that release version is dev

2023-04-13 Thread via GitHub
kou commented on code in PR #35103: URL: https://github.com/apache/arrow/pull/35103#discussion_r1165987188 ## r/vignettes/prevdocs.Rmd: ## @@ -0,0 +1,12 @@ +--- +title: "Current and Older Versions of this Documentation" +output: rmarkdown::html_vignette +--- + +```{r, echo=FALSE

[GitHub] [arrow] zeroshade commented on pull request #35090: GH-35089: [CI][C++][Flight] Test failures in macos release verification nightlies

2023-04-13 Thread via GitHub
zeroshade commented on PR #35090: URL: https://github.com/apache/arrow/pull/35090#issuecomment-1507537028 I've reduced the failures at least, but I can't seem to figure out the cause of these Segfaults in the macos-amd64-conda release verification. Any assistance here would be amazing. Than

[GitHub] [arrow] trxcllnt commented on a diff in pull request #35067: GH-35067: [JavaScript] toString for signed `BigNum`s

2023-04-13 Thread via GitHub
trxcllnt commented on code in PR #35067: URL: https://github.com/apache/arrow/pull/35067#discussion_r1165970302 ## js/src/util/bn.ts: ## @@ -90,12 +90,52 @@ function bignumToNumber>(bn: T) { } /** @ignore */ -export const bignumToString: { >(a: T): string } = (>(a: T) => a.

[GitHub] [arrow] trxcllnt commented on pull request #35067: GH-35067: [JavaScript] toString for signed `BigNum`s

2023-04-13 Thread via GitHub
trxcllnt commented on PR #35067: URL: https://github.com/apache/arrow/pull/35067#issuecomment-1507535376 @aljazerzen 99% sure closure is mangling too aggressively, and we need to do `if(!a['signed'])` to defeat it.

[GitHub] [arrow-rs] roeap commented on a diff in pull request #4082: object_store: fix: Incorrect parsing of https Path Style S3 url

2023-04-13 Thread via GitHub
roeap commented on code in PR #4082: URL: https://github.com/apache/arrow-rs/pull/4082#discussion_r1165965431 ## object_store/src/aws/mod.rs: ## @@ -758,12 +758,16 @@ impl AmazonS3Builder { fn parse_url(&mut self, url: &str) -> Result<()> { let parsed = Url::parse(

[GitHub] [arrow] izveigor commented on issue #35052: Different primitive types in different languages

2023-04-13 Thread via GitHub
izveigor commented on issue #35052: URL: https://github.com/apache/arrow/issues/35052#issuecomment-1507526221 I didn't accurately describe the problem, I will try to ask some questions that I did not understand. 1) I don't understand the main principle by which a type is assigned to a

[GitHub] [arrow-adbc] paleolimbot opened a new pull request, #585: fix(c/driver/sqlite,c/validation): Ensure float/double values are not truncated on bind or select

2023-04-13 Thread via GitHub
paleolimbot opened a new pull request, #585: URL: https://github.com/apache/arrow-adbc/pull/585 Closes #578 and modifies the validation suite so that it will catch this type of accidental cast should it happen again.

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4082: object_store: fix: Incorrect parsing of https Path Style S3 url

2023-04-13 Thread via GitHub
tustvold commented on code in PR #4082: URL: https://github.com/apache/arrow-rs/pull/4082#discussion_r1165961414 ## object_store/src/aws/mod.rs: ## @@ -758,12 +758,16 @@ impl AmazonS3Builder { fn parse_url(&mut self, url: &str) -> Result<()> { let parsed = Url::par

[GitHub] [arrow-rs] roeap commented on pull request #4082: object_store: fix: Incorrect parsing of https Path Style S3 url

2023-04-13 Thread via GitHub
roeap commented on PR #4082: URL: https://github.com/apache/arrow-rs/pull/4082#issuecomment-1507520314 We are now also checking if the first path segment exists, and assigning it to the bucket.

[GitHub] [arrow] wjones127 commented on a diff in pull request #33634: GH-20385: [C++] reject partial loads of an extension type in the parquet reader

2023-04-13 Thread via GitHub
wjones127 commented on code in PR #33634: URL: https://github.com/apache/arrow/pull/33634#discussion_r1165957392 ## cpp/src/parquet/arrow/reader.cc: ## @@ -842,7 +842,16 @@ Status GetReader(const SchemaField& field, const std::shared_ptr& arrow_f auto storage_field = arrow

[GitHub] [arrow-datafusion] alamb commented on pull request #5937: Streaming Memory Reservation in SHJ

2023-04-13 Thread via GitHub
alamb commented on PR #5937: URL: https://github.com/apache/arrow-datafusion/pull/5937#issuecomment-1507504622 @mustafasrepo perhaps you can merge this when you're ready

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5997: Don't use parquet file offset for file range pruning

2023-04-13 Thread via GitHub
alamb commented on code in PR #5997: URL: https://github.com/apache/arrow-datafusion/pull/5997#discussion_r1165940245 ## datafusion/core/src/physical_plan/file_format/parquet/row_groups.rs: ## @@ -53,7 +53,11 @@ pub(crate) fn prune_row_groups( let mut filtered = Vec::with_c

[GitHub] [arrow-datafusion] alamb commented on pull request #5997: Don't use parquet file offset for file range pruning

2023-04-13 Thread via GitHub
alamb commented on PR #5997: URL: https://github.com/apache/arrow-datafusion/pull/5997#issuecomment-1507488008 I will give this a test on my performance machine

[GitHub] [arrow-datafusion] alamb merged pull request #5987: Update prost-build requirement from =0.11.8 to =0.11.9

2023-04-13 Thread via GitHub
alamb merged PR #5987: URL: https://github.com/apache/arrow-datafusion/pull/5987

[GitHub] [arrow-datafusion] alamb closed issue #5788: Incorrect metrics for ParquetExec

2023-04-13 Thread via GitHub
alamb closed issue #5788: Incorrect metrics for ParquetExec URL: https://github.com/apache/arrow-datafusion/issues/5788

[GitHub] [arrow-datafusion] alamb merged pull request #5898: Minor: Improve doc comments in FileStream

2023-04-13 Thread via GitHub
alamb merged PR #5898: URL: https://github.com/apache/arrow-datafusion/pull/5898

[GitHub] [arrow-datafusion] alamb closed pull request #5790: Revert pr #5020 (Parquet scan time metrics)

2023-04-13 Thread via GitHub
alamb closed pull request #5790: Revert pr #5020 (Parquet scan time metrics) URL: https://github.com/apache/arrow-datafusion/pull/5790

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5982: feat: support month and year interval for date_bin on constant data

2023-04-13 Thread via GitHub
alamb commented on code in PR #5982: URL: https://github.com/apache/arrow-datafusion/pull/5982#discussion_r1165928265 ## datafusion/physical-expr/src/datetime_expressions.rs: ## @@ -366,6 +432,17 @@ pub fn date_bin(args: &[ColumnarValue]) -> Result { } } +enum Interval

[GitHub] [arrow-rs] rtyler commented on issue #4075: Parquet reader of Int96 columns and coercion to timestamps

2023-04-13 Thread via GitHub
rtyler commented on issue #4075: URL: https://github.com/apache/arrow-rs/issues/4075#issuecomment-1507480870 [This link to Apache Spark](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L970-L980) code was shared with me, an

[GitHub] [arrow-rs] rtyler closed issue #4075: Parquet reader of Int96 columns and coercion to timestamps

2023-04-13 Thread via GitHub
rtyler closed issue #4075: Parquet reader of Int96 columns and coercion to timestamps URL: https://github.com/apache/arrow-rs/issues/4075

[GitHub] [arrow-datafusion] alippai commented on issue #5942: Poor reported performance of DataFusion against DuckDB and Hyper

2023-04-13 Thread via GitHub
alippai commented on issue #5942: URL: https://github.com/apache/arrow-datafusion/issues/5942#issuecomment-1507478686 > #5997 should help with this, the file range logic was partitioning row groups based on the location of their ColumnMetadata, which is normally written at the end of a Col
