[GitHub] [arrow-rs] askoa commented on a diff in pull request #3603: Add ArrayAccessor, Iterator, Extend and benchmarks for RunArray

2023-01-31 Thread via GitHub
askoa commented on code in PR #3603: URL: https://github.com/apache/arrow-rs/pull/3603#discussion_r1092863125 ## arrow-array/src/array/run_array.rs: ## @@ -274,15 +296,191 @@ pub type Int32RunArray = RunArray; /// ``` pub type Int64RunArray = RunArray; +/// A strongly-typed
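A run-end encoded array stores each run's value once, together with the exclusive logical index at which that run ends. A strongly-typed accessor or iterator therefore has to map logical positions onto runs. The following is a minimal std-only sketch of that mapping. It assumes `run_ends` is strictly increasing and that its last entry equals the logical length. `physical_index`, `value_at`, and `decode` are hypothetical helpers, not the arrow-rs API added in this PR.

```rust
/// Map a logical index to the physical index of the run containing it.
/// `run_ends[i]` is the exclusive logical end of run `i`.
fn physical_index(run_ends: &[i32], logical: usize) -> Option<usize> {
    let len = (*run_ends.last()?) as usize;
    if logical >= len {
        return None;
    }
    // Binary search for the first run whose end is strictly greater than `logical`.
    Some(run_ends.partition_point(|&end| (end as usize) <= logical))
}

/// Hypothetical strongly-typed accessor: the value at a logical index.
fn value_at<'a, T>(run_ends: &[i32], values: &'a [T], logical: usize) -> Option<&'a T> {
    physical_index(run_ends, logical).and_then(|p| values.get(p))
}

/// Expand every run, which is what a sequential iterator would yield.
fn decode<T: Clone>(run_ends: &[i32], values: &[T]) -> Vec<T> {
    let mut out = Vec::new();
    let mut start = 0usize;
    for (&end, value) in run_ends.iter().zip(values) {
        let end = end as usize;
        out.extend(std::iter::repeat(value.clone()).take(end - start));
        start = end;
    }
    out
}
```

With this layout, random access costs a binary search over the runs, while sequential iteration can walk the runs directly. Both access patterns are therefore worth benchmarking separately.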

[GitHub] [arrow] kou commented on pull request #14561: GH-20484: [Swift] Initial Arrow implementation

2023-01-31 Thread via GitHub
kou commented on PR #14561: URL: https://github.com/apache/arrow/pull/14561#issuecomment-1411581763 We can merge this after we remove needless Ubuntu versions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

[GitHub] [arrow] ursabot commented on pull request #33902: GH-33901: [Go] Add a malloc-based allocator

2023-01-31 Thread via GitHub
ursabot commented on PR #33902: URL: https://github.com/apache/arrow/pull/33902#issuecomment-1411577729 Benchmark runs are scheduled for baseline = 9fc0efae8115ee461cff3901883f3fd0f97606cf and contender = 3affadad5de97005ef587b95638ae4693a7eed1f. 3affadad5de97005ef587b95638ae4693a7eed1f is

[GitHub] [arrow] kou commented on pull request #33890: GH-33851: [C++] Update bundled boost version

2023-01-31 Thread via GitHub
kou commented on PR #33890: URL: https://github.com/apache/arrow/pull/33890#issuecomment-1411557747 > Why "7.0.0"? Does this version number have a special meaning? It doesn't have a special meaning. It’s just for historical reasons. We needed to create a release page to use GitHub as

[GitHub] [arrow-datafusion] jiacai2050 commented on pull request #5124: Add option to control whether to normalize ident

2023-01-31 Thread via GitHub
jiacai2050 commented on PR #5124: URL: https://github.com/apache/arrow-datafusion/pull/5124#issuecomment-1411550768 @alamb parse_float_as_decimal already has its UT: https://github.com/apache/arrow-datafusion/blob/7302120b00f0b8674fe0f5a544233516156c8af6/datafusion/sql/tests/integrat

[GitHub] [arrow] kou commented on a diff in pull request #14561: GH-20484: [Swift] Initial Arrow implementation

2023-01-31 Thread via GitHub
kou commented on code in PR #14561: URL: https://github.com/apache/arrow/pull/14561#discussion_r1092809344 ## docker-compose.yml: ## @@ -869,6 +870,23 @@ services: volumes: *ubuntu-volumes command: *python-command + ubuntu-swift: +# Usage: +# docker-compos

[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #5133: Support StreamAggregation

2023-01-31 Thread via GitHub
Ted-Jiang commented on issue #5133: URL: https://github.com/apache/arrow-datafusion/issues/5133#issuecomment-1411531804 Our team is also looking forward to this feature and the memory limited aggregation 👍

[GitHub] [arrow] mapleFU commented on pull request #33897: GH-33652: [C++][Parquet] Add interface total_compressed_bytes_written

2023-01-31 Thread via GitHub
mapleFU commented on PR #33897: URL: https://github.com/apache/arrow/pull/33897#issuecomment-1411526998 > Yes, I think adding the comment should help understand. I wish these parts of the codebase were better documented. > > Are you still planning on adding tests? Yes, I will,

[GitHub] [arrow] maosuhan commented on pull request #14913: GH-14912: [Java] Remove usage of PlatformDependent in arrow-vector, arrow-jdbc and arrow-algorithm

2023-01-31 Thread via GitHub
maosuhan commented on PR #14913: URL: https://github.com/apache/arrow/pull/14913#issuecomment-1411520269 How about merging this MR first? We are also blocked by this netty unsafe issue. @lidavidm

[GitHub] [arrow] aiguofer commented on issue #33953: [Java] JDBC driver does not send custom headers on DoGet

2023-01-31 Thread via GitHub
aiguofer commented on issue #33953: URL: https://github.com/apache/arrow/issues/33953#issuecomment-1411515299 After looking at it, it looks like all the stubs are generated from the `interceptedChannel` so there shouldn't be any issues with doing this everywhere. Submitted a PR.

[GitHub] [arrow] aiguofer opened a new pull request, #33967: GH-33953: [Java] Pass custom headers on every request

2023-01-31 Thread via GitHub
aiguofer opened a new pull request, #33967: URL: https://github.com/apache/arrow/pull/33967 ### Rationale for this change Some flight requests don't send custom headers. This PR should fix that. ### What changes are included in this PR? Ensure custom heade

[GitHub] [arrow] aiguofer commented on issue #33953: [Java] JDBC driver does not send custom headers on DoGet

2023-01-31 Thread via GitHub
aiguofer commented on issue #33953: URL: https://github.com/apache/arrow/issues/33953#issuecomment-1411497672 Ok so this one is a little bit beyond me as I'm not very familiar with gRPC but I believe the problem lies in `FlightClient.interceptedChannel`. The handshake and sql info me

[GitHub] [arrow] rtpsw commented on pull request #33909: GH-33899: [C++] Add NamedTapRel relation as a Substrait extension

2023-01-31 Thread via GitHub
rtpsw commented on PR #33909: URL: https://github.com/apache/arrow/pull/33909#issuecomment-1411491232 Note that the factory registry does not support scoping, which could be useful for cleaning up after and isolating testing (here, due to `AddPassFactory`) and in other use cases, like with

[GitHub] [arrow] cboettig commented on pull request #33918: GH-33904: [R] improve behavior of s3_bucket

2023-01-31 Thread via GitHub
cboettig commented on PR #33918: URL: https://github.com/apache/arrow/pull/33918#issuecomment-1411487681 Thanks for the discussion! @amoeba I definitely agree with your point about higher-level function and breaking changes. How is region guessing handled in arrow's other client interface

[GitHub] [arrow] ursabot commented on pull request #24372: GH-14863: [C++] Add appender functions to array builders that can take optionals

2023-01-31 Thread via GitHub
ursabot commented on PR #24372: URL: https://github.com/apache/arrow/pull/24372#issuecomment-1411450608 Benchmark runs are scheduled for baseline = 61f3cdfd704c3e65639a7e22fec49f25d9df6440 and contender = 9fc0efae8115ee461cff3901883f3fd0f97606cf. 9fc0efae8115ee461cff3901883f3fd0f97606cf is

[GitHub] [arrow] avantgardnerio commented on a diff in pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
avantgardnerio commented on code in PR #33961: URL: https://github.com/apache/arrow/pull/33961#discussion_r1092744601 ## java/flight/flight-sql-jdbc-driver/src/main/java/org/apache/arrow/driver/jdbc/ArrowFlightPreparedStatement.java: ## @@ -93,6 +104,17 @@ public synchronized vo

[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #5057: Parquet parallel scan

2023-01-31 Thread via GitHub
Ted-Jiang commented on PR #5057: URL: https://github.com/apache/arrow-datafusion/pull/5057#issuecomment-1411432296 Thanks for all kindly reply ! ❤️ > here is the [exact place](https://github.com/apache/arrow-datafusion/blob/125a8580c19c78c99fbbe3a6afe373de2538b205/datafusion/core/

[GitHub] [arrow-rs] dnsco opened a new issue, #3644: A trait for append_value and append_null on ArrayBuilders

2023-01-31 Thread via GitHub
dnsco opened a new issue, #3644: URL: https://github.com/apache/arrow-rs/issues/3644 I'm generating an arrow schema, and because there is not a macro, when it's time to set the values I need to invoke a macro like this: ```rust macro_rules! set_value { ($builder:expr,$i:expr,$typ:t
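A trait implemented by each builder is one way to avoid the macro, because generic code can then append `Option<T>` uniformly. The following is a minimal sketch that uses toy builders as stand-ins for arrow-rs builders. The trait name `AppendOption`, the types `ToyInt32Builder` and `ToyStringBuilder`, and the helper `append_all` are all hypothetical and not part of arrow-rs.

```rust
/// Hypothetical trait unifying `append_value` and `append_null`.
trait AppendOption {
    type Value;
    fn append_value(&mut self, v: Self::Value);
    fn append_null(&mut self);

    /// Provided method: dispatch on `Option` so callers need no macro.
    fn append_option(&mut self, v: Option<Self::Value>) {
        match v {
            Some(v) => self.append_value(v),
            None => self.append_null(),
        }
    }
}

/// Toy stand-in for a primitive builder: values plus a validity vector.
#[derive(Default)]
struct ToyInt32Builder {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl AppendOption for ToyInt32Builder {
    type Value = i32;
    fn append_value(&mut self, v: i32) {
        self.values.push(v);
        self.validity.push(true);
    }
    fn append_null(&mut self) {
        // A null still occupies a slot; the stored value is arbitrary.
        self.values.push(0);
        self.validity.push(false);
    }
}

/// Toy stand-in for a string builder.
#[derive(Default)]
struct ToyStringBuilder {
    values: Vec<String>,
    validity: Vec<bool>,
}

impl AppendOption for ToyStringBuilder {
    type Value = String;
    fn append_value(&mut self, v: String) {
        self.values.push(v);
        self.validity.push(true);
    }
    fn append_null(&mut self) {
        self.values.push(String::new());
        self.validity.push(false);
    }
}

/// Generic code that works with any builder implementing the trait.
fn append_all<B: AppendOption>(builder: &mut B, items: Vec<Option<B::Value>>) {
    for item in items {
        builder.append_option(item);
    }
}
```

Because `append_option` is a provided method, each builder implements only the two primitives. Generic code such as `append_all` then needs no per-type macro arm.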

[GitHub] [arrow] aakshintala commented on issue #33966: The ArrowIPC Java library doesn't respect buffer offsets set in the flatbuffer metadata

2023-01-31 Thread via GitHub
aakshintala commented on issue #33966: URL: https://github.com/apache/arrow/issues/33966#issuecomment-1411418157 Specifically: What my library produces: > ArrowRecordBatch [length=64, nodes=[ArrowFieldNode [length=64, nullCount=0]], #buffers=2, buffersLayout=[ArrowBuffer [offset=0,

[GitHub] [arrow] aakshintala opened a new issue, #33966: The ArrowIPC Java library doesn't respect buffer offsets set in the flatbuffer metadata

2023-01-31 Thread via GitHub
aakshintala opened a new issue, #33966: URL: https://github.com/apache/arrow/issues/33966 ### Describe the bug, including details regarding any error messages, version, and platform. The `Buffer Alignment and Padding` [section](https://arrow.apache.org/docs/format/Columnar.html#buffe
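The format spec recommends aligning and padding buffers to 8 or 64 bytes. The IPC flatbuffer metadata also records an explicit offset and length for each buffer, relative to the start of the message body. A reader that honors those offsets slices each buffer at its declared position instead of assuming a fixed packing. The following is a minimal std-only sketch. `BufferDesc`, `padded_len`, and `slice_buffers` are hypothetical names, not the Java library's API.

```rust
/// A buffer descriptor as recorded in IPC metadata: byte offset and length,
/// relative to the start of the message body.
#[derive(Clone, Copy, Debug)]
struct BufferDesc {
    offset: usize,
    length: usize,
}

/// Round `len` up to a multiple of `align` (a power of two, e.g. 8 or 64).
fn padded_len(len: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (len + align - 1) & !(align - 1)
}

/// Slice each buffer at the offset the metadata declares, rather than
/// assuming buffers are packed back to back with a fixed padding.
fn slice_buffers<'a>(body: &'a [u8], descs: &[BufferDesc]) -> Result<Vec<&'a [u8]>, String> {
    let mut out = Vec::with_capacity(descs.len());
    for d in descs {
        let end = d
            .offset
            .checked_add(d.length)
            .ok_or_else(|| format!("buffer {:?} overflows usize", d))?;
        let buf = body
            .get(d.offset..end)
            .ok_or_else(|| format!("buffer {:?} exceeds body of {} bytes", d, body.len()))?;
        out.push(buf);
    }
    Ok(out)
}
```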

[GitHub] [arrow] rok commented on pull request #8510: GH-15483: [C++] Add a Tensor logical value type with constant dimensions, implemented using ExtensionType

2023-01-31 Thread via GitHub
rok commented on PR #8510: URL: https://github.com/apache/arrow/pull/8510#issuecomment-1411404748 Thanks for the review @jorisvandenbossche. I've pushed some more changes and I'll start working on cython next.

[GitHub] [arrow] rok commented on a diff in pull request #8510: GH-15483: [C++] Add a Tensor logical value type with constant dimensions, implemented using ExtensionType

2023-01-31 Thread via GitHub
rok commented on code in PR #8510: URL: https://github.com/apache/arrow/pull/8510#discussion_r1092724410 ## cpp/src/arrow/extension/tensor_array.cc: ## @@ -0,0 +1,161 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. S

[GitHub] [arrow] rok commented on a diff in pull request #8510: GH-15483: [C++] Add a Tensor logical value type with constant dimensions, implemented using ExtensionType

2023-01-31 Thread via GitHub
rok commented on code in PR #8510: URL: https://github.com/apache/arrow/pull/8510#discussion_r1092723728 ## cpp/src/arrow/extension_type_test.cc: ## @@ -333,4 +335,82 @@ TEST_F(TestExtensionType, ValidateExtensionArray) { ASSERT_OK(ext_arr4->ValidateFull()); } +TEST_F(Test

[GitHub] [arrow] rok commented on a diff in pull request #8510: GH-15483: [C++] Add a Tensor logical value type with constant dimensions, implemented using ExtensionType

2023-01-31 Thread via GitHub
rok commented on code in PR #8510: URL: https://github.com/apache/arrow/pull/8510#discussion_r1092723408 ## cpp/src/arrow/extension/tensor_array.cc: ## @@ -0,0 +1,161 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. S

[GitHub] [arrow] rok commented on a diff in pull request #8510: GH-15483: [C++] Add a Tensor logical value type with constant dimensions, implemented using ExtensionType

2023-01-31 Thread via GitHub
rok commented on code in PR #8510: URL: https://github.com/apache/arrow/pull/8510#discussion_r1092723068 ## cpp/src/arrow/extension/tensor_array.cc: ## @@ -0,0 +1,161 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. S

[GitHub] [arrow] kou commented on issue #33920: [C++][CI] C++ / AMD64 Ubuntu 22.04 C++ ASAN UBSAN (pull_request) Cancelled after 75m

2023-01-31 Thread via GitHub
kou commented on issue #33920: URL: https://github.com/apache/arrow/issues/33920#issuecomment-1411394914 If we disable Flight SQL, we can use system gRPC and ProtoBuf on Ubuntu 22.04. Flight SQL requires ProtoBuf 3.15.0 or later: https://github.com/apache/arrow/blob/master/cpp/cmake_
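The constraint stated here is a plain version comparison: Flight SQL needs ProtoBuf 3.15.0 or later. A system package is usable for Flight SQL only when it meets that bound. The following is a minimal sketch of the comparison. `parse_version` and `satisfies_flight_sql` are hypothetical helpers, and the sketch is not Arrow's build logic.

```rust
/// Parse "major.minor.patch"; missing minor/patch default to 0, extra parts are ignored.
fn parse_version(s: &str) -> Option<(u32, u32, u32)> {
    let mut parts = s.trim().split('.').map(|p| p.parse::<u32>());
    let major = parts.next()?.ok()?;
    let minor = parts.next().unwrap_or(Ok(0)).ok()?;
    let patch = parts.next().unwrap_or(Ok(0)).ok()?;
    Some((major, minor, patch))
}

/// Minimum ProtoBuf version that Flight SQL requires, as stated in this comment.
const FLIGHT_SQL_MIN_PROTOBUF: (u32, u32, u32) = (3, 15, 0);

/// Tuples compare lexicographically, so (3, 12, 4) < (3, 15, 0) < (3, 21, 12).
fn satisfies_flight_sql(version: &str) -> bool {
    parse_version(version).map_or(false, |v| v >= FLIGHT_SQL_MIN_PROTOBUF)
}
```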

[GitHub] [arrow-datafusion] ozankabak commented on issue #5133: Support StreamAggregation

2023-01-31 Thread via GitHub
ozankabak commented on issue #5133: URL: https://github.com/apache/arrow-datafusion/issues/5133#issuecomment-1411392918 Thank you for putting this together, had an initial look. I expect us to do a deeper dive, take a look at the mentioned papers, and give meaningful comments in the next s

[GitHub] [arrow-ballista] yahoNanJing commented on pull request #586: Handle job resubmission

2023-01-31 Thread via GitHub
yahoNanJing commented on PR #586: URL: https://github.com/apache/arrow-ballista/pull/586#issuecomment-1411365835 Thanks @thinkharderdev for the patching. Besides adding a testing case, how about making the sleeping time configurable? 🤔

[GitHub] [arrow] kou commented on a diff in pull request #33808: GH-20272: [C++] Bump version of bundled AWS SDK

2023-01-31 Thread via GitHub
kou commented on code in PR #33808: URL: https://github.com/apache/arrow/pull/33808#discussion_r1092694995 ## cpp/cmake_modules/ThirdpartyToolchain.cmake: ## @@ -4714,35 +4798,129 @@ macro(build_awssdk) DEPENDS aws_c_common_ep) add_dependencies(AWS::aws

[GitHub] [arrow] amoeba commented on pull request #33918: GH-33904: [R] improve behavior of s3_bucket

2023-01-31 Thread via GitHub
amoeba commented on PR #33918: URL: https://github.com/apache/arrow/pull/33918#issuecomment-1411355499 Thanks @paleolimbot. Here are two examples: **Example One** ```r s3_bucket("my_bucket", endpoint_override = "http://example.com") ``` - **Current behavior:** Attempts

[GitHub] [arrow] kou commented on a diff in pull request #33808: GH-20272: [C++] Bump version of bundled AWS SDK

2023-01-31 Thread via GitHub
kou commented on code in PR #33808: URL: https://github.com/apache/arrow/pull/33808#discussion_r1092682530 ## cpp/cmake_modules/ThirdpartyToolchain.cmake: ## @@ -455,6 +455,93 @@ else() ) endif() +if(DEFINED ENV{ARROW_AWS_C_AUTH_URL}) + set(AWS_C_AUTH_SOURCE_URL "$ENV{ARR

[GitHub] [arrow-datafusion] unconsolable commented on issue #4850: Support `select .. from 'data.parquet'` files in SQL from any `SessionContext` (optionally)

2023-01-31 Thread via GitHub
unconsolable commented on issue #4850: URL: https://github.com/apache/arrow-datafusion/issues/4850#issuecomment-1411321368 IMHO, these are table functions. I wonder whether DataFusion supports table functions now. ref. https://github.com/apache/arrow-datafusion/issues/3773

[GitHub] [arrow] ursabot commented on pull request #33946: GH-33874: [Java] Ensure custom headers are included during JDBC auth handshake

2023-01-31 Thread via GitHub
ursabot commented on PR #33946: URL: https://github.com/apache/arrow/pull/33946#issuecomment-1411320611 Benchmark runs are scheduled for baseline = a0e2d65e953171e599e002f5bc989dc6e8b6ee16 and contender = 61f3cdfd704c3e65639a7e22fec49f25d9df6440. 61f3cdfd704c3e65639a7e22fec49f25d9df6440 is

[GitHub] [arrow] kou commented on pull request #33808: GH-20272: [C++] Bump version of bundled AWS SDK

2023-01-31 Thread via GitHub
kou commented on PR #33808: URL: https://github.com/apache/arrow/pull/33808#issuecomment-1411319527 > > If we use dynamic linking for bundled libraries, we need to install them but it may overwrite existing libraries. > > Well, yes, but so what? People who are concerned about that sho

[GitHub] [arrow-datafusion] unconsolable commented on issue #4850: Support `select .. from 'data.parquet'` files in SQL from any `SessionContext` (optionally)

2023-01-31 Thread via GitHub
unconsolable commented on issue #4850: URL: https://github.com/apache/arrow-datafusion/issues/4850#issuecomment-1411319279 IMHO, these are table functions. I wonder whether DataFusion supports table functions now.

[GitHub] [arrow] vibhatha commented on pull request #33964: GH-33963: [C++] add missing arrow/engine headers

2023-01-31 Thread via GitHub
vibhatha commented on PR #33964: URL: https://github.com/apache/arrow/pull/33964#issuecomment-1411312616 @westonpace thanks for this update. I think it is better to work on https://github.com/apache/arrow/issues/33956 after merging this.

[GitHub] [arrow-datafusion] kmitchener commented on pull request #5138: add example for Flight SQL Server that supports JDBC driver

2023-01-31 Thread via GitHub
kmitchener commented on PR #5138: URL: https://github.com/apache/arrow-datafusion/pull/5138#issuecomment-1411309783 ooh, foolish. this example creates a new SessionContext on every query. Moving to draft until I fix it.

[GitHub] [arrow] minyoung commented on issue #33875: [Go] How to handle large lists?

2023-01-31 Thread via GitHub
minyoung commented on issue #33875: URL: https://github.com/apache/arrow/issues/33875#issuecomment-1411308030 @zeroshade I had a go at implementing support for LargeString/LargeBinary: https://github.com/apache/arrow/pull/33965 A potential gotcha I stumbled on though is that while we

[GitHub] [arrow] minyoung opened a new pull request, #33965: GH-33875: [Go] Handle writing LargeString and LargeBinary types

2023-01-31 Thread via GitHub
minyoung opened a new pull request, #33965: URL: https://github.com/apache/arrow/pull/33965 ### Rationale for this change Handle writing `array.LargeString` and `array.LargeBinary` data types. This allows parquet files to contain more than 2G worth of binary data in a single
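The 2G limit follows from offset width. Arrow's `String` and `Binary` layouts store 32-bit signed offsets into a single data buffer, so one array can address at most `i32::MAX` bytes (about 2 GiB). `LargeString` and `LargeBinary` use 64-bit offsets instead. The following is a minimal std-only sketch that builds an offsets buffer and detects overflow for either width. `build_offsets` is a hypothetical helper, not the Go library's API.

```rust
use std::convert::TryFrom;

/// Build an Arrow-style offsets buffer (n + 1 entries, starting at 0) for the
/// given value lengths. Returns None if any offset does not fit in `O`.
fn build_offsets<O: TryFrom<usize> + Copy>(lengths: &[usize]) -> Option<Vec<O>> {
    let mut offsets = Vec::with_capacity(lengths.len() + 1);
    let mut total: usize = 0;
    offsets.push(O::try_from(0usize).ok()?);
    for &len in lengths {
        total = total.checked_add(len)?;
        // i32 offsets fail here once total exceeds i32::MAX; i64 offsets do not.
        offsets.push(O::try_from(total).ok()?);
    }
    Some(offsets)
}
```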

[GitHub] [arrow] kou commented on pull request #33822: GH-32613: [C++] Simplify IPC writer for dense unions

2023-01-31 Thread via GitHub
kou commented on PR #33822: URL: https://github.com/apache/arrow/pull/33822#issuecomment-1411290277 Thanks. It seems that we have a test for this case. Could you revert the bug?

[GitHub] [arrow] westonpace commented on issue #33960: [R] Output schema for aggregation is sometimes innacurate

2023-01-31 Thread via GitHub
westonpace commented on issue #33960: URL: https://github.com/apache/arrow/issues/33960#issuecomment-1411272665 Would `arrow::compute::DeclarationToSchema` help?

[GitHub] [arrow-datafusion] comphead commented on pull request #5140: Date to Timestamp cast

2023-01-31 Thread via GitHub
comphead commented on PR #5140: URL: https://github.com/apache/arrow-datafusion/pull/5140#issuecomment-1411286032 @liukun4515 @crepererum @alamb please check this one. The downside for now is additional cast created sometimes for timestamp, to convert `Timestamp(_, Some("00:00"))` to

[GitHub] [arrow] westonpace commented on issue #33956: [C++] Improve Substrait consumer example

2023-01-31 Thread via GitHub
westonpace commented on issue #33956: URL: https://github.com/apache/arrow/issues/33956#issuecomment-1411284124 > That file does not exist when I build and install the Arrow library (using a recent dev commit) : Oops. https://github.com/apache/arrow/pull/33964

[GitHub] [arrow-datafusion] comphead opened a new pull request, #5140: Date to Timestamp cast

2023-01-31 Thread via GitHub
comphead opened a new pull request, #5140: URL: https://github.com/apache/arrow-datafusion/pull/5140 # Which issue does this PR close? Closes #4761 #4644 . # Rationale for this change Introduce casting between dates and timestamps datatypes. Currently DF just panics
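Arrow's `Date32` counts days since the UNIX epoch. Casting it to a timestamp therefore multiplies by the number of target units per day. At nanosecond precision, that multiplication can overflow `i64` for dates far from the epoch. The following is a minimal std-only sketch of the arithmetic that returns `None` instead of panicking. `TimeUnit` here is a local stand-in and `date32_to_timestamp` is a hypothetical helper, not DataFusion's or arrow-rs's cast kernel.

```rust
/// Local stand-in for Arrow's timestamp units.
#[derive(Clone, Copy)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

const SECONDS_PER_DAY: i64 = 86_400;

/// Convert a Date32 value (days since 1970-01-01) to a timestamp in `unit`,
/// returning None when the result does not fit in i64.
fn date32_to_timestamp(days: i32, unit: TimeUnit) -> Option<i64> {
    let per_second: i64 = match unit {
        TimeUnit::Second => 1,
        TimeUnit::Millisecond => 1_000,
        TimeUnit::Microsecond => 1_000_000,
        TimeUnit::Nanosecond => 1_000_000_000,
    };
    (days as i64)
        .checked_mul(SECONDS_PER_DAY)?
        .checked_mul(per_second)
}
```

An `i64` count of nanoseconds only spans roughly 292 years on either side of the epoch. A checked cast is therefore needed for nanosecond targets even though every `Date32` fits at second precision.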

[GitHub] [arrow] westonpace opened a new issue, #33963: [C++] arrow/engine/api.h is not included in install

2023-01-31 Thread via GitHub
westonpace opened a new issue, #33963: URL: https://github.com/apache/arrow/issues/33963 ### Describe the bug, including details regarding any error messages, version, and platform. arrow/engine/substrait/api.h is included but we want to keep arrow/engine/api.h as well based on recen

[GitHub] [arrow] westonpace commented on pull request #33886: GH-33699: [C++] Increase timeout of c++ tests when running under valgrind and shorten long tests

2023-01-31 Thread via GitHub
westonpace commented on PR #33886: URL: https://github.com/apache/arrow/pull/33886#issuecomment-1411269089 @js8544 's suggestion worked. Looks like the remaining test failures are spurious. Any last thoughts @assignUser ?

[GitHub] [arrow-datafusion] kmitchener commented on pull request #5138: add example for Flight SQL Server that supports JDBC driver

2023-01-31 Thread via GitHub
kmitchener commented on PR #5138: URL: https://github.com/apache/arrow-datafusion/pull/5138#issuecomment-1411255283 > @avantgardnerio may be interested in this I heavily referenced the Ballista implementation (and the arrow-rs Flight SQL example) so thank you @avantgardnerio! I linke

[GitHub] [arrow-datafusion] ursabot commented on pull request #5135: Add in-list test

2023-01-31 Thread via GitHub
ursabot commented on PR #5135: URL: https://github.com/apache/arrow-datafusion/pull/5135#issuecomment-1411244718 Benchmark runs are scheduled for baseline = d59b6dd563e3a903fae62606371e1b6f3eda53dc and contender = 4c21a72c075729bb8c731c551e5537c9e01d3231. 4c21a72c075729bb8c731c551e5537c9e

[GitHub] [arrow-datafusion] andygrove merged pull request #5135: Add in-list test

2023-01-31 Thread via GitHub
andygrove merged PR #5135: URL: https://github.com/apache/arrow-datafusion/pull/5135

[GitHub] [arrow-datafusion] andygrove commented on pull request #5138: add example for Flight SQL Server that supports JDBC driver

2023-01-31 Thread via GitHub
andygrove commented on PR #5138: URL: https://github.com/apache/arrow-datafusion/pull/5138#issuecomment-1411236097 @avantgardnerio may be interested in this

[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #5137: Add cast expression with bool, integers and decimal128 support

2023-01-31 Thread via GitHub
andygrove commented on code in PR #5137: URL: https://github.com/apache/arrow-datafusion/pull/5137#discussion_r1092610290 ## datafusion/substrait/src/consumer.rs: ## @@ -679,12 +680,46 @@ pub async fn from_substrait_rex( } } } +Some

[GitHub] [arrow-datafusion] andygrove closed issue #5134: Support for InList in datafusion-substrait

2023-01-31 Thread via GitHub
andygrove closed issue #5134: Support for InList in datafusion-substrait URL: https://github.com/apache/arrow-datafusion/issues/5134

[GitHub] [arrow] kou commented on a diff in pull request #33949: MINOR: [Release][Docs] Add clarification on some post release steps documentation

2023-01-31 Thread via GitHub
kou commented on code in PR #33949: URL: https://github.com/apache/arrow/pull/33949#discussion_r1092603432 ## dev/release/post-08-docs.sh: ## @@ -103,4 +103,5 @@ if [ ${PUSH} -gt 0 ]; then echo "Success!" echo "Create a pull request:" echo " ${github_url}/pull/new/${br

[GitHub] [arrow] wjones127 commented on a diff in pull request #14351: GH-33115: [C++] Parquet Implement crc in reading and writing Page for DATA_PAGE (v1)

2023-01-31 Thread via GitHub
wjones127 commented on code in PR #14351: URL: https://github.com/apache/arrow/pull/14351#discussion_r1092565925 ## cpp/src/arrow/util/crc32.cc: ## @@ -0,0 +1,961 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See t

[GitHub] [arrow] wjones127 commented on pull request #14351: GH-33115: [C++] Parquet Implement crc in reading and writing Page for DATA_PAGE (v1)

2023-01-31 Thread via GitHub
wjones127 commented on PR #14351: URL: https://github.com/apache/arrow/pull/14351#issuecomment-1411216110 This looks almost finished. Could you please add unit tests for the crc32 function itself, as [Antoine asked](https://github.com/apache/arrow/pull/14351#discussion_r1082367597)? At a min
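A unit test for a CRC-32 function can pin it against the well-known check value `0xCBF43926` for the ASCII input `"123456789"`. This assumes the page checksum is the standard reflected CRC-32 used by zlib/gzip. The test can also confirm that chaining calls over split input matches a single call. The following is a minimal bitwise sketch for illustration, not the implementation in this PR. The zlib-style `prev` argument is an assumed signature.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and final
/// XOR 0xFFFFFFFF). `prev` chains calls over split buffers; pass 0 to start.
fn crc32(prev: u32, data: &[u8]) -> u32 {
    let mut crc = !prev;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // Shift right; if the bit shifted out was 1, XOR in the polynomial.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}
```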

[GitHub] [arrow-datafusion] alamb commented on issue #1570: Memory Limited GroupBy (Externalized / Spill)

2023-01-31 Thread via GitHub
alamb commented on issue #1570: URL: https://github.com/apache/arrow-datafusion/issues/1570#issuecomment-1411214331 > I'd be interested. after a big sleep I think I get your approach, but if you can produce a diagram it would be great. @milenkovicm very belatedly, here is a document

[GitHub] [arrow-datafusion] alamb commented on issue #5133: Support StreamAggregation

2023-01-31 Thread via GitHub
alamb commented on issue #5133: URL: https://github.com/apache/arrow-datafusion/issues/5133#issuecomment-1411212034 Here is a google doc with some ideas https://docs.google.com/document/d/16rm5VR1nGkY6DedMCh1NUmThwf3RduAweaBH9b1h6AY/edit?usp=sharing I have it in "comment" mode for ev

[GitHub] [arrow-datafusion] kmitchener commented on pull request #5109: when inferring the schema of compressed CSV, decompress before newline-delimited chunking

2023-01-31 Thread via GitHub
kmitchener commented on PR #5109: URL: https://github.com/apache/arrow-datafusion/pull/5109#issuecomment-1411205565 Will do!

[GitHub] [arrow-datafusion-python] andygrove commented on issue #157: Out of memory when sorting

2023-01-31 Thread via GitHub
andygrove commented on issue #157: URL: https://github.com/apache/arrow-datafusion-python/issues/157#issuecomment-1411203031 We need to expose the memory/disk config in the Python bindings so that they can be set when creating the SessionContext. Here is Rust code for reference. I w

[GitHub] [arrow-datafusion] andygrove commented on issue #5108: Out of memory when sorting

2023-01-31 Thread via GitHub
andygrove commented on issue #5108: URL: https://github.com/apache/arrow-datafusion/issues/5108#issuecomment-1411201919 > sorry, how do you pass those config using Python API We will need to expose them there. Should be trivial. I will add notes on the Python issue

[GitHub] [arrow-datafusion] kmitchener opened a new issue, #5139: add example for standalone DataFusion server which supports Arrow Flight SQL JDBC driver

2023-01-31 Thread via GitHub
kmitchener opened a new issue, #5139: URL: https://github.com/apache/arrow-datafusion/issues/5139 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Being able to connect to a standalone DataFusion server with DBeaver is pretty great

[GitHub] [arrow-datafusion] kmitchener opened a new pull request, #5138: add example for Flight SQL Server that supports JDBC driver

2023-01-31 Thread via GitHub
kmitchener opened a new pull request, #5138: URL: https://github.com/apache/arrow-datafusion/pull/5138 # Which issue does this PR close? Closes #. # Rationale for this change Adds an example for a standalone DataFusion server that can execute queries from

[GitHub] [arrow] mariosasko commented on pull request #33925: GH-33923: [Docs] Tensor canonical extension type specification

2023-01-31 Thread via GitHub
mariosasko commented on PR #33925: URL: https://github.com/apache/arrow/pull/33925#issuecomment-1411183912 @jorisvandenbossche > I noticed that in the HuggingFace Datasets implementation, there is a [comment](https://github.com/huggingface/datasets/blob/5b793dd8c43bf6e85f165238becb3c

[GitHub] [arrow-datafusion] djouallah commented on issue #5108: Out of memory when sorting

2023-01-31 Thread via GitHub
djouallah commented on issue #5108: URL: https://github.com/apache/arrow-datafusion/issues/5108#issuecomment-1411179418 sorry, how do you pass those config using Python API

[GitHub] [arrow] rok commented on pull request #14547: ARROW-17832: [Python] Construct MapArray from sequence of dicts (instead of list of tuples)

2023-01-31 Thread via GitHub
rok commented on PR #14547: URL: https://github.com/apache/arrow/pull/14547#issuecomment-1411179137 > When will this fix be in PyPi? Need the new version 12.0.0 to be installable using `pip` please pray Releases are roughly every three months and 11.0.0 was released about a week ago.

[GitHub] [arrow] ursabot commented on pull request #33941: MINOR: [C++] Fix a stale comment and declare a function constexpr

2023-01-31 Thread via GitHub
ursabot commented on PR #33941: URL: https://github.com/apache/arrow/pull/33941#issuecomment-1411176893 Benchmark runs are scheduled for baseline = 8e6c8015303f4b977f56f2a2ad652bb9cda0d240 and contender = a0e2d65e953171e599e002f5bc989dc6e8b6ee16. a0e2d65e953171e599e002f5bc989dc6e8b6ee16 is

[GitHub] [arrow] vibhatha commented on issue #33956: [C++] Improve Substrait consumer example

2023-01-31 Thread via GitHub
vibhatha commented on issue #33956: URL: https://github.com/apache/arrow/issues/33956#issuecomment-1411161640 take

[GitHub] [arrow] paleolimbot commented on pull request #33918: GH-33904: [R] improve behavior of s3_bucket

2023-01-31 Thread via GitHub
paleolimbot commented on PR #33918: URL: https://github.com/apache/arrow/pull/33918#issuecomment-1411159871 Thank you both for working on this...I'm happy to review R code and contribute given some consensus on what the desired behaviour is. I wonder if there are some reproducible examples

[GitHub] [arrow] mroeschke opened a new issue, #33962: ENH: Support Temporal Extraction Functions for duration types

2023-01-31 Thread via GitHub
mroeschke opened a new issue, #33962: URL: https://github.com/apache/arrow/issues/33962 ### Describe the enhancement requested ``` In [2]: import datetime In [3]: import pyarrow as pa In [4]: pa.__version__ Out[4]: '10.0.1' In [5]: pa.compute.day(pa.array([da
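For duration types, field extraction reduces to integer arithmetic on the stored count of units rather than calendar logic. The following is a minimal sketch for a nanosecond duration. It assumes a pandas-style split in which floor division gives the sign to the day field and all other fields stay non-negative. That semantics is an assumption, not something the issue or pyarrow defines. `DurationParts` and `split_duration_ns` are hypothetical names.

```rust
const NS_PER_SECOND: i64 = 1_000_000_000;
const NS_PER_DAY: i64 = 86_400 * NS_PER_SECOND;

/// Components of a duration; `days` may be negative, the rest are non-negative.
#[derive(Debug, PartialEq)]
struct DurationParts {
    days: i64,
    hours: i64,
    minutes: i64,
    seconds: i64,
    nanoseconds: i64,
}

fn split_duration_ns(total: i64) -> DurationParts {
    // Floor division keeps the remainder in [0, NS_PER_DAY) for negative inputs.
    let days = total.div_euclid(NS_PER_DAY);
    let rem = total.rem_euclid(NS_PER_DAY);
    let secs = rem / NS_PER_SECOND;
    DurationParts {
        days,
        hours: secs / 3_600,
        minutes: (secs % 3_600) / 60,
        seconds: secs % 60,
        nanoseconds: rem % NS_PER_SECOND,
    }
}
```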

[GitHub] [arrow] wjones127 commented on pull request #33897: GH-33652: [C++][Parquet] Add interface total_compressed_bytes_written

2023-01-31 Thread via GitHub
wjones127 commented on PR #33897: URL: https://github.com/apache/arrow/pull/33897#issuecomment-1411143605 Yes, I think adding the comment should help understand. I wish these parts of the codebase were better documented. Are you still planning on adding tests?

[GitHub] [arrow-datafusion] alamb commented on issue #5133: Support StreamAggregation

2023-01-31 Thread via GitHub
alamb commented on issue #5133: URL: https://github.com/apache/arrow-datafusion/issues/5133#issuecomment-1411128220 I will begin a google doc for us to collaborate on

[GitHub] [arrow-datafusion] ursabot commented on pull request #5131: Bug fix: Empty Record Batch handling

2023-01-31 Thread via GitHub
ursabot commented on PR #5131: URL: https://github.com/apache/arrow-datafusion/pull/5131#issuecomment-1411126111 Benchmark runs are scheduled for baseline = abeb4fe9b516976f59b421cf886014a08bc930c0 and contender = d59b6dd563e3a903fae62606371e1b6f3eda53dc. d59b6dd563e3a903fae62606371e1b6f3

[GitHub] [arrow-datafusion] alamb commented on issue #4850: Support `select .. from 'data.parquet'` files in SQL from any `SessionContext` (optionally)

2023-01-31 Thread via GitHub
alamb commented on issue #4850: URL: https://github.com/apache/arrow-datafusion/issues/4850#issuecomment-1411125420 I would love to see a function like `read_csv` or maybe `read_file('filename')` 👍

[GitHub] [arrow-datafusion] alamb commented on pull request #5131: Bug fix: Empty Record Batch handling

2023-01-31 Thread via GitHub
alamb commented on PR #5131: URL: https://github.com/apache/arrow-datafusion/pull/5131#issuecomment-1411124233 Thanks again!

[GitHub] [arrow-datafusion] alamb closed issue #5090: Window function error: InvalidArgumentError("number of columns(27) must match number of fields(35) in schema"

2023-01-31 Thread via GitHub
alamb closed issue #5090: Window function error: InvalidArgumentError("number of columns(27) must match number of fields(35) in schema" URL: https://github.com/apache/arrow-datafusion/issues/5090

[GitHub] [arrow-datafusion] alamb merged pull request #5131: Bug fix: Empty Record Batch handling

2023-01-31 Thread via GitHub
alamb merged PR #5131: URL: https://github.com/apache/arrow-datafusion/pull/5131

[GitHub] [arrow] amoeba commented on pull request #33918: GH-33904: [R] improve behavior of s3_bucket

2023-01-31 Thread via GitHub
amoeba commented on PR #33918: URL: https://github.com/apache/arrow/pull/33918#issuecomment-1411121662 Hey @cboettig, I'm not sure what the best course of action is here but thank you for the PR and discussion. I think we have to keep breaking changes in mind, which makes this more challeng

[GitHub] [arrow] lidavidm commented on pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
lidavidm commented on PR #33961: URL: https://github.com/apache/arrow/pull/33961#issuecomment-1411120115 Note there was a prior PR in the issue description; I don't know if that answers any of your questions here.

[GitHub] [arrow-adbc] jmao-denver opened a new issue, #406: Cursor.rowcount of the Postgres driver returns -1 for a simple 'select' SQL cmd

2023-01-31 Thread via GitHub
jmao-denver opened a new issue, #406: URL: https://github.com/apache/arrow-adbc/issues/406 This is, of course, still compliant with the DB API 2.0 spec, but it would be nice for it to be set to the actual number of rows produced.

[GitHub] [arrow] alamb commented on pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
alamb commented on PR #33961: URL: https://github.com/apache/arrow/pull/33961#issuecomment-147712 I also happen to be working on support for parameterized statements in InfluxDB IOx (WIP at https://github.com/influxdata/influxdb_iox/pull/6790) and I had just hit this. I plan to try and

[GitHub] [arrow] felipecrv commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-31 Thread via GitHub
felipecrv commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1092500261 ## cpp/src/arrow/array/validate.cc: ## @@ -622,6 +637,106 @@ struct ValidateArrayImpl { return Status::OK(); } + template + Status ValidateRunEndEncoded(

[GitHub] [arrow] felipecrv commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-31 Thread via GitHub
felipecrv commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1092501101 ## cpp/src/arrow/array/validate.cc: ## @@ -622,6 +637,106 @@ struct ValidateArrayImpl { return Status::OK(); } + template + Status ValidateRunEndEncoded(

[GitHub] [arrow] tommydangerous commented on pull request #14547: ARROW-17832: [Python] Construct MapArray from sequence of dicts (instead of list of tuples)

2023-01-31 Thread via GitHub
tommydangerous commented on PR #14547: URL: https://github.com/apache/arrow/pull/14547#issuecomment-1411088772 When will this fix be in PyPI?

[GitHub] [arrow] felipecrv commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-31 Thread via GitHub
felipecrv commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1092491250 ## cpp/src/arrow/array/validate.cc: ## @@ -622,6 +637,106 @@ struct ValidateArrayImpl { return Status::OK(); } + template + Status ValidateRunEndEncoded(

[GitHub] [arrow] avantgardnerio commented on a diff in pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
avantgardnerio commented on code in PR #33961: URL: https://github.com/apache/arrow/pull/33961#discussion_r1092487630 ## java/flight/flight-sql-jdbc-driver/src/test/java/org/apache/arrow/driver/jdbc/ArrowFlightJdbcDriverTest.java: ## @@ -133,6 +126,28 @@ public void testShouldCo

[GitHub] [arrow-adbc] lidavidm merged pull request #405: chore(go): bump to Arrow 12

2023-01-31 Thread via GitHub
lidavidm merged PR #405: URL: https://github.com/apache/arrow-adbc/pull/405

[GitHub] [arrow] avantgardnerio commented on a diff in pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
avantgardnerio commented on code in PR #33961: URL: https://github.com/apache/arrow/pull/33961#discussion_r1092484879 ## java/vector/src/test/java/org/apache/arrow/vector/table/RowTest.java: ## @@ -1,856 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] avantgardnerio commented on a diff in pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
avantgardnerio commented on code in PR #33961: URL: https://github.com/apache/arrow/pull/33961#discussion_r1092484081 ## java/vector/src/test/java/org/apache/arrow/vector/table/RowTest.java: ## @@ -1,856 +0,0 @@ -/* Review Comment: I'll put this back before I un-draft the PR

[GitHub] [arrow] lidavidm commented on pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
lidavidm commented on PR #33961: URL: https://github.com/apache/arrow/pull/33961#issuecomment-1411073818 In general, you can look at the arrow-jdbc module for the existing mapping between Arrow and JDBC types; as you've noted, not all types are easily handled.

[GitHub] [arrow] avantgardnerio commented on a diff in pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
avantgardnerio commented on code in PR #33961: URL: https://github.com/apache/arrow/pull/33961#discussion_r1092483636 ## java/flight/flight-sql-jdbc-driver/src/test/java/org/apache/arrow/driver/jdbc/ArrowFlightJdbcDriverTest.java: ## @@ -133,6 +126,28 @@ public void testShouldCo

[GitHub] [arrow] avantgardnerio commented on a diff in pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
avantgardnerio commented on code in PR #33961: URL: https://github.com/apache/arrow/pull/33961#discussion_r1092482633 ## java/flight/flight-sql-jdbc-driver/src/test/java/org/apache/arrow/driver/jdbc/ArrowFlightJdbcDriverTest.java: ## @@ -133,6 +126,28 @@ public void testShouldCo

[GitHub] [arrow] lidavidm commented on a diff in pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
lidavidm commented on code in PR #33961: URL: https://github.com/apache/arrow/pull/33961#discussion_r1092482274 ## java/vector/src/test/java/org/apache/arrow/vector/table/RowTest.java: ## @@ -1,856 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or mor

[GitHub] [arrow] avantgardnerio commented on a diff in pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
avantgardnerio commented on code in PR #33961: URL: https://github.com/apache/arrow/pull/33961#discussion_r1092481994 ## java/flight/flight-sql-jdbc-driver/src/main/java/org/apache/arrow/driver/jdbc/utils/ConvertUtils.java: ## @@ -37,6 +45,114 @@ public final class ConvertUtils

[GitHub] [arrow-datafusion] timvw commented on issue #4850: Support `select .. from 'data.parquet'` files in SQL from any `SessionContext` (optionally)

2023-01-31 Thread via GitHub
timvw commented on issue #4850: URL: https://github.com/apache/arrow-datafusion/issues/4850#issuecomment-1411071162 Instead of relying on file extension name (as per the current implementation) we could use some inspiration from duckdb for loading/importing data such that the user can indi

[GitHub] [arrow] avantgardnerio commented on a diff in pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
avantgardnerio commented on code in PR #33961: URL: https://github.com/apache/arrow/pull/33961#discussion_r1092481407 ## java/flight/flight-sql-jdbc-driver/src/main/java/org/apache/arrow/driver/jdbc/utils/ConvertUtils.java: ## @@ -37,6 +45,114 @@ public final class ConvertUtils

[GitHub] [arrow] lidavidm commented on pull request #33961: [GH-33475]: [flight-sql] Send prepared statement parameters

2023-01-31 Thread via GitHub
lidavidm commented on PR #33961: URL: https://github.com/apache/arrow/pull/33961#issuecomment-1411070218 Oh cool, you worked this out. I had half a PR but never finished it.

[GitHub] [arrow] avantgardnerio commented on issue #33475: [Java][FlightRPC] FlightSQL error: 'Parameter ordinal out of range' executing a prepared stmt with params

2023-01-31 Thread via GitHub
avantgardnerio commented on issue #33475: URL: https://github.com/apache/arrow/issues/33475#issuecomment-1411066434 PTAL at https://github.com/apache/arrow/pull/33961

[GitHub] [arrow] avantgardnerio opened a new pull request, #33961: Send prepared statement parameters

2023-01-31 Thread via GitHub
avantgardnerio opened a new pull request, #33961: URL: https://github.com/apache/arrow/pull/33961 ### Rationale for this change Presently, the `flight-sql-jdbc-driver` does not send prepared statement parameter values to the server. This prevents Arrow FlightSql server implementors f

[GitHub] [arrow] lidavidm commented on pull request #14488: GH-33336: [C++][Parquet] Avoid UB on unaligned load

2023-01-31 Thread via GitHub
lidavidm commented on PR #14488: URL: https://github.com/apache/arrow/pull/14488#issuecomment-1411062407 I just kicked the jobs to be sure, but otherwise let's merge.
