Re: [PR] Refactor `TreeNode.rewrite()` [arrow-datafusion]

2024-01-19 Thread via GitHub
peter-toth commented on PR #8891: URL: https://github.com/apache/arrow-datafusion/pull/8891#issuecomment-1901888976 > I filed #8913 -- let me know what you think. What do you think about creating a PR with `transform_with_payload` and then a PR showing how to use it to improve one of the e

Re: [I] Evaluates CASE branches even if their WHEN clause is false [arrow-datafusion]

2024-01-19 Thread via GitHub
haohuaijin commented on issue #8909: URL: https://github.com/apache/arrow-datafusion/issues/8909#issuecomment-1901859848 It looks related to the `simplify_expressions` rule because the `simplify_expressions` rule does the [`ConstEvaluator`](https://github.com/apache/arrow-datafusion/b

Re: [I] `ParquetRecordBatchStreamBuilder::new()` panics instead of erroring out when opening a corrupted file [arrow-rs]

2024-01-19 Thread via GitHub
tustvold commented on issue #5315: URL: https://github.com/apache/arrow-rs/issues/5315#issuecomment-1901837787 Changing to an error seems straightforward enough, but I will point out the actual thrift decoding logic can also panic -- This is an automated message from the Apache Git Servic

Re: [PR] GH-39697: [R] Source build should check if offline [arrow]

2024-01-19 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39699: URL: https://github.com/apache/arrow/pull/39699#issuecomment-1901763204 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 05b8f366e17ee6f21df4746bb6a65be399dfb68d. There was 1 b

Re: [PR] Improve `array_concat` signature for null and empty array [arrow-datafusion]

2024-01-19 Thread via GitHub
Weijun-H commented on code in PR #8594: URL: https://github.com/apache/arrow-datafusion/pull/8594#discussion_r1460223296 ## datafusion/expr/src/signature.rs: ## @@ -122,6 +122,9 @@ pub enum TypeSignature { /// List dimension of the List/LargeList is equivalent to the number

Re: [PR] GH-38414 [Java][Vector] Add Delta dictionary support [arrow]

2024-01-19 Thread via GitHub
davisusanibar commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1460194781 ## java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowStream.java: ## @@ -64,7 +82,7 @@ public void testStreamZeroLengthBatch() throws IOException {

[I] `ParquetRecordBatchStreamBuilder::new()` panics instead of erroring out when opening a corrupted file [arrow-rs]

2024-01-19 Thread via GitHub
mmaitre314 opened a new issue, #5315: URL: https://github.com/apache/arrow-rs/issues/5315 When opening a corrupted Parquet file where one of the row groups is missing a column, `ParquetRecordBatchStreamBuilder::new()` panics instead of returning an error: ``` A panic occurred at

Re: [PR] arrow_json: support decimal 128 and 256 types in json writer [arrow-rs]

2024-01-19 Thread via GitHub
matthewgapp commented on code in PR #5197: URL: https://github.com/apache/arrow-rs/pull/5197#discussion_r1460183809 ## arrow-json/src/writer.rs: ## @@ -469,11 +469,67 @@ fn set_column_for_json_rows( row.insert(col_name.to_string(), serde_json::Value::Object(obj

Re: [PR] GH-39214: [Java] Support reproducible build [arrow]

2024-01-19 Thread via GitHub
assignUser commented on PR #39215: URL: https://github.com/apache/arrow/pull/39215#issuecomment-1901721889 @kou hm weird asf membership is public for David so the bot should pick it up... if this keeps happening we may need to add a non gh api based method to identify committers (e.g. commi

Re: [PR] GH-39704: [C++][Parquet] Benchmark levels decoding [arrow]

2024-01-19 Thread via GitHub
mapleFU commented on PR #39705: URL: https://github.com/apache/arrow/pull/39705#issuecomment-1901716994 > Reading a non nullable fixed size list is missing the fast path Yeah I think it's related, I think I can optimize unpack later, but maybe I need some help in optimizing RLE

Re: [PR] GH-34865: [C++][Java][Flight RPC] Add Session management messages [arrow]

2024-01-19 Thread via GitHub
indigophox commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1460169296 ## format/Flight.proto: ## @@ -525,3 +525,108 @@ message FlightData { message PutResult { bytes app_metadata = 1; } + +/* + * EXPERIMENTAL: Union of possible va

Re: [PR] GH-39704: [C++][Parquet] Benchmark levels decoding [arrow]

2024-01-19 Thread via GitHub
alippai commented on PR #39705: URL: https://github.com/apache/arrow/pull/39705#issuecomment-1901716319 Is this use case relevant here? https://github.com/apache/arrow/issues/34510 Reading a non nullable fixed size list is missing the fast path, it’d be nice to see it in the benchmark (e

Re: [I] [Parquet][R] Efficiently combine parquet files [arrow]

2024-01-19 Thread via GitHub
mapleFU commented on issue #39671: URL: https://github.com/apache/arrow/issues/39671#issuecomment-1901716162 Arrow-rs also has one. You can regard it as a command line tool: https://github.com/apache/arrow-rs/blob/master/parquet/src/bin/parquet-concat.rs Currently we didn't ha

Re: [PR] GH-34865: [C++][Java][Flight RPC] Add Session management messages [arrow]

2024-01-19 Thread via GitHub
indigophox commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1460167231 ## java/flight/flight-integration-tests/pom.xml: ## @@ -45,6 +45,10 @@ com.google.protobuf protobuf-java + Review Comment

Re: [PR] GH-34865: [C++][Java][Flight RPC] Add Session management messages [arrow]

2024-01-19 Thread via GitHub
indigophox commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1460165993 ## java/flight/flight-core/src/main/java/org/apache/arrow/flight/ServerSessionMiddleware.java: ## @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] GH-34865: [C++][Java][Flight RPC] Add Session management messages [arrow]

2024-01-19 Thread via GitHub
indigophox commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1460165228 ## java/flight/flight-core/src/main/java/org/apache/arrow/flight/ServerSessionMiddleware.java: ## @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] GH-34865: [C++][Java][Flight RPC] Add Session management messages [arrow]

2024-01-19 Thread via GitHub
indigophox commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1460164018 ## format/Flight.proto: ## @@ -525,3 +525,108 @@ message FlightData { message PutResult { bytes app_metadata = 1; } + +/* + * EXPERIMENTAL: Union of possible va

Re: [PR] GH-34865: [C++][Java][Flight RPC] Add Session management messages [arrow]

2024-01-19 Thread via GitHub
indigophox commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1460162534 ## format/Flight.proto: ## @@ -525,3 +525,108 @@ message FlightData { message PutResult { bytes app_metadata = 1; } + +/* + * EXPERIMENTAL: Union of possible va

Re: [I] Support Copy with Remote Object Stores in datafusion-cli [arrow-datafusion]

2024-01-19 Thread via GitHub
Lordworms commented on issue #8907: URL: https://github.com/apache/arrow-datafusion/issues/8907#issuecomment-1901680646 May I take this issue?

Re: [PR] Pass options to HTTPBuilder in parse_url_opts (#5310) [arrow-rs]

2024-01-19 Thread via GitHub
CarlKCarlK commented on PR #5311: URL: https://github.com/apache/arrow-rs/pull/5311#issuecomment-1901623892 [Also added to the issue] You need to add line 167 back in or this fix doesn't work. Here is the missing line: `let url = &url[..url::Position::BeforePath];`
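
For readers unfamiliar with the slice in the comment above: it keeps the part of the URL that comes before the path (scheme and authority) and drops the path, query, and fragment. The following is only a rough Python analogue using the standard library, not the arrow-rs code; the function name `before_path` and the example URL are hypothetical, and the Rust `url` crate may normalize the URL differently (for example, default ports).

```python
from urllib.parse import urlsplit


def before_path(url: str) -> str:
    """Return the scheme and authority of `url`, dropping path, query, and fragment.

    Hypothetical helper for illustration only; it approximates what slicing a
    parsed URL up to the start of its path would keep.
    """
    parts = urlsplit(url)
    # netloc holds the authority: optional userinfo, host, and optional port.
    return f"{parts.scheme}://{parts.netloc}"


if __name__ == "__main__":
    print(before_path("https://example.com:8080/data/file.parquet?version=2"))
```

The point of the sketch is that any builder configured from the truncated string sees only the server address, so the object path has to be handled separately.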

Re: [PR] GH-39001: [Java] Modularize remaining modules [arrow]

2024-01-19 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39221: URL: https://github.com/apache/arrow/pull/39221#issuecomment-1901598735 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 92682f0f6064224acd6dd746ac45d6df4b1963c4. There were no

Re: [PR] feat: support `stride` in `array_slice`, change indexes to be `0` based [arrow-datafusion]

2024-01-19 Thread via GitHub
Weijun-H commented on PR #8829: URL: https://github.com/apache/arrow-datafusion/pull/8829#issuecomment-1901582431 > Looks good to me @Weijun-H -- thank you > > Can you confirm the behavior change to be `0` based indexes rather than `1` based indexes is intentional? I updated the PR t

Re: [PR] GH-39712: [Java] Enable code review and formatting code through Spotless Maven plugin [arrow]

2024-01-19 Thread via GitHub
davisusanibar commented on PR #39713: URL: https://github.com/apache/arrow/pull/39713#issuecomment-1901556775 Please consider: Initial formatting implementations will require a large number of file changes. Among the main changes that need to be reviewed are: - pom.xml - maven/p

Re: [I] bug: array_length only return 1 element [arrow-datafusion]

2024-01-19 Thread via GitHub
BubbaJoe closed issue #6693: bug: array_length only return 1 element URL: https://github.com/apache/arrow-datafusion/issues/6693

Re: [PR] GH-39712: [Java] Enable code review and formatting code through Spotless Maven plugin [arrow]

2024-01-19 Thread via GitHub
github-actions[bot] commented on PR #39713: URL: https://github.com/apache/arrow/pull/39713#issuecomment-1901552420 :warning: GitHub issue #39712 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-38936: [JS] Specify side effects [arrow]

2024-01-19 Thread via GitHub
domoritz opened a new pull request, #39714: URL: https://github.com/apache/arrow/pull/39714 (no comment)

[PR] GH-39712: [Java] Enable code review and formatting code through Spotless Maven plugin [arrow]

2024-01-19 Thread via GitHub
davisusanibar opened a new pull request, #39713: URL: https://github.com/apache/arrow/pull/39713 ### Rationale for this change To enable code review and formatting code by using [Spotless Maven plugin](https://github.com/diffplug/spotless). ### What changes are included in this

Re: [PR] GH-39676: [C++][Parquet] Fast Random Rowgroup Reads [arrow]

2024-01-19 Thread via GitHub
corwinjoy commented on code in PR #39677: URL: https://github.com/apache/arrow/pull/39677#discussion_r1460107417 ## cpp/src/parquet/column_writer.cc: ## @@ -323,6 +323,19 @@ class SerializedPageWriter : public PageWriter { PARQUET_THROW_NOT_OK(sink_->Write(output_data_buf

Re: [PR] GH-39676: [C++][Parquet] Fast Random Rowgroup Reads [arrow]

2024-01-19 Thread via GitHub
corwinjoy commented on PR #39677: URL: https://github.com/apache/arrow/pull/39677#issuecomment-1901471188 Fair enough, this is only an exploratory PR to show what this approach could look like and for performance profiling.

Re: [I] [C++][Parquet] Fast Random Rowgroup Reads [arrow]

2024-01-19 Thread via GitHub
corwinjoy commented on issue #39676: URL: https://github.com/apache/arrow/issues/39676#issuecomment-1901466983 @mapleFU wrote: > I understand why don't read all row-group metadata, but why a "first RowGroup" is read in this experiment? Since we already has schema here: https://github.com

Re: [I] [C++][Parquet] Fast Random Rowgroup Reads [arrow]

2024-01-19 Thread via GitHub
corwinjoy commented on issue #39676: URL: https://github.com/apache/arrow/issues/39676#issuecomment-1901432629 @emkornfield wrote: > @corwinjoy I think we should likely address a issues here before proceeding to an implementation: > > 1. Do you have a flame-graph or other granular

Re: [I] [C++][Parquet] Fast Random Rowgroup Reads [arrow]

2024-01-19 Thread via GitHub
corwinjoy commented on issue #39676: URL: https://github.com/apache/arrow/issues/39676#issuecomment-1901421261 Points from the profiling session: 1. This supports my claim that the metadata read is extremely expensive (up to 40x the read time with statistics). 2. Removing statistics he

[PR] Simplify windows builtin functions return type [arrow-datafusion]

2024-01-19 Thread via GitHub
comphead opened a new pull request, #8920: URL: https://github.com/apache/arrow-datafusion/pull/8920 ## Which issue does this PR close? Closes #. ## Rationale for this change Before this PR, DataFusion derives the output data types for builtin functions twice.

Re: [I] [C++][Parquet] Fast Random Rowgroup Reads [arrow]

2024-01-19 Thread via GitHub
corwinjoy commented on issue #39676: URL: https://github.com/apache/arrow/issues/39676#issuecomment-190147 @emkornfield @mapleFU These are a lot of questions. I have started by running a more detailed performance profile using the PR posted previously + a larger data set for clarity.

Re: [I] [Java] Implement/use ServiceProvider for discovering drivers [arrow-adbc]

2024-01-19 Thread via GitHub
lidavidm commented on issue #48: URL: https://github.com/apache/arrow-adbc/issues/48#issuecomment-1901401724 Yeah, the full package would make more sense. I'm not too concerned about breakage here so long as the core interfaces remain stable.

Re: [PR] support to_timestamp with optional chrono formats [arrow-datafusion]

2024-01-19 Thread via GitHub
Omega359 commented on PR #8886: URL: https://github.com/apache/arrow-datafusion/pull/8886#issuecomment-1901358571 > ... if we decide to go with current approach we need to provide more documentation and examples for users in `scalar_functions.md`. Ideally even to have a separate doc page w

Re: [PR] GH-39697: [R] Source build should check if offline [arrow]

2024-01-19 Thread via GitHub
assignUser merged PR #39699: URL: https://github.com/apache/arrow/pull/39699

Re: [PR] GH-34865: [C++][Java][Flight RPC] Add Session management messages [arrow]

2024-01-19 Thread via GitHub
lidavidm commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1459974165 ## cpp/src/arrow/flight/sql/server_session_middleware_factory.h: ## @@ -0,0 +1,57 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] test: Port tests in `partitioned_csv.rs` to sqllogictest [arrow-datafusion]

2024-01-19 Thread via GitHub
simicd commented on code in PR #8919: URL: https://github.com/apache/arrow-datafusion/pull/8919#discussion_r1459971438 ## datafusion/core/tests/sql/partitioned_csv.rs: ## @@ -1,77 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one -// or more contributor lice

Re: [PR] GH-34865: [C++][Java][Flight RPC] Add Session management messages [arrow]

2024-01-19 Thread via GitHub
indigophox commented on PR #34817: URL: https://github.com/apache/arrow/pull/34817#issuecomment-1901276943 @lidavidm bit of a side quest, what do you make of these (basically identical issue across both): https://github.com/indigophox/arrow/actions/runs/7549784164/job/20554364643#ste

Re: [PR] test: Port tests in `partitioned_csv.rs` to sqllogictest [arrow-datafusion]

2024-01-19 Thread via GitHub
simicd commented on code in PR #8919: URL: https://github.com/apache/arrow-datafusion/pull/8919#discussion_r1459968742 ## datafusion/sqllogictest/test_files/csv_files.slt: ## @@ -63,3 +63,76 @@ id6 value"6 id7 value"7 id8 value"8 id9 value"9 + + +# Read partitioned csv Revie

Re: [PR] test: Port tests in `partitioned_csv.rs` to sqllogictest [arrow-datafusion]

2024-01-19 Thread via GitHub
simicd commented on code in PR #8919: URL: https://github.com/apache/arrow-datafusion/pull/8919#discussion_r1459966938 ## datafusion/core/tests/sql/select.rs: ## @@ -482,7 +482,7 @@ async fn sort_on_window_null_string() -> Result<()> { async fn test_prepare_statement() -> Resul

[PR] test: Port tests in `partitioned_csv.rs` to sqllogictest [arrow-datafusion]

2024-01-19 Thread via GitHub
simicd opened a new pull request, #8919: URL: https://github.com/apache/arrow-datafusion/pull/8919 ## Which issue does this PR close? Closes #8208 ## Rationale for this change Migrate tests from Rust to sqllogictest ## What changes are included in this PR? - Remove dupl

Re: [I] [GLib] Allow to create time and timestamp arrays with a time zone [arrow]

2024-01-19 Thread via GitHub
kou commented on issue #39702: URL: https://github.com/apache/arrow/issues/39702#issuecomment-1901271076 Timezone support for `GArrowTimestampDataType` makes sense, but it doesn't make sense for `GArrowTime{32,64}DataType`, because `GArrowTime{32,64}DataType` is seconds/milliseconds/micros

Re: [I] Pass Options to HttpBuilder in parse_url_opts [arrow-rs]

2024-01-19 Thread via GitHub
CarlKCarlK commented on issue #5310: URL: https://github.com/apache/arrow-rs/issues/5310#issuecomment-1901267544 [Also added to the PR] You need to add line 167 back in or this fix doesn't work: `let url = &url[..url::Position::BeforePath];`

Re: [PR] aggregate_statistics should only optimize MIN/MAX when relation is not empty [arrow-datafusion]

2024-01-19 Thread via GitHub
viirya commented on code in PR #8914: URL: https://github.com/apache/arrow-datafusion/pull/8914#discussion_r1459947266 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -3260,3 +3260,19 @@ query I select count(*) from (select count(*) a, count(*) b from (select 1)); -

Re: [PR] GH-39001: [Java] Modularize remaining modules [arrow]

2024-01-19 Thread via GitHub
lidavidm merged PR #39221: URL: https://github.com/apache/arrow/pull/39221

Re: [I] Epic: Unified TreeNode rewrite API [arrow-datafusion]

2024-01-19 Thread via GitHub
ozankabak commented on issue #8913: URL: https://github.com/apache/arrow-datafusion/issues/8913#issuecomment-1901253943 The main concern I have is composability. We have many use cases where the flow goes something like this: 1. Visit/transform a given tree, and save per-node data dur

Re: [PR] Add syntax highlight to datafusion-cli [arrow-datafusion]

2024-01-19 Thread via GitHub
trungda commented on PR #8918: URL: https://github.com/apache/arrow-datafusion/pull/8918#issuecomment-1901243443 The current implementation is not the best. If the input string is invalid, e.g., unterminated: `select * from tab_1 where col_1 = '`, the whole string will be _un_highlighted.

Re: [I] No efficient way to load a subset of files from partitioned table [arrow-datafusion]

2024-01-19 Thread via GitHub
rspears74 commented on issue #8906: URL: https://github.com/apache/arrow-datafusion/issues/8906#issuecomment-1901238702 Another important thing is I don't actually need to use delta-rs. If there were some way to use a generic `ListingTable`, that would work as well. `read_parquet` works gr

Re: [PR] GH-39697: [R] Source build should check if offline [arrow]

2024-01-19 Thread via GitHub
nealrichardson commented on code in PR #39699: URL: https://github.com/apache/arrow/pull/39699#discussion_r1459914931 ## r/tools/nixlibs.R: ## @@ -912,8 +912,6 @@ if (is_release) { VERSION <- VERSION[1, 1:3] arrow_repo <- paste0(getOption("arrow.repo", sprintf("https://ap

[PR] use tokenizer [arrow-datafusion]

2024-01-19 Thread via GitHub
trungda opened a new pull request, #8918: URL: https://github.com/apache/arrow-datafusion/pull/8918 ## Which issue does this PR close? https://github.com/apache/arrow-datafusion/issues/8701 ## Rationale for this change More user-friendly and readability when using `datafusion

Re: [PR] GH-39214: [Java] Support reproducible build [arrow]

2024-01-19 Thread via GitHub
kou commented on PR #39215: URL: https://github.com/apache/arrow/pull/39215#issuecomment-1901191614 @assignUser Could you take a look at https://github.com/apache/arrow/pull/39215#issuecomment-1900481478 and https://github.com/apache/arrow/pull/39215#issuecomment-1900482331 ? #39610

Re: [PR] Fix expr partial ord test [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on code in PR #8908: URL: https://github.com/apache/arrow-datafusion/pull/8908#discussion_r1459871372 ## datafusion/expr/src/expr.rs: ## @@ -1869,10 +1869,14 @@ mod test { let exp2 = col("a") + lit(2); let exp3 = !(col("a") + lit(2)); -

Re: [PR] aggregate_statistics should only optimize MIN/MAX when relation is not empty [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on code in PR #8914: URL: https://github.com/apache/arrow-datafusion/pull/8914#discussion_r1459863692 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -3260,3 +3260,19 @@ query I select count(*) from (select count(*) a, count(*) b from (select 1)); --

Re: [PR] [WIP] TreeNode refactor code deduplication: Part 3 [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on PR #8817: URL: https://github.com/apache/arrow-datafusion/pull/8817#issuecomment-1901185514 Related discussion / epic: https://github.com/apache/arrow-datafusion/issues/8913

[I] Epic: Implement remaining `information_schema` tables [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb opened a new issue, #8916: URL: https://github.com/apache/arrow-datafusion/issues/8916 ### Is your feature request related to a problem or challenge? @JanKaul reported in [Discord](https://discord.com/channels/885562378132000778/1166447479609376850/1197812043969986620):

Re: [I] Information Schema Shows Tables from Other Catalogs [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on issue #4408: URL: https://github.com/apache/arrow-datafusion/issues/4408#issuecomment-1901151417 I think this was fixed a long time ago

[PR] GH-39690: [C++][FlightRPC] Fix nullptr dereference in PollInfo [arrow]

2024-01-19 Thread via GitHub
lidavidm opened a new pull request, #39711: URL: https://github.com/apache/arrow/pull/39711 ### Rationale for this change The current implementation is a bit painful to use due to the lack of a move constructor. ### What changes are included in this PR? - Fix a c

Re: [PR] GH-39001: [Java] Modularize remaining modules [arrow]

2024-01-19 Thread via GitHub
jduo commented on PR #39221: URL: https://github.com/apache/arrow/pull/39221#issuecomment-1901149137 @kou , I've switched to using CMAKE_SYSTEM_PROCESSOR and changed CI scripts and POM files to let auto-detection figure ARROW_JAVA_JNI_ARCH_DIR. I think this addresses everything.

Re: [PR] support to_timestamp with optional chrono formats [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on PR #8886: URL: https://github.com/apache/arrow-datafusion/pull/8886#issuecomment-1901137831 Thank you again @Omega359 @comphead and @gruuya > I'll leave the final decision to @alamb although he already approved. My point it would be unexpected to users to have `

Re: [I] Minor: small refactoring for ints/floats to timestamps conversions [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on issue #8370: URL: https://github.com/apache/arrow-datafusion/issues/8370#issuecomment-1901138684 I think this may be fixed now

Re: [I] Support "standard" / alternate format arguments for `to_timestamp` [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on issue #8915: URL: https://github.com/apache/arrow-datafusion/issues/8915#issuecomment-1901134996 Also, perhaps the https://github.com/apache/arrow-datafusion-comet project will be able to contribute a spark compatible implementation of timestamp parsing (that is probably

[I] Support "standard" format arguments for `to_timestamp` [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb opened a new issue, #8915: URL: https://github.com/apache/arrow-datafusion/issues/8915 ### Is your feature request related to a problem or challenge? After https://github.com/apache/arrow-datafusion/pull/8886 (thanks to @Omega359) DataFusion supports converting strings to timest

Re: [PR] GH-39697: [R] Source build should check if offline [arrow]

2024-01-19 Thread via GitHub
assignUser commented on code in PR #39699: URL: https://github.com/apache/arrow/pull/39699#discussion_r1459787811 ## r/tools/nixlibs.R: ## @@ -912,8 +912,6 @@ if (is_release) { VERSION <- VERSION[1, 1:3] arrow_repo <- paste0(getOption("arrow.repo", sprintf("https://apache

Re: [I] Support Copy with Remote Object Stores in datafusion-cli [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on issue #8907: URL: https://github.com/apache/arrow-datafusion/issues/8907#issuecomment-1901120979 I think the need is documented, so this is just a matter of debugging and connecting up the existing APIs. I think it is a good first issue

Re: [I] Evaluates CASE branches even if their WHEN clause is false [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on issue #8909: URL: https://github.com/apache/arrow-datafusion/issues/8909#issuecomment-1901120167 Possibly related https://github.com/apache/arrow-datafusion/pull/8833 / https://github.com/apache/arrow-datafusion/issues/8814 cc @haohuaijin and @liukun4515

Re: [PR] GH-39697: [R] Source build should check if offline [arrow]

2024-01-19 Thread via GitHub
assignUser commented on code in PR #39699: URL: https://github.com/apache/arrow/pull/39699#discussion_r1459774545 ## r/tools/nixlibs.R: ## @@ -906,28 +906,13 @@ on_windows <- tolower(Sys.info()[["sysname"]]) == "windows" # For local debugging, set ARROW_R_DEV=TRUE to make this

Re: [PR] support to_timestamp with optional chrono formats [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on PR #8886: URL: https://github.com/apache/arrow-datafusion/pull/8886#issuecomment-1901117238 I took the liberty of merging up from main and running prettier to get CI clean.

Re: [PR] aggregate_statistics should only optimize MIN/MAX when relation is not empty [arrow-datafusion]

2024-01-19 Thread via GitHub
viirya commented on code in PR #8914: URL: https://github.com/apache/arrow-datafusion/pull/8914#discussion_r1459754233 ## datafusion/core/src/physical_optimizer/aggregate_statistics.rs: ## @@ -197,17 +197,28 @@ fn take_optimizable_min( agg_expr: &dyn AggregateExpr, sta

Re: [PR] Add hash_join_single_partition_threshold_rows config [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on PR #8720: URL: https://github.com/apache/arrow-datafusion/pull/8720#issuecomment-1901112256 Here is the result of the benchmarks when I ran them. My conclusion is that there is no measurable difference in performance due to this branch so it should be good to go `

[PR] aggregate_statistics should only optimize MIN/MAX when relation is not empty [arrow-datafusion]

2024-01-19 Thread via GitHub
viirya opened a new pull request, #8914: URL: https://github.com/apache/arrow-datafusion/pull/8914 ## Which issue does this PR close? Closes #8911. ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [PR] GH-38414 [Java][Vector] Add Delta dictionary support. [arrow]

2024-01-19 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1459748118 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/Dictionary.java: ## @@ -72,4 +74,14 @@ public boolean equals(Object o) { public int hashCode() {

Re: [PR] Add hash_join_single_partition_threshold_rows config [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on PR #8720: URL: https://github.com/apache/arrow-datafusion/pull/8720#issuecomment-1901092833 I merged this branch with `main` and am running tests on it now

Re: [PR] Add hash_join_single_partition_threshold_rows config [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on PR #8720: URL: https://github.com/apache/arrow-datafusion/pull/8720#issuecomment-1901092075 > To me the change looks good, I think it makes sense to run the (TPC-H) benchmarks once more to catch regressions. I will do so

Re: [PR] GH-39697: [R] Source build should check if offline [arrow]

2024-01-19 Thread via GitHub
assignUser commented on code in PR #39699: URL: https://github.com/apache/arrow/pull/39699#discussion_r1459740963 ## r/tools/nixlibs.R: ## @@ -906,28 +906,13 @@ on_windows <- tolower(Sys.info()[["sysname"]]) == "windows" # For local debugging, set ARROW_R_DEV=TRUE to make this

Re: [PR] Suppress unused variable warnings [arrow-nanoarrow]

2024-01-19 Thread via GitHub
paleolimbot commented on code in PR #365: URL: https://github.com/apache/arrow-nanoarrow/pull/365#discussion_r1459736152 ## src/nanoarrow/array_stream.c: ## @@ -19,6 +19,8 @@ #include "nanoarrow.h" +#define NANOARROW_UNUSED(x) (void)(x) Review Comment: Would you mind mo

Re: [PR] GH-39697: [R] Source build should check if offline [arrow]

2024-01-19 Thread via GitHub
nealrichardson commented on code in PR #39699: URL: https://github.com/apache/arrow/pull/39699#discussion_r1459737589 ## r/tools/nixlibs.R: ## @@ -906,28 +906,13 @@ on_windows <- tolower(Sys.info()[["sysname"]]) == "windows" # For local debugging, set ARROW_R_DEV=TRUE to make

Re: [PR] Refactor `TreeNode.rewrite()` [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on PR #8891: URL: https://github.com/apache/arrow-datafusion/pull/8891#issuecomment-1901083234 I filed https://github.com/apache/arrow-datafusion/issues/8913 -- let me know what you think. What do you think about creating a PR with `transform_with_payload` and then a PR sho

Re: [PR] GH-39001: [Java] Modularize remaining modules [arrow]

2024-01-19 Thread via GitHub
github-actions[bot] commented on PR #39221: URL: https://github.com/apache/arrow/pull/39221#issuecomment-1901080557 Revision: 0ece855474a810301d2e1b7c09451cd155a3fb26 Submitted crossbow builds: [ursacomputing/crossbow @ actions-0785af577e](https://github.com/ursacomputing/crossbow/bra

[I] Epic: Unified TreeNode rewrite API [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb opened a new issue, #8913: URL: https://github.com/apache/arrow-datafusion/issues/8913 ### Is your feature request related to a problem or challenge? (note this is mostly from @peter-toth 's description on https://github.com/apache/arrow-datafusion/pull/8891#issuecomment-1900138

Re: [PR] GH-39001: [Java] Modularize remaining modules [arrow]

2024-01-19 Thread via GitHub
jduo commented on PR #39221: URL: https://github.com/apache/arrow/pull/39221#issuecomment-1901078353 @github-actions crossbow submit *java-jars*

Re: [I] [Parquet][R] Efficiently combine parquet files [arrow]

2024-01-19 Thread via GitHub
r2evans commented on issue #39671: URL: https://github.com/apache/arrow/issues/39671#issuecomment-1901069004 That's an interesting utility, thank you for the pointer to it. I had been thinking of capability within a particular language, perhaps something baked into `arrow.so` or simil

Re: [PR] feat(python): Implement user-facing Schema class [arrow-nanoarrow]

2024-01-19 Thread via GitHub
paleolimbot commented on code in PR #366: URL: https://github.com/apache/arrow-nanoarrow/pull/366#discussion_r1459714963 ## python/src/nanoarrow/schema.py: ## @@ -0,0 +1,448 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreement

Re: [PR] feat(go/adbc/driver/flightsql): enable incremental queries [arrow-adbc]

2024-01-19 Thread via GitHub
lidavidm commented on code in PR #1457: URL: https://github.com/apache/arrow-adbc/pull/1457#discussion_r1459714579 ## go/adbc/driver/flightsql/flightsql_statement.go: ## @@ -132,6 +153,9 @@ type statement struct { prepared *flightsql.PreparedStatement queueSize

Re: [I] MSSQL support [arrow-adbc]

2024-01-19 Thread via GitHub
davlee1972 commented on issue #588: URL: https://github.com/apache/arrow-adbc/issues/588#issuecomment-1901067912 Not to hijack this, but could we also work on Sybase support at the same time? https://github.com/SAP/go-ase Both products support FreeTDS. The initial version of

Re: [PR] GH-39697: [R] Source build should check if offline [arrow]

2024-01-19 Thread via GitHub
assignUser commented on code in PR #39699: URL: https://github.com/apache/arrow/pull/39699#discussion_r1459713149 ## r/tools/nixlibs.R: ## @@ -906,28 +906,13 @@ on_windows <- tolower(Sys.info()[["sysname"]]) == "windows" # For local debugging, set ARROW_R_DEV=TRUE to make this

Re: [PR] feat(go/adbc/driver/flightsql): enable incremental queries [arrow-adbc]

2024-01-19 Thread via GitHub
zeroshade commented on PR #1457: URL: https://github.com/apache/arrow-adbc/pull/1457#issuecomment-1901065946 Looks good in general other than a few comments / nitpicks

Re: [PR] feat(python): Implement user-facing Schema class [arrow-nanoarrow]

2024-01-19 Thread via GitHub
paleolimbot commented on code in PR #366: URL: https://github.com/apache/arrow-nanoarrow/pull/366#discussion_r1459711879 ## python/src/nanoarrow/schema.py: ## @@ -0,0 +1,448 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreement

Re: [PR] feat(python): Implement user-facing Schema class [arrow-nanoarrow]

2024-01-19 Thread via GitHub
paleolimbot commented on code in PR #366: URL: https://github.com/apache/arrow-nanoarrow/pull/366#discussion_r1459710063 ## python/src/nanoarrow/schema.py: ## @@ -0,0 +1,448 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreement

Re: [I] MIN and MAX return data from WHERE clause on empty input [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on issue #8911: URL: https://github.com/apache/arrow-datafusion/issues/8911#issuecomment-1901062428 I agree this is a bug. Thanks for filing @tv42

Re: [PR] GH-39697: [R] Source build should check if offline [arrow]

2024-01-19 Thread via GitHub
github-actions[bot] commented on PR #39699: URL: https://github.com/apache/arrow/pull/39699#issuecomment-1901063563 Revision: 01eb6ee10fc65c8ebdd635d12921e22392bf24d3 Submitted crossbow builds: [ursacomputing/crossbow @ actions-7e0d11b7c4](https://github.com/ursacomputing/crossbow/bra

Re: [PR] feat(python): Implement user-facing Schema class [arrow-nanoarrow]

2024-01-19 Thread via GitHub
paleolimbot commented on code in PR #366: URL: https://github.com/apache/arrow-nanoarrow/pull/366#discussion_r1459707429 ## python/src/nanoarrow/schema.py: ## @@ -0,0 +1,448 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreement

Re: [PR] feat(python): Implement user-facing Schema class [arrow-nanoarrow]

2024-01-19 Thread via GitHub
paleolimbot commented on code in PR #366: URL: https://github.com/apache/arrow-nanoarrow/pull/366#discussion_r1459705910 ## python/tests/test_schema.py: ## @@ -0,0 +1,146 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

Re: [PR] Refactor `TreeNode.rewrite()` [arrow-datafusion]

2024-01-19 Thread via GitHub
alamb commented on PR #8891: URL: https://github.com/apache/arrow-datafusion/pull/8891#issuecomment-1901060904 I see -- thank you @peter-toth. Your explanation makes sense. As a historical note, the current TreeNode API came out of unifying different and inconsistent APIs across the E

Re: [PR] feat(python): Implement user-facing Schema class [arrow-nanoarrow]

2024-01-19 Thread via GitHub
paleolimbot commented on code in PR #366: URL: https://github.com/apache/arrow-nanoarrow/pull/366#discussion_r1459703479 ## python/src/nanoarrow/schema.py: ## @@ -0,0 +1,448 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreement

Re: [PR] GH-39697: [R] Source build should check if offline [arrow]

2024-01-19 Thread via GitHub
assignUser commented on PR #39699: URL: https://github.com/apache/arrow/pull/39699#issuecomment-1901059588 @github-actions crossbow submit r-binary-packages
