Re: [PR] Implement `DISTINCT ON` from Postgres [arrow-datafusion]

2023-11-04 Thread via GitHub
gruuya commented on code in PR #7981: URL: https://github.com/apache/arrow-datafusion/pull/7981#discussion_r1382416566 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -60,6 +80,62 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { )?);

[PR] Fix ArrayAgg schema mismatch issue [arrow-datafusion]

2023-11-04 Thread via GitHub
jayzhan211 opened a new pull request, #8055: URL: https://github.com/apache/arrow-datafusion/pull/8055 ## Which issue does this PR close? Closes #8032. ## Rationale for this change The main issue for #8032 is as what the errors said we have different schem

[I] Regression when serializing large json numbers [arrow-rs]

2023-11-04 Thread via GitHub
Blajda opened a new issue, #5038: URL: https://github.com/apache/arrow-rs/issues/5038 **Describe the bug** Serializing and writing a json Number `Number(1699148028689)` to parquet using `arrow_json` the value is not preserved. When the value is read back from the parquet file, we obtai
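A quick aside on the number quoted in this report. The sketch below is not a diagnosis of `arrow_json`; it only checks which numeric widths can hold `1699148028689` exactly. The helper names `fits_signed`, `roundtrips_f32` and `roundtrips_f64` are made up for illustration.

```python
import struct

VALUE = 1699148028689  # the JSON number quoted in the issue


def fits_signed(value: int, bits: int) -> bool:
    """Return True if value is representable as a signed integer of this width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def roundtrips_f32(value: int) -> bool:
    """Return True if value survives a trip through a 32-bit float unchanged."""
    (as_f32,) = struct.unpack("<f", struct.pack("<f", float(value)))
    return int(as_f32) == value


def roundtrips_f64(value: int) -> bool:
    """Return True if value survives a trip through a 64-bit float unchanged."""
    return int(float(value)) == value


if __name__ == "__main__":
    print("fits i32:", fits_signed(VALUE, 32))
    print("fits i64:", fits_signed(VALUE, 64))
    print("exact in f32:", roundtrips_f32(VALUE))
    print("exact in f64:", roundtrips_f64(VALUE))
```

So any path that narrows the value to a 32-bit integer or a 32-bit float would lose it, while a 64-bit integer or double keeps it intact. Which path the regression actually takes is for the issue itself to establish.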

Re: [PR] GH-37753: [C++][Gandiva] Add external function registry support [arrow]

2023-11-04 Thread via GitHub
niyue commented on code in PR #38116: URL: https://github.com/apache/arrow/pull/38116#discussion_r1382495703 ## cpp/src/gandiva/function_registry.cc: ## @@ -41,42 +62,74 @@ FunctionRegistry::iterator FunctionRegistry::back() const { return &(pc_registry_.back()); } -std::v

Re: [I] [Python] Add `exclude_invalid_files` to `ParquetDataset` [arrow]

2023-11-04 Thread via GitHub
bveeramani commented on issue #36278: URL: https://github.com/apache/arrow/issues/36278#issuecomment-1793603679 Ah, we were able to find a workaround, so we aren't working on this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] GH-38570: [R] Ensure that test-nix-libs is warning free [arrow]

2023-11-04 Thread via GitHub
paleolimbot commented on PR #38571: URL: https://github.com/apache/arrow/pull/38571#issuecomment-1793599526 I'll take a closer look Monday! I did have https://github.com/apache/arrow/pull/38534 open to solve one of the two warnings listed but I'm also happy to rebase after this one merges

Re: [I] [C++][Gandiva] Enhance random data generation [arrow]

2023-11-04 Thread via GitHub
kou commented on issue #38569: URL: https://github.com/apache/arrow/issues/38569#issuecomment-1793578384 Why is the range important in them? Is the `upper` function used in them related?

Re: [PR] GH-37753: [C++][Gandiva] Add external function registry support [arrow]

2023-11-04 Thread via GitHub
kou commented on code in PR #38116: URL: https://github.com/apache/arrow/pull/38116#discussion_r1382476444 ## cpp/src/gandiva/function_registry.cc: ## @@ -41,42 +62,74 @@ FunctionRegistry::iterator FunctionRegistry::back() const { return &(pc_registry_.back()); } -std::vec

[I] Binary columns do not receive truncated statistics [arrow-rs]

2023-11-04 Thread via GitHub
emcake opened a new issue, #5037: URL: https://github.com/apache/arrow-rs/issues/5037 **Describe the bug** #4389 introduced truncation on column indices for binary columns, where the min/max values for a binary column may be arbitrarily large. As noted, this matches the behaviour in parq
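For readers unfamiliar with statistics truncation, here is a minimal sketch of the general idea, not the arrow-rs implementation. It assumes plain byte-wise ordering; `truncate_min` and `truncate_max` are hypothetical names. The point is that a shortened min must still sort at or below the real min, and a shortened max at or above the real max.

```python
def truncate_min(value: bytes, length: int) -> bytes:
    """A prefix always sorts <= the full value, so it stays a valid lower bound."""
    return value[:length]


def truncate_max(value: bytes, length: int) -> bytes:
    """Shorten value to at most `length` bytes while staying >= the original."""
    if len(value) <= length:
        return value
    prefix = bytearray(value[:length])
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        # Every kept byte was 0xFF: no shorter upper bound exists.
        return value
    prefix[-1] += 1  # bump the last remaining byte so the result sorts above
    return bytes(prefix)


if __name__ == "__main__":
    print(truncate_min(b"apple pie", 3))
    print(truncate_max(b"apple pie", 3))
```

Truncated bounds are looser than exact ones, but they remain safe for pruning because they never exclude a value that is actually present.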

Re: [I] [C++] Missing FindArrow.cmake in the recent versions [arrow]

2023-11-04 Thread via GitHub
kou commented on issue #38583: URL: https://github.com/apache/arrow/issues/38583#issuecomment-1793570031 We provide `ArrowConfig.cmake` instead of `FindArrow.cmake`. `find_package(Arrow)` uses `ArrowConfig.cmake` not `FindArrow.cmake` without configuration. See also: https://cmake.org/cm

Re: [PR] [GH-38381][C++][Acero] Create a sorted merge node [arrow]

2023-11-04 Thread via GitHub
JerAguilon commented on PR #38380: URL: https://github.com/apache/arrow/pull/38380#issuecomment-1793569675 NVM, checking out arrow/main for `testing` worked. Ready for another look

Re: [PR] GH-38562: [Packaging] Add support for Ubuntu 23.10 [arrow]

2023-11-04 Thread via GitHub
kou commented on PR #38563: URL: https://github.com/apache/arrow/pull/38563#issuecomment-1793569193 > Have we always supported non LTS versions? Yes. > If so I am -.9 on that Why? To reduce CI time? Or To reduce maintenance cost? Regarding CI time: It will not be a

Re: [PR] GH-28994: [C++][JSON] Add support for customizing the max rows [arrow]

2023-11-04 Thread via GitHub
github-actions[bot] commented on PR #38582: URL: https://github.com/apache/arrow/pull/38582#issuecomment-1793568894 :warning: GitHub issue #28994 **has been automatically assigned in GitHub** to PR creator.

Re: [I] Write DataFusion paper for (SIGMOD / VLDB / ICDE) [arrow-datafusion]

2023-11-04 Thread via GitHub
ozankabak commented on issue #6782: URL: https://github.com/apache/arrow-datafusion/issues/6782#issuecomment-1793558110 I went through Section 4 today. I think it has improved quite a bit, PTAL. I plan to go through Section 5 tomorrow.

Re: [PR] GH-38330: [C++][Azure] Use properties for input stream metadata [arrow]

2023-11-04 Thread via GitHub
kou commented on PR #38524: URL: https://github.com/apache/arrow/pull/38524#issuecomment-1793551202 I'll merge this in a few days if nobody objects to this.

Re: [PR] GH-33475: [Java] Add parameter binding for Prepared Statements in JDBC driver [arrow]

2023-11-04 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38404: URL: https://github.com/apache/arrow/pull/38404#issuecomment-1793546530 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit fc8c6b7dc8287c672b62c62f3a2bd724b3835063. There was 1 b

[PR] Minor: clean up array_replace [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb opened a new pull request, #8054: URL: https://github.com/apache/arrow-datafusion/pull/8054 ## Which issue does this PR close? Follow on to https://github.com/apache/arrow-datafusion/pull/8050 from @jayzhan211 -- ## Rationale for this change The change in htt

Re: [PR] General approach for Array replace [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb merged PR #8050: URL: https://github.com/apache/arrow-datafusion/pull/8050

Re: [PR] [Python][Docs] PyArrow Documentation remove the unnecessary dot to avoid confusion [arrow]

2023-11-04 Thread via GitHub
github-actions[bot] commented on PR #38588: URL: https://github.com/apache/arrow/pull/38588#issuecomment-1793532656 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue f

[PR] [Python][Docs] PyArrow Documentation remove the unnecessary dot to avoid confusion [arrow]

2023-11-04 Thread via GitHub
ChinYikMing opened a new pull request, #38588: URL: https://github.com/apache/arrow/pull/38588 The signature of function to finalize output stream is `getvalue` instead of `get.value`.

[PR] MINOR: [C++] Fix typo in Decimal256::FromBigEndian error [arrow]

2023-11-04 Thread via GitHub
WillAyd opened a new pull request, #38587: URL: https://github.com/apache/arrow/pull/38587 Looks like a copy/paste error from the Decimal128 variant

Re: [I] Support Continue on Error in CSV Reader [arrow-rs]

2023-11-04 Thread via GitHub
GCHQDeveloper61637 commented on issue #4809: URL: https://github.com/apache/arrow-rs/issues/4809#issuecomment-1793522396 Well, it appeared silent to _me_. :) I first encountered the OOM error after upgrading from an earlier version of the Arrow crate where the CSV reader _would_ continue a

Re: [PR] Minor: improve documentation for IsNotNull, DISTINCT, etc [arrow-datafusion]

2023-11-04 Thread via GitHub
comphead commented on code in PR #8052: URL: https://github.com/apache/arrow-datafusion/pull/8052#discussion_r1382436731 ## datafusion/expr/src/expr.rs: ## @@ -99,21 +99,21 @@ pub enum Expr { SimilarTo(Like), /// Negation of an expression. The expression's type must be

Re: [PR] Minor: improve documentation for IsNotNull, DISTINCT, etc [arrow-datafusion]

2023-11-04 Thread via GitHub
comphead commented on code in PR #8052: URL: https://github.com/apache/arrow-datafusion/pull/8052#discussion_r1382436291 ## datafusion/expr/src/expr.rs: ## @@ -99,21 +99,21 @@ pub enum Expr { SimilarTo(Like), /// Negation of an expression. The expression's type must be

Re: [PR] GH-33475: [Java] Add parameter binding for Prepared Statements in JDBC driver [arrow]

2023-11-04 Thread via GitHub
aiguofer commented on PR #38404: URL: https://github.com/apache/arrow/pull/38404#issuecomment-1793503818 > I filed a general followups issue at #38585 > > Thanks again @aiguofer! Perfect sounds good! I can work on a few of those soon after this!

Re: [PR] GH-38460: [Java][FlightRPC] Add mTLS support for Flight SQL JDBC driver [arrow]

2023-11-04 Thread via GitHub
lidavidm commented on code in PR #38461: URL: https://github.com/apache/arrow/pull/38461#discussion_r1382433243 ## java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightServer.java: ## @@ -328,6 +360,15 @@ public Builder useTls(final InputStream certChain, final I

Re: [I] go/adbc/driver/flightsql: When CookiesMiddleware is enabled, DO_GET requests have a different set of cookies [arrow-adbc]

2023-11-04 Thread via GitHub
lidavidm commented on issue #1194: URL: https://github.com/apache/arrow-adbc/issues/1194#issuecomment-1793498042 Yes, IMO the ideal behavior is to copy over the cookies/auth tokens to start with, but let them diverge from there if the server ends up setting something else or requesting auth

Re: [PR] GH-33475: [Java] Add parameter binding for Prepared Statements in JDBC driver [arrow]

2023-11-04 Thread via GitHub
lidavidm commented on PR #38404: URL: https://github.com/apache/arrow/pull/38404#issuecomment-1793497422 I filed a general followups issue at #38585 Thanks again @aiguofer!

Re: [I] [Java][FlightRPC] FlightSQL error: 'Parameter ordinal out of range' executing a prepared stmt with params [arrow]

2023-11-04 Thread via GitHub
lidavidm commented on issue #33475: URL: https://github.com/apache/arrow/issues/33475#issuecomment-1793496522 @jarohen in case this interests you, @aiguofer pushed this across the finish line! However it currently depends on the server providing accurate types for the parameters, so there's

Re: [PR] GH-33475: [Java] Add parameter binding for Prepared Statements in JDBC driver [arrow]

2023-11-04 Thread via GitHub
lidavidm merged PR #38404: URL: https://github.com/apache/arrow/pull/38404

Re: [PR] GH-33475: [Java] Add parameter binding for Prepared Statements in JDBC driver [arrow]

2023-11-04 Thread via GitHub
lidavidm commented on code in PR #38404: URL: https://github.com/apache/arrow/pull/38404#discussion_r1382429577 ## java/flight/flight-sql-jdbc-core/src/main/java/org/apache/arrow/driver/jdbc/converter/impl/BinaryAvaticaParameterConverter.java: ## @@ -0,0 +1,49 @@ +/* + * License

Re: [PR] GH-33475: [Java] Add parameter binding for Prepared Statements in JDBC driver [arrow]

2023-11-04 Thread via GitHub
lidavidm commented on code in PR #38404: URL: https://github.com/apache/arrow/pull/38404#discussion_r1382429671 ## java/flight/flight-sql-jdbc-core/src/main/java/org/apache/arrow/driver/jdbc/utils/AvaticaParameterBinder.java: ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache S

Re: [PR] GH-33475: [Java] Add parameter binding for Prepared Statements in JDBC driver [arrow]

2023-11-04 Thread via GitHub
lidavidm commented on code in PR #38404: URL: https://github.com/apache/arrow/pull/38404#discussion_r1382429603 ## java/flight/flight-sql-jdbc-core/src/main/java/org/apache/arrow/driver/jdbc/converter/impl/BinaryAvaticaParameterConverter.java: ## @@ -0,0 +1,49 @@ +/* + * License

Re: [PR] GH-38503: [Go][Parquet] Style improvement for using ArrowColumnWriter [arrow]

2023-11-04 Thread via GitHub
mapleFU commented on PR #38581: URL: https://github.com/apache/arrow/pull/38581#issuecomment-1793494891 Sorry for the late reply, I was playing games :-) The schema is an important part of using Parquet, because **Parquet only stores leaf nodes**. You can also refer to the code here
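To illustrate the leaf-node point in the comment above: a rough sketch, assuming a toy schema type rather than the real Go `pqarrow` or Parquet APIs. `Field`, `leaf_count` and `first_leaf_index` are hypothetical names. It shows why the index of a top-level field and the index of its first leaf column drift apart once a nested field appears.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Toy schema node: a field with no children is a leaf column."""
    name: str
    children: List["Field"] = field(default_factory=list)


def leaf_count(f: Field) -> int:
    """Number of leaf columns stored for this field."""
    if not f.children:
        return 1
    return sum(leaf_count(child) for child in f.children)


def first_leaf_index(fields: List[Field], field_index: int) -> int:
    """Leaf-column index at which the given top-level field starts."""
    return sum(leaf_count(f) for f in fields[:field_index])


if __name__ == "__main__":
    schema = [
        Field("id"),
        Field("point", [Field("x"), Field("y")]),  # nested: two leaves
        Field("tag"),
    ]
    for i, f in enumerate(schema):
        print(f.name, "field index", i, "first leaf index", first_leaf_index(schema, i))
```

In this toy schema `tag` is top-level field 2 but leaf column 3, so passing a field index where a leaf index is expected would address the wrong column.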

Re: [I] [Python][Parquet] Very high memory usage when reading from disk [arrow]

2023-11-04 Thread via GitHub
Hugo-loio commented on issue #38552: URL: https://github.com/apache/arrow/issues/38552#issuecomment-1793492818 I tried setting the `use dictionary = false` option when writing, it might have reduced the size of the file a bit, but the rest of the problems remain. I can't use the `DELTA_BINA

Re: [PR] WIP: feat: emitting partial join results in `HashJoinStream` [arrow-datafusion]

2023-11-04 Thread via GitHub
korowa commented on PR #8020: URL: https://github.com/apache/arrow-datafusion/pull/8020#issuecomment-1793485802 @alamb, got it! Anyway, current version works as expected, so I'll mark it as ready for review. Regarding logic / encapsulation -- I've moved all the index-tracking work t

Re: [I] Support Continue on Error in CSV Reader [arrow-rs]

2023-11-04 Thread via GitHub
tustvold commented on issue #4809: URL: https://github.com/apache/arrow-rs/issues/4809#issuecomment-1793484515 > silent out-of-memory error I think I am missing something here, the reader returns an error and should then no longer be used. I don't see how this is silent?

Re: [PR] GH-38558: [C++] Fix: null sorting of multiple sort keys. [arrow]

2023-11-04 Thread via GitHub
github-actions[bot] commented on PR #38584: URL: https://github.com/apache/arrow/pull/38584#issuecomment-1793480224 :warning: GitHub issue #38558 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-38558: [C++] Fix: null sorting of multiple sort keys. [arrow]

2023-11-04 Thread via GitHub
Light-City opened a new pull request, #38584: URL: https://github.com/apache/arrow/pull/38584 ### Rationale for this change support multi sortkey nulls first. ``` order by i nulls first, j, k nulls first; ``` The current null sorting only supports all sortkeys, not a ce
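A small sketch of what per-key null placement means for the `order by i nulls first, j, k nulls first` example. This is plain Python, not the Arrow C++ sort kernel, and it assumes that a key with no modifier (`j` here) places nulls last; `sort_rows` is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[Optional[int], ...]


def sort_rows(rows: List[Row], nulls_first: Sequence[bool]) -> List[Row]:
    """Sort ascending by each column in turn, placing nulls per column."""

    def key(row: Row):
        parts = []
        for value, first in zip(row, nulls_first):
            is_null = value is None
            # Rank 0 sorts before rank 1. With nulls-first, nulls get rank 0;
            # with nulls-last, non-null values get rank 0.
            rank = int(is_null != first)
            parts.append((rank, 0 if is_null else value))
        return tuple(parts)

    return sorted(rows, key=key)


if __name__ == "__main__":
    rows = [(1, None, 2), (None, 5, 1), (1, 3, None), (1, 3, 4), (None, None, 0)]
    # i nulls first, j nulls last (assumed default), k nulls first
    for row in sort_rows(rows, [True, False, True]):
        print(row)
```

Each key carries its own flag, which is the difference from a single null-placement setting applied to all sort keys at once.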

Re: [PR] Implement `DISTINCT ON` from Postgres [arrow-datafusion]

2023-11-04 Thread via GitHub
gruuya commented on PR #7981: URL: https://github.com/apache/arrow-datafusion/pull/7981#issuecomment-1793475679 @alamb thanks for bearing with me here! I agree, ideally #8008 gets some kind of a resolution prior to merging this.

Re: [PR] feat: add projection to FilterExec [arrow-datafusion]

2023-11-04 Thread via GitHub
Dandandan commented on PR #7932: URL: https://github.com/apache/arrow-datafusion/pull/7932#issuecomment-1793478351 @junjunjd FYI, I merged and pushed some changes towards pushing projection pushdown.

Re: [PR] Add mechanism for verifying that source code in documentation is valid [arrow-datafusion]

2023-11-04 Thread via GitHub
andygrove commented on PR #7956: URL: https://github.com/apache/arrow-datafusion/pull/7956#issuecomment-1793476718 @alamb I have addressed the feedback. PTAL when you have time.

Re: [PR] Add mechanism for verifying that source code in documentation is valid [arrow-datafusion]

2023-11-04 Thread via GitHub
andygrove commented on code in PR #7956: URL: https://github.com/apache/arrow-datafusion/pull/7956#discussion_r1382415393 ## docs/build.sh: ## @@ -21,8 +21,14 @@ set -e rm -rf build 2> /dev/null rm -rf temp 2> /dev/null + +# copy source to temp dir Review Comment: I don't

Re: [PR] Implement `DISTINCT ON` from Postgres [arrow-datafusion]

2023-11-04 Thread via GitHub
gruuya commented on code in PR #7981: URL: https://github.com/apache/arrow-datafusion/pull/7981#discussion_r1382413418 ## datafusion/sqllogictest/test_files/distinct_on.slt: ## @@ -0,0 +1,125 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

Re: [I] Implement qualified expression alias [arrow-datafusion]

2023-11-04 Thread via GitHub
gruuya commented on issue #8008: URL: https://github.com/apache/arrow-datafusion/issues/8008#issuecomment-1793470221 > Somehow, [Projection](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Projection.html) doesn't seem to have this problem. Yeah I think this problem

Re: [PR] Encapsulate `EquivalenceClass` [arrow-datafusion]

2023-11-04 Thread via GitHub
ozankabak commented on code in PR #8034: URL: https://github.com/apache/arrow-datafusion/pull/8034#discussion_r1382410856 ## datafusion/physical-expr/src/equivalence.rs: ## @@ -20,26 +20,114 @@ use std::hash::Hash; use std::sync::Arc; use crate::expressions::Column; -use cra

Re: [I] Support Continue on Error in CSV Reader [arrow-rs]

2023-11-04 Thread via GitHub
GCHQDeveloper61637 commented on issue #4809: URL: https://github.com/apache/arrow-rs/issues/4809#issuecomment-1793464249 Suggestion #1: if the reader encounters an unrecoverable error, it should be marked internally as being in an unrecoverable state and any attempts to continue reading sho
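Suggestion #1 can be pictured with a toy reader. This is a sketch only and has nothing to do with the actual arrow-rs CSV reader; the excerpt is cut off, so what should happen on later reads (here: fail immediately) is an assumption. `FailFastReader` and `PoisonedReaderError` are hypothetical names.

```python
class PoisonedReaderError(RuntimeError):
    """Raised when a reader is used again after an unrecoverable error."""


class FailFastReader:
    """Toy line reader that remembers its first failure."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self._failure = None  # set once the reader hits an error

    def read_int(self):
        """Return the next line parsed as int, or None at end of input."""
        if self._failure is not None:
            # Assumed behaviour: refuse to continue instead of reading on.
            raise PoisonedReaderError("reader is in a failed state") from self._failure
        line = next(self._lines, None)
        if line is None:
            return None
        try:
            return int(line)
        except ValueError as err:
            self._failure = err  # mark the reader as unrecoverable
            raise


if __name__ == "__main__":
    reader = FailFastReader(["1", "x", "3"])
    print(reader.read_int())
    for _ in range(2):
        try:
            reader.read_int()
        except (ValueError, PoisonedReaderError) as err:
            print(type(err).__name__, err)
```

The first bad line surfaces the original error; every later call fails loudly rather than carrying on from an undefined position.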

Re: [PR] Minor: Remove the irrelevant note from the Expression API doc [arrow-datafusion]

2023-11-04 Thread via GitHub
ongchi commented on code in PR #8053: URL: https://github.com/apache/arrow-datafusion/pull/8053#discussion_r1382404409 ## docs/source/user-guide/expressions.md: ## @@ -107,11 +107,6 @@ but these operators always return a `bool` which makes them not work with the ex | x % y, x.

[PR] Minor: Remove the irrelevant note from the Expression API doc [arrow-datafusion]

2023-11-04 Thread via GitHub
ongchi opened a new pull request, #8053: URL: https://github.com/apache/arrow-datafusion/pull/8053 ## Which issue does this PR close? Closes #. ## Rationale for this change This is a doc fix of #7732. ## What changes are included in this PR?

Re: [PR] GH-38255: [Go][C++] Implement Flight SQL Bulk Ingestion [arrow]

2023-11-04 Thread via GitHub
joellubi commented on code in PR #38385: URL: https://github.com/apache/arrow/pull/38385#discussion_r1382393086 ## cpp/src/arrow/flight/sql/client.cc: ## @@ -256,6 +256,88 @@ arrow::Result FlightSqlClient::ExecuteSubstraitUpdate( return update_result.record_count(); } +ar

Re: [I] [C++][JSON] kMaxParserNumRows Value Increase/Removal [arrow]

2023-11-04 Thread via GitHub
Ox0400 commented on issue #28994: URL: https://github.com/apache/arrow/issues/28994#issuecomment-1793439489 > I'm not aware of anyone getting ready to work on this issue. If you wanted to open a PR, I could give you a review Hi @bkietz, I have opened a PR: https://github.com/apache/arro

Re: [PR] General approach for Array replace [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb commented on PR #8050: URL: https://github.com/apache/arrow-datafusion/pull/8050#issuecomment-1793431025 > We can probably do the similar improvement for array_repeat / append / prepend Yes, please! I'll make my cleanup PR later today and we can try to make all of them better

Re: [PR] New an env ARROW_JSON_MAX_NUM_ROWS to overwrite default kMaxParserNum… [arrow]

2023-11-04 Thread via GitHub
github-actions[bot] commented on PR #38582: URL: https://github.com/apache/arrow/pull/38582#issuecomment-1793426692 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue f

[PR] New an env ARROW_JSON_MAX_NUM_ROWS to overwrite default kMaxParserNum… [arrow]

2023-11-04 Thread via GitHub
Ox0400 opened a new pull request, #38582: URL: https://github.com/apache/arrow/pull/38582 https://github.com/apache/arrow/issues/28994 ### Rationale for this change ### What changes are included in this PR? ### Are these changes tested?

Re: [PR] Implement `DISTINCT ON` from Postgres [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb commented on code in PR #7981: URL: https://github.com/apache/arrow-datafusion/pull/7981#discussion_r1382374720 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -60,6 +80,62 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { )?);

Re: [I] Implement qualified expression alias [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb commented on issue #8008: URL: https://github.com/apache/arrow-datafusion/issues/8008#issuecomment-1793417608 > there in lies the problem—all logical plans with a schema use exprlist_to_fields to generate the initial schema, however this function will always [result](https://git

Re: [I] [EPIC] Unify Function Interface (remove `BuiltInScalarFunction`) [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb commented on issue #8045: URL: https://github.com/apache/arrow-datafusion/issues/8045#issuecomment-1793411778 Here is a PR that shows what DataFusion might look like after removing `BuiltInScalarFunction` and moving everything to `ScalarUDF`: https://github.com/apache/arrow-datafusio

Re: [PR] RFC: Demonstrate what a function package might look like -- encoding expressions [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb commented on PR #8046: URL: https://github.com/apache/arrow-datafusion/pull/8046#issuecomment-1793411566 @2010YOUY01 and @viirya I wonder if you have any thoughts on this approach / proposal?

Re: [I] Execute LogicalPlan on DBMS directly [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb commented on issue #970: URL: https://github.com/apache/arrow-datafusion/issues/970#issuecomment-1793411154 > Would this be a sensible approach to take? The basic idea seems reasonable to me. Given your description it seems like the same thing could be accomplished toda

Re: [PR] GH-38503: [Go][Parquet] Style improvement for using ArrowColumnWriter [arrow]

2023-11-04 Thread via GitHub
tschaub commented on PR #38581: URL: https://github.com/apache/arrow/pull/38581#issuecomment-1793410455 Thank you for looking into this and finding the issue, @mapleFU. My assumption was that the last argument to the `pqarrow.NewArrowColumnWriter` function was the Arrow column index i

Re: [PR] WIP: feat: emitting partial join results in `HashJoinStream` [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb commented on PR #8020: URL: https://github.com/apache/arrow-datafusion/pull/8020#issuecomment-1793409500 > definitely worthwhile -- I'll look for options for separating state from other join internals. Thank you. This particular issue has become a higher priority for

Re: [PR] Minor: Add more documentation about Partitioning [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb commented on code in PR #8022: URL: https://github.com/apache/arrow-datafusion/pull/8022#discussion_r1382370566 ## datafusion/physical-expr/src/partitioning.rs: ## @@ -15,14 +15,94 @@ // specific language governing permissions and limitations // under the License. -//!

[PR] Minor: improve documentation for IsNotNull, DISTINCT, etc [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb opened a new pull request, #8052: URL: https://github.com/apache/arrow-datafusion/pull/8052 ## Which issue does this PR close? N/A ## Rationale for this change The question came up on ASF Slack about what `Expr::IsNotFalse(e)` meant, and since it is subtle I had to loo

Re: [PR] Improve comments for `PartitionSearchMode` struct [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb merged PR #8047: URL: https://github.com/apache/arrow-datafusion/pull/8047

Re: [I] [Python] pyarrow.parquet.read_table with filters is broken for timezone aware datetime since 13.0.0 release [arrow]

2023-11-04 Thread via GitHub
mapleFU commented on issue #37355: URL: https://github.com/apache/arrow/issues/37355#issuecomment-1793400932 Could we close this issue?

Re: [I] Specialized / Pre-compiled / Prepared ScalarUDFs [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb commented on issue #8051: URL: https://github.com/apache/arrow-datafusion/issues/8051#issuecomment-1793400870 One way to achieve this might be a [PhysicalOptimizerRule](https://docs.rs/datafusion/latest/datafusion/physical_optimizer/optimizer/trait.PhysicalOptimizerRule.html#) that r

Re: [I] [EPIC] Unify Function Interface (remove `BuiltInScalarFunction`) [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb commented on issue #8045: URL: https://github.com/apache/arrow-datafusion/issues/8045#issuecomment-1793400963 > Sorry, what I mean is that it would be useful to be able to serialize constant parameters into the user-defined scalar function themselves rather than pass them in as expre

Re: [I] [Python] Using unify_schema() during schema evolution fails [arrow]

2023-11-04 Thread via GitHub
mapleFU commented on issue #37898: URL: https://github.com/apache/arrow/issues/37898#issuecomment-1793400603 @PoojaRavi1105 1. Currently parquet dataset doesn't support iceberg style schema evolution using `unify_schema` 2. But when you set the schema in dataset explicitly yoursel

Re: [I] [C++][Parquet] Slow column reading from multi-column parquet files [arrow]

2023-11-04 Thread via GitHub
mapleFU commented on issue #38149: URL: https://github.com/apache/arrow/issues/38149#issuecomment-1793400232 Sorry for the late reply. Have you solved this problem? As the number of columns grows, the metadata grows too. The metadata is Thrift-encoded, and Thrift needs to deserialize all of it. I must

Re: [I] Pre-compiled / Prepared ScalarUDFs [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb commented on issue #8051: URL: https://github.com/apache/arrow-datafusion/issues/8051#issuecomment-1793399666 @thinkharderdev suggests it would be useful to be able to serialize constant parameters into the user-defined scalar function themselves rather than pass them in as e

[I] Pre-compiled / Prepared ScalarUDFs [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb opened a new issue, #8051: URL: https://github.com/apache/arrow-datafusion/issues/8051 ### Is your feature request related to a problem or challenge? Currently, scalar UDF functions can not be "specialized" ```select SELECT * FROM t where my_matcher(t.column, '[a-z].*')
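To make "specialized" concrete: a sketch in plain Python, not DataFusion's `ScalarUDF` API. It assumes `my_matcher` does a full regex match, which the excerpt does not say; `specialize_matcher` is a hypothetical helper. The idea is that when the pattern argument is a constant, it can be compiled once up front instead of being handled again for every row.

```python
import re
from typing import Callable, List


def my_matcher(value: str, pattern: str) -> bool:
    """Generic form: the pattern arrives as an ordinary argument on every call."""
    return re.fullmatch(pattern, value) is not None


def specialize_matcher(pattern: str) -> Callable[[List[str]], List[bool]]:
    """Build a matcher for one constant pattern, compiling it exactly once."""
    compiled = re.compile(pattern)

    def matcher(column: List[str]) -> List[bool]:
        return [compiled.fullmatch(value) is not None for value in column]

    return matcher


if __name__ == "__main__":
    column = ["abc", "1bc", ""]
    specialized = specialize_matcher("[a-z].*")
    print(specialized(column))
    print([my_matcher(value, "[a-z].*") for value in column])
```

Both forms give the same answers; the specialized one just moves the per-pattern work out of the per-row path, which is the kind of preparation the issue asks to make possible.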

Re: [I] [Python][Parquet] Very high memory usage when reading from disk [arrow]

2023-11-04 Thread via GitHub
mapleFU commented on issue #38552: URL: https://github.com/apache/arrow/issues/38552#issuecomment-1793397893 https://github.com/apache/arrow/issues/38245 Would you mind first checking the analysis here? I guess the reason is similar

Re: [PR] Encapsulate `EquivalenceClass` [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb commented on code in PR #8034: URL: https://github.com/apache/arrow-datafusion/pull/8034#discussion_r1382365034 ## datafusion/physical-expr/src/equivalence.rs: ## @@ -20,26 +20,114 @@ use std::hash::Hash; use std::sync::Arc; use crate::expressions::Column; -use crate::

Re: [PR] Minor: Fix bugs in docs for `to_timestamp`, `to_timestamp_seconds`, ... [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb merged PR #8040: URL: https://github.com/apache/arrow-datafusion/pull/8040

Re: [PR] Encapsulate `ProjectionMapping` as a struct [arrow-datafusion]

2023-11-04 Thread via GitHub
alamb merged PR #8033: URL: https://github.com/apache/arrow-datafusion/pull/8033

Re: [PR] GH-38503: [Go][Parquet] Style improvement for using ArrowColumnWriter [arrow]

2023-11-04 Thread via GitHub
github-actions[bot] commented on PR #38581: URL: https://github.com/apache/arrow/pull/38581#issuecomment-1793394850 :warning: GitHub issue #38503 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] GH-38503: [Go][Parquet] Style improvement for using ArrowColumnWriter [arrow]

2023-11-04 Thread via GitHub
mapleFU commented on PR #38581: URL: https://github.com/apache/arrow/pull/38581#issuecomment-1793394944 @zeroshade @tschaub I've updated the sample here. Also, we may need to check the type for writer. Any advice is welcomed.

[PR] GH-38503: [Go][Parquet] Style improvement for using ArrowColumnWriter [arrow]

2023-11-04 Thread via GitHub
mapleFU opened a new pull request, #38581: URL: https://github.com/apache/arrow/pull/38581 ### Rationale for this change Currently, `ArrowColumnWriter` does not seem to have a bug, but its usage in testing looks weird. This patch improves the style. ### What change

Re: [I] [Go][Parquet] Trouble using the C++ reader to read a Parquet file written with the Go writer [arrow]

2023-11-04 Thread via GitHub
mapleFU commented on issue #38503: URL: https://github.com/apache/arrow/issues/38503#issuecomment-1793392796 After rethinking the impl, I found it's the impl's problem rather than a bug. ``` ctx := context.Background() numFields := len(arrowReader.Manifest.Fields)

Re: [I] [Go][Parquet] Trouble using the C++ reader to read a Parquet file written with the Go writer [arrow]

2023-11-04 Thread via GitHub
mapleFU commented on issue #38503: URL: https://github.com/apache/arrow/issues/38503#issuecomment-1793392861 And currently, the generated file is a bad file here...

Re: [PR] GH-38315: [Dev][CI] autotune needs additional permissions to push to PR branches [arrow]

2023-11-04 Thread via GitHub
thisisnic commented on PR #38523: URL: https://github.com/apache/arrow/pull/38523#issuecomment-1793392644 If this is a no-go, should I just remove both of these in this PR? They were nice to have, but as a maintainer reviewing PRs I'm happy enough to just pull locally and style or rebase so

Re: [PR] GH-38516: [Go][Parquet] Increment the number of rows written when appending a new row group [arrow]

2023-11-04 Thread via GitHub
mapleFU commented on PR #38517: URL: https://github.com/apache/arrow/pull/38517#issuecomment-1793382029 cc @zeroshade

Re: [I] Execute LogicalPlan on DBMS directly [arrow-datafusion]

2023-11-04 Thread via GitHub
backkem commented on issue #970: URL: https://github.com/apache/arrow-datafusion/issues/970#issuecomment-1793381505 I took a look at how Presto handles this. Presto uses the concept of a `Connector` to represent remote data sources. DF's `TableProviderFactory` is similar. Their `TableHandl

Re: [PR] Improve comments for `PartitionSearchMode` struct [arrow-datafusion]

2023-11-04 Thread via GitHub
ozankabak commented on PR #8047: URL: https://github.com/apache/arrow-datafusion/pull/8047#issuecomment-1793380626 BTW this is a first step towards improving comments and finding a better name/place for the `PartitionSearchMode` struct as discussed in https://github.com/apache/arrow-datafu

[PR] General approach for Array replace [arrow-datafusion]

2023-11-04 Thread via GitHub
jayzhan211 opened a new pull request, #8050: URL: https://github.com/apache/arrow-datafusion/pull/8050 ## Which issue does this PR close? Closes #7988. ## Rationale for this change Find a general approach that avoid downcast Array, largely reduce the code

Re: [I] [Go][Parquet] Trouble using the C++ reader to read a Parquet file written with the Go writer [arrow]

2023-11-04 Thread via GitHub
mapleFU commented on issue #38503: URL: https://github.com/apache/arrow/issues/38503#issuecomment-1793368912 Updated: I think the C++ reader checks max-rep-level, and it's 1. (and def-max is 3). So it reports the error.