Re: [PR] build(deps): bump pyo3-build-config from 0.20.0 to 0.20.2 [arrow-datafusion-python]

2024-01-06 Thread via GitHub
dependabot[bot] closed pull request #555: build(deps): bump pyo3-build-config from 0.20.0 to 0.20.2 URL: https://github.com/apache/arrow-datafusion-python/pull/555 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] build(deps): bump pyo3-build-config from 0.20.0 to 0.20.2 [arrow-datafusion-python]

2024-01-06 Thread via GitHub
dependabot[bot] commented on PR #555: URL: https://github.com/apache/arrow-datafusion-python/pull/555#issuecomment-1879977968 Looks like pyo3-build-config is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump pyo3 from 0.20.0 to 0.20.2 [arrow-datafusion-python]

2024-01-06 Thread via GitHub
viirya merged PR #557: URL: https://github.com/apache/arrow-datafusion-python/pull/557

Re: [I] Refactor size estimation of Hashset into a function [arrow-datafusion]

2024-01-06 Thread via GitHub
yyy1000 commented on issue #8764: URL: https://github.com/apache/arrow-datafusion/issues/8764#issuecomment-1879975135 I'd like to work on this as a good start. :)

Re: [PR] GH-39289: [JS] Add types to exports [arrow]

2024-01-06 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39475: URL: https://github.com/apache/arrow/pull/39475#issuecomment-1879970543 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 17b946c745cbcc4ee62a8607301db1939f364c68. There were 2

Re: [I] [C++][Parquet] Timestamp conversion from Parquet to Arrow does not follow compatibility guidelines for convertedType [arrow]

2024-01-06 Thread via GitHub
mapleFU commented on issue #39489: URL: https://github.com/apache/arrow/issues/39489#issuecomment-1879966745 Thanks for point this out!

[I] Confusion over Date64 array values [arrow-rs]

2024-01-06 Thread via GitHub
Jefffrey opened a new issue, #5288: URL: https://github.com/apache/arrow-rs/issues/5288 **Which part is this question about** Date64 array values. **Describe your question** Docs for Date64 type states: https://github.com/apache/arrow-rs/blob/a61e824abdd7b

Re: [PR] fix: struct field don't push down to TableScan [arrow-datafusion]

2024-01-06 Thread via GitHub
haohuaijin commented on code in PR #8774: URL: https://github.com/apache/arrow-datafusion/pull/8774#discussion_r1443932497 ## datafusion/optimizer/src/optimize_projections.rs: ## @@ -677,6 +676,22 @@ fn outer_columns_helper(expr: &Expr, columns: &mut HashSet) -> bool {

Re: [PR] build(deps): bump async-trait from 0.1.74 to 0.1.77 [arrow-datafusion-python]

2024-01-06 Thread via GitHub
viirya merged PR #556: URL: https://github.com/apache/arrow-datafusion-python/pull/556

Re: [I] writing to partitioned table uses the wrong column as partition key [arrow-datafusion]

2024-01-06 Thread via GitHub
marvinlanhenke commented on issue #7892: URL: https://github.com/apache/arrow-datafusion/issues/7892#issuecomment-1879947879 @devinjdangelo thank you so much for the explanation. I wasn't aware of the convention [1] and thus not sure if the schema handling was intended this way. Now, it

Re: [PR] build(deps): bump tokio from 1.35.0 to 1.35.1 [arrow-datafusion-python]

2024-01-06 Thread via GitHub
viirya merged PR #558: URL: https://github.com/apache/arrow-datafusion-python/pull/558

Re: [PR] build(deps): bump syn from 2.0.41 to 2.0.43 [arrow-datafusion-python]

2024-01-06 Thread via GitHub
viirya merged PR #559: URL: https://github.com/apache/arrow-datafusion-python/pull/559

Re: [PR] GH-39047: [JS] Enable test for generate_primitive_large_offsets_case [arrow]

2024-01-06 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39470: URL: https://github.com/apache/arrow/pull/39470#issuecomment-1879926519 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 694fd7ed89e9e5dd02af1ddf84dd098de87bbcea. There were no

Re: [PR] GH-39439 [C++][Gandiva] Add regex_like [arrow]

2024-01-06 Thread via GitHub
likun61 commented on code in PR #39441: URL: https://github.com/apache/arrow/pull/39441#discussion_r1443894967 ## cpp/src/gandiva/regex_functions_holder.cc: ## @@ -275,4 +275,78 @@ const char* ExtractHolder::operator()(ExecutionContext* ctx, const char* user_in return re

Re: [PR] GH-39439 [C++][Gandiva] Add regex_like [arrow]

2024-01-06 Thread via GitHub
likun61 commented on code in PR #39441: URL: https://github.com/apache/arrow/pull/39441#discussion_r1443895017 ## cpp/src/gandiva/regex_functions_holder_test.cc: ## @@ -635,4 +635,93 @@ TEST_F(TestExtractHolder, TestErrorWhileBuildingHolder) { execution_context_.Reset();

Re: [PR] GH-39488: [Ruby] Add support for ChunkedArray in Ractor [arrow]

2024-01-06 Thread via GitHub
github-actions[bot] commented on PR #39490: URL: https://github.com/apache/arrow/pull/39490#issuecomment-1879920289 :warning: GitHub issue #39488 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-39488: [Ruby] Add support for ChunkedArray in Ractor [arrow]

2024-01-06 Thread via GitHub
kou opened a new pull request, #39490: URL: https://github.com/apache/arrow/pull/39490 ### Rationale for this change We can't use `@cache ||= build_cache` idiom in Ractor because Ractor requires that shared objects are immutable. ### What changes are included in this PR?

Re: [I] [C++] Deprecate Scalar::CastTo [arrow]

2024-01-06 Thread via GitHub
llama90 commented on issue #39182: URL: https://github.com/apache/arrow/issues/39182#issuecomment-1879917857 ### Progress Update If we remove the legacy `CastTo`, we must also update the `ToString` function to call the new `Cast` function. I suspect that the following unit tes

Re: [I] Convert list operator to function in sql to LogicExpr stage [arrow-datafusion]

2024-01-06 Thread via GitHub
jayzhan211 commented on issue #8506: URL: https://github.com/apache/arrow-datafusion/issues/8506#issuecomment-1879913865 I think there is a little issue with the current design. We rewrite the `| |` operator to function after the logical plan is built. Before OperatorToFunction is applie

Re: [PR] GH-36612: [C++] Ensure compatibility between std::span and arrow::util::span [arrow]

2024-01-06 Thread via GitHub
Divyansh200102 closed pull request #39123: GH-36612: [C++] Ensure compatibility between std::span and arrow::util::span URL: https://github.com/apache/arrow/pull/39123

Re: [PR] Improve `array_concat` signature for null and empty array [arrow-datafusion]

2024-01-06 Thread via GitHub
jayzhan211 commented on code in PR #8594: URL: https://github.com/apache/arrow-datafusion/pull/8594#discussion_r1443882484 ## datafusion/expr/src/signature.rs: ## @@ -122,6 +122,9 @@ pub enum TypeSignature { /// List dimension of the List/LargeList is equivalent to the numb

Re: [PR] Improve `array_concat` signature for null and empty array [arrow-datafusion]

2024-01-06 Thread via GitHub
jayzhan211 commented on code in PR #8594: URL: https://github.com/apache/arrow-datafusion/pull/8594#discussion_r1443882644 ## datafusion/expr/src/signature.rs: ## @@ -122,6 +122,9 @@ pub enum TypeSignature { /// List dimension of the List/LargeList is equivalent to the numb

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-06 Thread via GitHub
jayzhan211 commented on code in PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#discussion_r1443880303 ## datafusion/common/src/scalar.rs: ## @@ -142,13 +143,13 @@ pub enum ScalarValue { /// Fixed size list scalar. /// /// The array must be a F

Re: [PR] GH-38255: [Go][C++] Implement Flight SQL Bulk Ingestion [arrow]

2024-01-06 Thread via GitHub
joellubi commented on PR #38385: URL: https://github.com/apache/arrow/pull/38385#issuecomment-1879892514 @lidavidm @emkornfield Following up on some thoughts I had on the general approach of this PR. This discussion above regarding splitting up DDL and Write operations got me looking at the

Re: [PR] GH-39366: [JS] Add largeUtf8 to benchmark [arrow]

2024-01-06 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39367: URL: https://github.com/apache/arrow/pull/39367#issuecomment-1879891518 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 278bcab07709d8d896d7fe7df981482304bd9fc5. There were no

Re: [PR] GH-39355: [Java] Improve JdbcConsumer exceptions [arrow]

2024-01-06 Thread via GitHub
aiguofer commented on PR #39356: URL: https://github.com/apache/arrow/pull/39356#issuecomment-1879887403 @lidavidm done!

[PR] fix(dev/release,go): ensure temporary directory removable [arrow-adbc]

2024-01-06 Thread via GitHub
kou opened a new pull request, #1438: URL: https://github.com/apache/arrow-adbc/pull/1438 Fixes #1437.

[PR] fix(dev/release,glib): set library path to run example [arrow-adbc]

2024-01-06 Thread via GitHub
kou opened a new pull request, #1436: URL: https://github.com/apache/arrow-adbc/pull/1436 Fixes #1435.

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-06 Thread via GitHub
rspears74 commented on PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#issuecomment-1879877160 Not sure why the MSRV check is failing?

Re: [PR] GH-39049: [C++] Use Cast() instead of CastTo() for Dictionary Scalar in test [arrow]

2024-01-06 Thread via GitHub
kou merged PR #39362: URL: https://github.com/apache/arrow/pull/39362

Re: [PR] GH-39049: [C++] Use Cast() instead of CastTo() for Dictionary Scalar in test [arrow]

2024-01-06 Thread via GitHub
kou commented on code in PR #39362: URL: https://github.com/apache/arrow/pull/39362#discussion_r1443871156 ## cpp/src/arrow/compute/kernels/scalar_cast_dictionary.cc: ## @@ -77,17 +85,24 @@ Status CastToDictionary(KernelContext* ctx, const ExecSpan& batch, ExecResult* o retu

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-06 Thread via GitHub
wjones127 commented on code in PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#discussion_r1443869268 ## datafusion/common/src/scalar.rs: ## @@ -2433,11 +2455,14 @@ impl ScalarValue { ScalarValue::LargeBinary(val) => { eq_array_p

Re: [PR] GH-39049: [C++] Use Cast() instead of CastTo() for Dictionary Scalar in test [arrow]

2024-01-06 Thread via GitHub
llama90 commented on code in PR #39362: URL: https://github.com/apache/arrow/pull/39362#discussion_r1443869238 ## cpp/src/arrow/compute/kernels/scalar_cast_dictionary.cc: ## @@ -77,17 +85,24 @@ Status CastToDictionary(KernelContext* ctx, const ExecSpan& batch, ExecResult* o

Re: [PR] GH-39303: [Archery][Benchmarking] Allow setting C++ repetition min time [arrow]

2024-01-06 Thread via GitHub
kou merged PR #39324: URL: https://github.com/apache/arrow/pull/39324

Re: [PR] GH-39303: [Archery][Benchmarking] Allow setting C++ repetition min time [arrow]

2024-01-06 Thread via GitHub
kou commented on code in PR #39324: URL: https://github.com/apache/arrow/pull/39324#discussion_r1443863862 ## dev/archery/requirements-test.txt: ## @@ -0,0 +1,2 @@ +pytest Review Comment: > Well, apparently requirements files are excluded from the RAT checks? In gener

Re: [PR] Support Reading and Writing Extension FileTypes [arrow-datafusion]

2024-01-06 Thread via GitHub
devinjdangelo commented on PR #8667: URL: https://github.com/apache/arrow-datafusion/pull/8667#issuecomment-1879836968 I've been thinking about this more, and I think we need a similar interface to how we register a custom `TableProvider` at the `SessionContext` level. The idea being we'd

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-06 Thread via GitHub
rspears74 commented on code in PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#discussion_r1443860094 ## datafusion/common/src/scalar.rs: ## @@ -142,13 +143,13 @@ pub enum ScalarValue { /// Fixed size list scalar. /// /// The array must be a Fi

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-06 Thread via GitHub
rspears74 commented on code in PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#discussion_r1443859383 ## datafusion/common/src/scalar.rs: ## @@ -2433,11 +2455,14 @@ impl ScalarValue { ScalarValue::LargeBinary(val) => { eq_array_p

Re: [I] writing to partitioned table uses the wrong column as partition key [arrow-datafusion]

2024-01-06 Thread via GitHub
devinjdangelo commented on issue #7892: URL: https://github.com/apache/arrow-datafusion/issues/7892#issuecomment-1879833196 Hey @marvinlanhenke , thanks for digging into this. > I am not sure if I am correct, and why the partition_cols need to be excluded and then added to the end l

Re: [PR] GH-39225: [GLib] Use Cast() instaed of CastTo [arrow]

2024-01-06 Thread via GitHub
kou merged PR #39228: URL: https://github.com/apache/arrow/pull/39228

Re: [PR] GH-39049: [C++] Use Cast() instead of CastTo() for Dictionary Scalar in test [arrow]

2024-01-06 Thread via GitHub
kou commented on code in PR #39362: URL: https://github.com/apache/arrow/pull/39362#discussion_r1443856865 ## cpp/src/arrow/compute/kernels/scalar_cast_dictionary.cc: ## @@ -77,17 +85,24 @@ Status CastToDictionary(KernelContext* ctx, const ExecSpan& batch, ExecResult* o retu

Re: [PR] GH-39398: [C++][Parquet] DNM: benchmark for readLevels [arrow]

2024-01-06 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39486: URL: https://github.com/apache/arrow/pull/39486#issuecomment-1879831253 Thanks for your patience. Conbench analyzed the 6 benchmarking runs that have been run so far on PR commit 0529cce31a71d4c2f8236994efa94e7817b608aa. There were 3

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-06 Thread via GitHub
wjones127 commented on code in PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#discussion_r1443855691 ## datafusion/common/src/scalar.rs: ## @@ -942,9 +958,9 @@ impl ScalarValue { ScalarValue::Binary(_) => DataType::Binary, ScalarVal

Re: [PR] Transform with payload [arrow-datafusion]

2024-01-06 Thread via GitHub
ozankabak commented on PR #8664: URL: https://github.com/apache/arrow-datafusion/pull/8664#issuecomment-1879821075 Two pieces of news from our side: 1. I think I figured out a possible way to simplify the overall `TreeNode`-related code and reduce code duplication significantly. I plan t

Re: [PR] Add `schema_err!` error macros with optional backtrace [arrow-datafusion]

2024-01-06 Thread via GitHub
comphead merged PR #8620: URL: https://github.com/apache/arrow-datafusion/pull/8620

[PR] build(deps): bump syn from 2.0.41 to 2.0.43 [arrow-datafusion-python]

2024-01-06 Thread via GitHub
dependabot[bot] opened a new pull request, #559: URL: https://github.com/apache/arrow-datafusion-python/pull/559 Bumps [syn](https://github.com/dtolnay/syn) from 2.0.41 to 2.0.43. Release notes Sourced from [syn's releases](https://github.com/dtolnay/syn/releases). 2.0.43

[PR] build(deps): bump tokio from 1.35.0 to 1.35.1 [arrow-datafusion-python]

2024-01-06 Thread via GitHub
dependabot[bot] opened a new pull request, #558: URL: https://github.com/apache/arrow-datafusion-python/pull/558 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.35.0 to 1.35.1. Release notes Sourced from [tokio's releases](https://github.com/tokio-rs/tokio/releases). T

[PR] build(deps): bump pyo3 from 0.20.0 to 0.20.2 [arrow-datafusion-python]

2024-01-06 Thread via GitHub
dependabot[bot] opened a new pull request, #557: URL: https://github.com/apache/arrow-datafusion-python/pull/557 Bumps [pyo3](https://github.com/pyo3/pyo3) from 0.20.0 to 0.20.2. Release notes Sourced from [pyo3's releases](https://github.com/pyo3/pyo3/releases). PyO3 0.20.2

[PR] build(deps): bump async-trait from 0.1.74 to 0.1.77 [arrow-datafusion-python]

2024-01-06 Thread via GitHub
dependabot[bot] opened a new pull request, #556: URL: https://github.com/apache/arrow-datafusion-python/pull/556 Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.74 to 0.1.77. Release notes Sourced from https://github.com/dtolnay/async-trait/releases async-tra

[PR] build(deps): bump pyo3-build-config from 0.20.0 to 0.20.2 [arrow-datafusion-python]

2024-01-06 Thread via GitHub
dependabot[bot] opened a new pull request, #555: URL: https://github.com/apache/arrow-datafusion-python/pull/555 Bumps [pyo3-build-config](https://github.com/pyo3/pyo3) from 0.20.0 to 0.20.2. Release notes Sourced from https://github.com/pyo3/pyo3/releases pyo3-build-config's re

[PR] build(deps): bump futures from 0.3.29 to 0.3.30 [arrow-datafusion-python]

2024-01-06 Thread via GitHub
dependabot[bot] opened a new pull request, #554: URL: https://github.com/apache/arrow-datafusion-python/pull/554 Bumps [futures](https://github.com/rust-lang/futures-rs) from 0.3.29 to 0.3.30. Release notes Sourced from https://github.com/rust-lang/futures-rs/releases futures's

Re: [PR] ci: speed up win64 test [arrow-datafusion]

2024-01-06 Thread via GitHub
comphead merged PR #8728: URL: https://github.com/apache/arrow-datafusion/pull/8728

Re: [I] [C++][Compute] Add residual predicate support to new (Swiss) hash join [arrow]

2024-01-06 Thread via GitHub
zanmato1984 commented on issue #20339: URL: https://github.com/apache/arrow/issues/20339#issuecomment-1879783832 take

Re: [PR] GH-20339: [C++] Add residual predicate support to swiss join [arrow]

2024-01-06 Thread via GitHub
github-actions[bot] commented on PR #39487: URL: https://github.com/apache/arrow/pull/39487#issuecomment-1879783732 :warning: GitHub issue #20339 **has been automatically assigned in GitHub** to PR creator.

Re: [I] Materialize Dictionaries in Group Keys [arrow-datafusion]

2024-01-06 Thread via GitHub
qrilka commented on issue #7647: URL: https://github.com/apache/arrow-datafusion/issues/7647#issuecomment-1879783659 @alamb what could be some evidence for 3.2? Is there anything in the code base or maybe in some other ticket? The plan you show makes total sense

[PR] GH-20339: [C++] Add residual predicate support to swiss join [arrow]

2024-01-06 Thread via GitHub
zanmato1984 opened a new pull request, #39487: URL: https://github.com/apache/arrow/pull/39487 ### Rationale for this change ### What changes are included in this PR? ### Are these changes tested? ### Are there any user-facing changes?

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-06 Thread via GitHub
rspears74 commented on code in PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#discussion_r1443800597 ## datafusion/common/src/scalar.rs: ## @@ -2937,13 +3015,33 @@ impl fmt::Display for ScalarValue { )?, None => write!(f, "N

Re: [PR] Implement trait based API for define AggregateUDF [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb commented on PR #8733: URL: https://github.com/apache/arrow-datafusion/pull/8733#issuecomment-1879744365 The CI test failure is unrelated to this PR -- and @jayzhan211 fixed it in https://github.com/apache/arrow-datafusion/pull/8775. The CI should be clean with a merge up from main

Re: [I] Use correct attribution in footer of documentation pages [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb closed issue #8755: Use correct attribution in footer of documentation pages URL: https://github.com/apache/arrow-datafusion/issues/8755

Re: [PR] Add Apache attribution to site footer [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb merged PR #8760: URL: https://github.com/apache/arrow-datafusion/pull/8760

Re: [PR] [MINOR] Add logo source files [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb merged PR #8762: URL: https://github.com/apache/arrow-datafusion/pull/8762

Re: [PR] [MINOR] Add logo source files [arrow-datafusion]

2024-01-06 Thread via GitHub
andygrove commented on PR #8762: URL: https://github.com/apache/arrow-datafusion/pull/8762#issuecomment-1879740921 > One thing I wondered about is if you meant to commit `docs/logos/DataFusion-LogoAndColorPaletteExploration_v01.pdf` as well -- it looks quite cool Motivation for addi

Re: [I] [JS] What is best practice for connecting to an arrow instance in javascript? [arrow]

2024-01-06 Thread via GitHub
jay-bulk commented on issue #36625: URL: https://github.com/apache/arrow/issues/36625#issuecomment-1879727838 I'm not disagreeing that this should be closed, mind you. I've abandoned direct use of arrow flight altogether. I use dremio (built on arrow flight) and instead of calling the arrow

Re: [PR] GH-39259: [JS] Remove getByteLength [arrow]

2024-01-06 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39260: URL: https://github.com/apache/arrow/pull/39260#issuecomment-1879722890 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit afb40a9f5a33802897e1d5bae8305c81da7beee1. There were no

Re: [PR] Minor: Improve library docs to mention TreeNode, ExprSimplifier, PruningPredicate and cp_solver [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb merged PR #8749: URL: https://github.com/apache/arrow-datafusion/pull/8749

Re: [PR] Minor: Improve library docs to mention TreeNode, ExprSimplifier, PruningPredicate and cp_solver [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb commented on PR #8749: URL: https://github.com/apache/arrow-datafusion/pull/8749#issuecomment-1879717762 Thank you @wjones127

Re: [PR] ci: speed up win64 test [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb commented on code in PR #8728: URL: https://github.com/apache/arrow-datafusion/pull/8728#discussion_r1443779796 ## .github/workflows/rust.yml: ## @@ -310,11 +310,10 @@ jobs: cd datafusion-cli cargo test --lib --tests --bins --all-features env

Re: [I] DataFusion weekly project plan (Andrew Lamb) - Jan 1, 2024 [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb commented on issue #8704: URL: https://github.com/apache/arrow-datafusion/issues/8704#issuecomment-1879716970 DataFUsion: - [x] https://github.com/apache/arrow-datafusion/pull/8774#pullrequestreview-1807481671 - [ ] https://github.com/apache/arrow-datafusion/pull/8775 - [ ] h

Re: [PR] fix: struct field don't push down to TableScan [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb commented on code in PR #8774: URL: https://github.com/apache/arrow-datafusion/pull/8774#discussion_r1443779477 ## datafusion/optimizer/src/optimize_projections.rs: ## @@ -677,6 +676,22 @@ fn outer_columns_helper(expr: &Expr, columns: &mut HashSet) -> bool {

Re: [PR] Minor: Fix incorrect indices for hashing struct [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb commented on code in PR #8775: URL: https://github.com/apache/arrow-datafusion/pull/8775#discussion_r1443778741 ## datafusion/sqllogictest/test_files/dictionary.slt: ## @@ -148,7 +148,7 @@ select count(*) from m1 where tag_id = '1000' and time < '2024-01-03T14:46:35+01 -

Re: [PR] Minor: Fix incorrect indices for hashing struct [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb merged PR #8775: URL: https://github.com/apache/arrow-datafusion/pull/8775

Re: [PR] Minor: Fix flake in newly added dictionary.slt test [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb closed pull request #8769: Minor: Fix flake in newly added dictionary.slt test URL: https://github.com/apache/arrow-datafusion/pull/8769

Re: [I] Get MIRI running against parquet crate [arrow-rs]

2024-01-06 Thread via GitHub
alamb commented on issue #614: URL: https://github.com/apache/arrow-rs/issues/614#issuecomment-1879713944 THanks for trying @Jefffrey -- I agree I don't know what that error means,

Re: [PR] GH-39001: [Java] Modularize remaining modules [arrow]

2024-01-06 Thread via GitHub
jduo commented on PR #39221: URL: https://github.com/apache/arrow/pull/39221#issuecomment-1879695298 Doc changes are required as different modules require more command line arguments. I'll update the docs.

Re: [PR] GH-38997: [Java] Modularize format and vector [arrow]

2024-01-06 Thread via GitHub
jduo commented on PR #38995: URL: https://github.com/apache/arrow/pull/38995#issuecomment-1879694913 Note that this one didn't require doc changes.

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-06 Thread via GitHub
alamb commented on code in PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#discussion_r1443768896 ## datafusion/common/src/scalar.rs: ## @@ -2937,13 +3015,33 @@ impl fmt::Display for ScalarValue { )?, None => write!(f, "NULL"

Re: [PR] GH-39398: [C++][Parquet] DNM: benchmark for readLevels [arrow]

2024-01-06 Thread via GitHub
ursabot commented on PR #39486: URL: https://github.com/apache/arrow/pull/39486#issuecomment-1879664770 Benchmark runs are scheduled for commit 0529cce31a71d4c2f8236994efa94e7817b608aa. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be po

Re: [PR] GH-39398: [C++][Parquet] DNM: benchmark for readLevels [arrow]

2024-01-06 Thread via GitHub
pitrou commented on PR #39486: URL: https://github.com/apache/arrow/pull/39486#issuecomment-1879664678 @ursabot please benchmark

Re: [PR] GH-39477: [JS] remove esModuleInterop [arrow]

2024-01-06 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39478: URL: https://github.com/apache/arrow/pull/39478#issuecomment-1879658361 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 9b931af14e5a710cba0aaa6b899e2ca696bfd785. There was 1 b

Re: [PR] Minor: Fix incorrect indices for hashing struct [arrow-datafusion]

2024-01-06 Thread via GitHub
jayzhan211 commented on code in PR #8775: URL: https://github.com/apache/arrow-datafusion/pull/8775#discussion_r1443729585 ## datafusion/sqllogictest/test_files/dictionary.slt: ## @@ -148,7 +148,7 @@ select count(*) from m1 where tag_id = '1000' and time < '2024-01-03T14:46:35+

Re: [I] Improve Parallel Reading (CSV, JSON) / Help Wanted [arrow-datafusion]

2024-01-06 Thread via GitHub
marvinlanhenke commented on issue #8723: URL: https://github.com/apache/arrow-datafusion/issues/8723#issuecomment-1879649792 > That might not be the right structure, but I am trying to give you the flavor of what encapsulating the complexity might look like @alamb ...thanks, this

Re: [I] writing to partitioned table uses the wrong column as partition key [arrow-datafusion]

2024-01-06 Thread via GitHub
marvinlanhenke commented on issue #7892: URL: https://github.com/apache/arrow-datafusion/issues/7892#issuecomment-1879646811 ...while I was looking into: #8493 trying to understand how the hive-partitioned writes are implemented I came across the same issue, having it select the wrong colu

Re: [PR] fix: struct field don't push down to TableScan [arrow-datafusion]

2024-01-06 Thread via GitHub
simonvandel commented on code in PR #8774: URL: https://github.com/apache/arrow-datafusion/pull/8774#discussion_r1443708093 ## datafusion/sqllogictest/test_files/dictionary.slt: ## @@ -149,7 +149,7 @@ select count(*) from m1 where tag_id = '1000' and time < '2024-01-03T14:46:35

Re: [PR] Minor: Fix incorrect indices for hashing struct [arrow-datafusion]

2024-01-06 Thread via GitHub
jayzhan211 commented on code in PR #8775: URL: https://github.com/apache/arrow-datafusion/pull/8775#discussion_r1443700025 ## datafusion/common/src/hash_utils.rs: ## @@ -214,22 +214,19 @@ fn hash_struct_array( hashes_buffer: &mut [u64], ) -> Result<()> { let nulls = a

Re: [PR] Minor: Fix incorrect indices for hashing struct [arrow-datafusion]

2024-01-06 Thread via GitHub
jayzhan211 commented on code in PR #8775: URL: https://github.com/apache/arrow-datafusion/pull/8775#discussion_r1443699399 ## datafusion/common/src/hash_utils.rs: ## @@ -214,22 +214,19 @@ fn hash_struct_array( hashes_buffer: &mut [u64], ) -> Result<()> { let nulls = a

[PR] Minor: Fix incorrect indices for hashing struct [arrow-datafusion]

2024-01-06 Thread via GitHub
jayzhan211 opened a new pull request, #8775: URL: https://github.com/apache/arrow-datafusion/pull/8775 ## Which issue does this PR close? Bug in https://github.com/apache/arrow-datafusion/pull/8552 ## Rationale for this change ## What changes are included

Re: [PR] fix: struct field don't push down to TableScan [arrow-datafusion]

2024-01-06 Thread via GitHub
haohuaijin commented on PR #8774: URL: https://github.com/apache/arrow-datafusion/pull/8774#issuecomment-1879639150 the root reason maybe in below code https://github.com/apache/arrow-datafusion/blob/4e4059a68455fbc14f04902c76acbcd258b7f2ef/datafusion/optimizer/src/optimize_projections.r

Re: [PR] fix: struct field don't push down to TableScan [arrow-datafusion]

2024-01-06 Thread via GitHub
haohuaijin commented on code in PR #8774: URL: https://github.com/apache/arrow-datafusion/pull/8774#discussion_r1443691448 ## datafusion/sqllogictest/test_files/dictionary.slt: ## @@ -149,7 +149,7 @@ select count(*) from m1 where tag_id = '1000' and time < '2024-01-03T14:46:35+

[PR] fix: struct field don't push down to TableScan [arrow-datafusion]

2024-01-06 Thread via GitHub
haohuaijin opened a new pull request, #8774: URL: https://github.com/apache/arrow-datafusion/pull/8774 ## Which issue does this PR close? Closes #8735 ## Rationale for this change when debug #8735, I find not only the case in #8735 don't push down, we have other

Re: [PR] feat: Add bloom filter metric to ParquetExec [arrow-datafusion]

2024-01-06 Thread via GitHub
simonvandel commented on code in PR #8772: URL: https://github.com/apache/arrow-datafusion/pull/8772#discussion_r1443688129 ## datafusion/core/src/datasource/physical_plan/parquet/metrics.rs: ## @@ -29,8 +29,10 @@ use crate::physical_plan::metrics::{ pub struct ParquetFileMetri

Re: [PR] Add ForeignKey constraint type [arrow-datafusion]

2024-01-06 Thread via GitHub
simonvandel commented on code in PR #8566: URL: https://github.com/apache/arrow-datafusion/pull/8566#discussion_r1443687586 ## datafusion/common/src/functional_dependencies.rs: ## @@ -97,8 +108,39 @@ impl Constraints { Constraint::Unique(indices)

Re: [PR] ci: speed up win64 test [arrow-datafusion]

2024-01-06 Thread via GitHub
Jefffrey commented on PR #8728: URL: https://github.com/apache/arrow-datafusion/pull/8728#issuecomment-1879626485 win64 test ran in 33min. Still slightly slower than before (around 27min I believe). cc @comphead @alamb

Re: [PR] GH-39385: [C++] Use more permissable return code for rename [arrow]

2024-01-06 Thread via GitHub
kou commented on code in PR #39481: URL: https://github.com/apache/arrow/pull/39481#discussion_r1443667391 ## cpp/src/arrow/filesystem/localfs.cc: ## @@ -595,7 +595,7 @@ Status LocalFileSystem::Move(const std::string& src, const std::string& dest) {

Re: [PR] GH-39398: [C++][Parquet] DNM: benchmark for readLevels [arrow]

2024-01-06 Thread via GitHub
mapleFU commented on PR #39486: URL: https://github.com/apache/arrow/pull/39486#issuecomment-1879602168 It's just a quick poc for ReadLevels optimization... I think exporting it is so hacking, because `ReadLevels` is just "read levels in current page". So, `HasNext` is called, and I only ma
