Re: [PR] GH-38944: [Python] Fix spelling [arrow]

2023-12-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38945: URL: https://github.com/apache/arrow/pull/38945#issuecomment-1837075091 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 6101d12676f4cfac52822e3dc13034306a4bd83b. There were 9

Re: [I] Use of undeclared crate or module `parquet` when compiling without `--feature=parquet` flag [arrow-datafusion]

2023-12-01 Thread via GitHub
jayzhan211 commented on issue #8250: URL: https://github.com/apache/arrow-datafusion/issues/8250#issuecomment-1837064992 > Hello, can I give it a try? Sure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [I] Unify type coercion and casting [arrow-datafusion]

2023-12-01 Thread via GitHub
jayzhan211 commented on issue #8302: URL: https://github.com/apache/arrow-datafusion/issues/8302#issuecomment-1837052849 After #8385, I found that it is not ideal to have one coercion for all the places (compare op, math op, signature coercion, etc.).

[I] Miri CI check reports status for only arrow-ord [arrow-rs]

2023-12-01 Thread via GitHub
Jefffrey opened a new issue, #5159: URL: https://github.com/apache/arrow-rs/issues/5159 **Describe the bug** See here: https://github.com/apache/arrow-rs/blob/f621d28db590ff6ad3907450f7ff434c7deb9766/.github/workflows/miri.sh#L1-L18 Without `set -e` setting then bash
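The report above turns on how a shell script's exit status is determined. A minimal sketch of that behaviour follows, assuming a POSIX `sh` is available on `PATH`; the two inline scripts are hypothetical stand-ins and not the contents of `miri.sh`. Without `set -e`, a script reports the status of its last command, so an earlier failure is masked; with `set -e`, the script stops at the first failing command.

```python
import subprocess


def exit_status(script: str) -> int:
    """Run a small shell script with `sh -c` and return its exit status."""
    return subprocess.run(["sh", "-c", script]).returncode


# Hypothetical stand-ins for a CI script: `false` plays a failing test run,
# `true` plays a later step that succeeds.
without_set_e = "false\ntrue"
with_set_e = "set -e\nfalse\ntrue"

status_without = exit_status(without_set_e)  # the failure of `false` is masked
status_with = exit_status(with_set_e)        # the script aborts at `false`

print(status_without, status_with)
```

The first status is zero even though a step failed, which is the kind of masking the issue describes; the second is non-zero because the script aborts early.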

Re: [I] Get MIRI running against parquet crate [arrow-rs]

2023-12-01 Thread via GitHub
Jefffrey commented on issue #614: URL: https://github.com/apache/arrow-rs/issues/614#issuecomment-1837048410 Small update, I tried running MIRI: ```sh arrow-rs$ MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test -p parquet ``` Got error: ``` test ar

Re: [PR] GH-38007: [C++][Python] Add VariableShapeTensor implementation [arrow]

2023-12-01 Thread via GitHub
rok commented on PR #38008: URL: https://github.com/apache/arrow/pull/38008#issuecomment-1837045096 Thank you for the thorough review @pitrou ! I addressed your comments (except for the `from_numpy` allowing for general strides which I'll do asap). Feel free to do another pass.

Re: [PR] GH-38007: [C++][Python] Add VariableShapeTensor implementation [arrow]

2023-12-01 Thread via GitHub
rok commented on code in PR #38008: URL: https://github.com/apache/arrow/pull/38008#discussion_r1412725410 ## python/pyarrow/array.pxi: ## @@ -3586,6 +3586,156 @@ class FixedShapeTensorArray(ExtensionArray): ) +cdef class VariableShapeTensorArray(ExtensionArray): +

Re: [PR] GH-37484: [Python] Add a FixedSizeTensorScalar class [arrow]

2023-12-01 Thread via GitHub
rok commented on PR #37533: URL: https://github.com/apache/arrow/pull/37533#issuecomment-1837040423 > Didn't yet look in detail, but added some quick drive-by comments. And thanks for working on this! Thanks for the review @jorisvandenbossche ! I've addressed your points. > Can

Re: [PR] GH-37484: [Python] Add a FixedSizeTensorScalar class [arrow]

2023-12-01 Thread via GitHub
rok commented on code in PR #37533: URL: https://github.com/apache/arrow/pull/37533#discussion_r1412722784 ## python/pyarrow/array.pxi: ## @@ -3519,16 +3519,32 @@ class FixedShapeTensorArray(ExtensionArray): def to_numpy_ndarray(self): """ Convert fixed sh

Re: [PR] GH-37484: [Python] Add a FixedSizeTensorScalar class [arrow]

2023-12-01 Thread via GitHub
rok commented on code in PR #37533: URL: https://github.com/apache/arrow/pull/37533#discussion_r1412721102 ## cpp/src/arrow/extension/fixed_shape_tensor.cc: ## @@ -82,6 +82,45 @@ Status ComputeStrides(const FixedWidthType& type, const std::vector& sh } // namespace +const

Re: [PR] GH-37484: [Python] Add a FixedSizeTensorScalar class [arrow]

2023-12-01 Thread via GitHub
rok commented on code in PR #37533: URL: https://github.com/apache/arrow/pull/37533#discussion_r1412720994 ## cpp/src/arrow/extension/fixed_shape_tensor.cc: ## @@ -82,6 +82,45 @@ Status ComputeStrides(const FixedWidthType& type, const std::vector& sh } // namespace +const

Re: [PR] GH-38968: [C++] Fix spelling (dataset) [arrow]

2023-12-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38969: URL: https://github.com/apache/arrow/pull/38969#issuecomment-1837034446 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 1b9fe98be6338d4fae917d271c40261b25118f45. There were 8

[PR] add a summary table to benchmark compare output [arrow-datafusion]

2023-12-01 Thread via GitHub
razeghi71 opened a new pull request, #8399: URL: https://github.com/apache/arrow-datafusion/pull/8399 ## Which issue does this PR close? Closes #8390. ## Rationale for this change Explained in issue. ## What changes are included in this PR?

[PR] GH-38316: [C#] Implement interval types [arrow]

2023-12-01 Thread via GitHub
CurtHagenlocher opened a new pull request, #39043: URL: https://github.com/apache/arrow/pull/39043 ### What changes are included in this PR? Changes required to support the three interval types in the C# implementation. ### Are these changes tested? Partially. (Still nee

Re: [I] Use of undeclared crate or module `parquet` when compiling without `--feature=parquet` flag [arrow-datafusion]

2023-12-01 Thread via GitHub
Dennis40816 commented on issue #8250: URL: https://github.com/apache/arrow-datafusion/issues/8250#issuecomment-1837018207 Hello, can I give it a try?

Re: [PR] Rename expr::window_function::WindowFunction to WindowFunctionDefinition for consistency [arrow-datafusion]

2023-12-01 Thread via GitHub
edmondop commented on code in PR #8382: URL: https://github.com/apache/arrow-datafusion/pull/8382#discussion_r1412700119 ## datafusion/expr/src/window_function.rs: ## @@ -1,470 +1,482 @@ -// Licensed to the Apache Software Foundation (ASF) under one -// or more contributor licen

Re: [I] [R] Error on Table Merging in arrow for R [arrow]

2023-12-01 Thread via GitHub
amoeba commented on issue #39038: URL: https://github.com/apache/arrow/issues/39038#issuecomment-1836984815 I think a reprex is needed here. Even if you can't share your input files, finding a minimal sample of your `data.frame`s that reproduces the issue and sharing them some way (attachme

Re: [PR] GH-38996: [Java] Update dependencies and plugins for JPMS modules [arrow]

2023-12-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38994: URL: https://github.com/apache/arrow/pull/38994#issuecomment-1836976627 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 63bd0d5a85b180f76a185da9f1e3864cc50e0835. There were no

Re: [PR] Parquet: write column_orders in FileMetaData [arrow-rs]

2023-12-01 Thread via GitHub
Jefffrey commented on code in PR #5158: URL: https://github.com/apache/arrow-rs/pull/5158#discussion_r1412691279 ## parquet/src/file/writer.rs: ## @@ -323,14 +323,27 @@ impl SerializedFileWriter { None => Some(self.kv_metadatas.clone()), }; +// We

[PR] Parquet: write column_orders in FileMetaData [arrow-rs]

2023-12-01 Thread via GitHub
Jefffrey opened a new pull request, #5158: URL: https://github.com/apache/arrow-rs/pull/5158 # Which issue does this PR close? Closes #5152 # Rationale for this change # What changes are included in this PR? Populate `column_orders` in Parq

Re: [I] [R] Error on Table Merging in arrow for R [arrow]

2023-12-01 Thread via GitHub
TPDeramus commented on issue #39038: URL: https://github.com/apache/arrow/issues/39038#issuecomment-1836971381 Further, the use of `concat_tables()` library(arrow) library(tidyverse) library(fastDummies) temp <- open_csv_dataset(sources = cohort_csvs) %>% compute()

Re: [I] [R] Error on Table Merging in arrow for R [arrow]

2023-12-01 Thread via GitHub
TPDeramus commented on issue #39038: URL: https://github.com/apache/arrow/issues/39038#issuecomment-1836969418 Okay scratch that. The error will happen as soon as the second iteration. Probably just a typo from troubleshooting on my part.

Re: [I] [Python] Can't install pyarrow on MacOS 14.0, Python 3.12 - ErrMsg: Could not build wheels for pyarrow, which is required to install pyproject.toml-based projects [arrow]

2023-12-01 Thread via GitHub
assignUser commented on issue #38311: URL: https://github.com/apache/arrow/issues/38311#issuecomment-1836952933 taipy pins pyarrow to <11: `"arrow": ["pyarrow>=10.0.1,<11.0"],` and for that no 3.12 wheels exist (as it was not out at the time) so your only choice is to downgrade to 3.11 for
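The pin quoted above accounts for the failure: the specifier `pyarrow>=10.0.1,<11.0` excludes every pyarrow release from 11.0 onward, and per the comment no Python 3.12 wheels exist for the releases it does allow, which would explain why pip attempts a source build. A minimal sketch of such a range check, using plain tuple comparison rather than pip's actual resolver; the helper names are hypothetical, and pre-release or local version suffixes are not handled:

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted release string such as '10.0.1' into a tuple of ints,
    padded with zeros to at least three components."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies_pin(version: str) -> bool:
    """Check a version against the quoted pin: pyarrow>=10.0.1,<11.0."""
    v = parse_version(version)
    return (10, 0, 1) <= v < (11, 0, 0)


# Hypothetical candidate versions, chosen only to exercise both bounds.
candidates = ["10.0.0", "10.0.1", "10.9", "11", "11.0", "14.0.1"]
allowed = [c for c in candidates if satisfies_pin(c)]
print(allowed)
```

Only the candidates inside the half-open range `[10.0.1, 11.0)` pass, so any release new enough to ship Python 3.12 wheels is rejected by the pin.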

Re: [PR] GH-35901: [C++][Python] pyarrow.csv.write_csv crashes when writing tables containing FixedSizeBinaryArray [arrow]

2023-12-01 Thread via GitHub
vibhatha commented on code in PR #36266: URL: https://github.com/apache/arrow/pull/36266#discussion_r1412669913 ## python/pyarrow/tests/test_csv.py: ## @@ -1972,6 +1972,33 @@ def test_write_csv_decimal(tmpdir, type_factory): assert out.column('col').cast(type) == table.colu

Re: [I] [R] Error on Table Merging in arrow for R [arrow]

2023-12-01 Thread via GitHub
TPDeramus commented on issue #39038: URL: https://github.com/apache/arrow/issues/39038#issuecomment-1836931108 Apologies but I am not well versed in the implementations of `browser()`. And it's doubly problematic because this is not always thrown as a typical error. Occasionall

Re: [PR] GH-35901: [C++][Python] pyarrow.csv.write_csv crashes when writing tables containing FixedSizeBinaryArray [arrow]

2023-12-01 Thread via GitHub
anjakefala commented on code in PR #36266: URL: https://github.com/apache/arrow/pull/36266#discussion_r1412668343 ## python/pyarrow/tests/test_csv.py: ## @@ -1972,6 +1972,33 @@ def test_write_csv_decimal(tmpdir, type_factory): assert out.column('col').cast(type) == table.co

Re: [PR] GH-35901: [C++][Python] pyarrow.csv.write_csv crashes when writing tables containing FixedSizeBinaryArray [arrow]

2023-12-01 Thread via GitHub
anjakefala commented on code in PR #36266: URL: https://github.com/apache/arrow/pull/36266#discussion_r1412662188 ## python/pyarrow/tests/test_csv.py: ## @@ -1972,6 +1972,33 @@ def test_write_csv_decimal(tmpdir, type_factory): assert out.column('col').cast(type) == table.co

Re: [I] Parquet: ColumnOrder not being written when writing parquet files [arrow-rs]

2023-12-01 Thread via GitHub
Jefffrey commented on issue #5152: URL: https://github.com/apache/arrow-rs/issues/5152#issuecomment-1836918970 Planning to take a shot at this

Re: [PR] GH-38950: [Docs] Fix spelling [arrow]

2023-12-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38951: URL: https://github.com/apache/arrow/pull/38951#issuecomment-1836915377 After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 353139680311e809d2413ea46e17e1656069ac5e. There were no

Re: [PR] GH-36441: [Python] Make `CacheOptions` configurable from Python [arrow]

2023-12-01 Thread via GitHub
Tom-Newton commented on code in PR #36627: URL: https://github.com/apache/arrow/pull/36627#discussion_r1412657259 ## python/pyarrow/_dataset.pyx: ## @@ -1963,6 +1963,100 @@ cdef class FragmentScanOptions(_Weakrefable): except TypeError: return False +cdef

Re: [PR] GH-36441: [Python] Make `CacheOptions` configurable from Python [arrow]

2023-12-01 Thread via GitHub
Tom-Newton commented on PR #36627: URL: https://github.com/apache/arrow/pull/36627#issuecomment-1836908650 I think I've addressed all the specific comments. > We might want to move the python CacheOptions class from _dataset.pyx to eg io.pxi and expose it top-level in pyarrow (instead of

Re: [I] [Python] Can't install pyarrow on MacOS 14.0, Python 3.12 - ErrMsg: Could not build wheels for pyarrow, which is required to install pyproject.toml-based projects [arrow]

2023-12-01 Thread via GitHub
joaomcarlos commented on issue #38311: URL: https://github.com/apache/arrow/issues/38311#issuecomment-1836894385 Hi, I am trying to `pip install taipy`, it fails on trying to build the wheel for pyarrow. ``` Using cached zipp-3.17.0-py3-none-any.whl (7.4 kB) Building wheels

Re: [PR] GH-36441: [Python] Make `CacheOptions` configurable from Python [arrow]

2023-12-01 Thread via GitHub
Tom-Newton commented on code in PR #36627: URL: https://github.com/apache/arrow/pull/36627#discussion_r1412637417 ## python/pyarrow/_dataset.pyx: ## @@ -1963,6 +1963,100 @@ cdef class FragmentScanOptions(_Weakrefable): except TypeError: return False +cdef

Re: [PR] GH-36441: [Python] Make `CacheOptions` configurable from Python [arrow]

2023-12-01 Thread via GitHub
Tom-Newton commented on code in PR #36627: URL: https://github.com/apache/arrow/pull/36627#discussion_r1412636547 ## python/pyarrow/_dataset_parquet.pyx: ## @@ -769,6 +777,14 @@ cdef class ParquetFragmentScanOptions(FragmentScanOptions): def pre_buffer(self, bint pre_buffer

Re: [I] `array_ndims` doesn't correctly handle List(Null) [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb closed issue #8318: `array_ndims` doesn't correctly handle List(Null) URL: https://github.com/apache/arrow-datafusion/issues/8318

Re: [I] Materialize Dictionaries in Group Keys [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb closed issue #7647: Materialize Dictionaries in Group Keys URL: https://github.com/apache/arrow-datafusion/issues/7647

Re: [PR] Materialize dictionaries in group keys [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb merged PR #8291: URL: https://github.com/apache/arrow-datafusion/pull/8291

Re: [PR] Rewrite `array_ndims` to fix List(Null) handling [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb merged PR #8320: URL: https://github.com/apache/arrow-datafusion/pull/8320

Re: [I] enable users to add support for object_store table formats of different types [arrow-datafusion]

2023-12-01 Thread via GitHub
tychoish commented on issue #8345: URL: https://github.com/apache/arrow-datafusion/issues/8345#issuecomment-1836858807 I guess my first question is "are `FileType` and `FileFormat` actually meaningfully distinct?" Do we imagine that there are cases where you'd want to have different `FileT

Re: [I] windows function don't support `partition by null` and `order by null` and cause error [arrow-datafusion]

2023-12-01 Thread via GitHub
comphead commented on issue #8386: URL: https://github.com/apache/arrow-datafusion/issues/8386#issuecomment-1836855380 It's not only a null problem; DF cannot handle scalars either ``` ❯ select a, rank() over (partition by 1 order by 1) from (select 1 a union all select 2 a); Exe

Re: [PR] Support named query parameters [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on PR #8384: URL: https://github.com/apache/arrow-datafusion/pull/8384#issuecomment-1836856062 Thanks @Asura7969 -- I hope to review this later today or tomorrow

Re: [I] Parquet Statistics Pruning Ignores ColumnOrder, resulting in potentially incorrect statistics [arrow-datafusion]

2023-12-01 Thread via GitHub
tustvold commented on issue #8342: URL: https://github.com/apache/arrow-datafusion/issues/8342#issuecomment-1836849008 Afraid this is tracking something different that PR didn't address, as we aren't even populating this correctly in parquet-rs currently

Re: [PR] GH-39003: [CI][macOS] Don't update Homebrew [arrow]

2023-12-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39016: URL: https://github.com/apache/arrow/pull/39016#issuecomment-1836844355 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 985e86a41fe0937b2d8d39522066ba67c0aa3fa4. There were no

Re: [PR] GH-38597: [C++] Implement GetFileInfo(selector) for Azure filesystem [arrow]

2023-12-01 Thread via GitHub
kou commented on code in PR #39009: URL: https://github.com/apache/arrow/pull/39009#discussion_r1412585789 ## cpp/src/arrow/filesystem/azurefs.cc: ## @@ -815,6 +815,233 @@ class AzureFileSystem::Impl { } } + private: + template + Status ListContainers(const Azure::C

[PR] Adding `is_null` datatype shortcut method [arrow-rs]

2023-12-01 Thread via GitHub
comphead opened a new pull request, #5157: URL: https://github.com/apache/arrow-rs/pull/5157 # Which issue does this PR close? Closes #. # Rationale for this change Adding `is_null` datatype shortcut method to already existing collection of `is_*` methods

Re: [PR] GH-38984: [Python][Packaging] Verification of wheels on AlmaLinux 8 are failing due to missing pip [arrow]

2023-12-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38985: URL: https://github.com/apache/arrow/pull/38985#issuecomment-1836842167 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 2760faf461acfa4f2e32adf660448fa3e0b02018. There were no

Re: [PR] GH-39017: [JS] Add `typeId` as attribute [arrow]

2023-12-01 Thread via GitHub
domoritz commented on PR #39018: URL: https://github.com/apache/arrow/pull/39018#issuecomment-1836839954 I'm good with this change but want to hear @trxcllnt's opinion before we merge. IIRC, the reason why types are not singletons is that some types need details in the constructor.

Re: [PR] fix(go/adbc/driver/snowflake): Made GetObjects case insensitive [arrow-adbc]

2023-12-01 Thread via GitHub
ryan-syed commented on PR #1328: URL: https://github.com/apache/arrow-adbc/pull/1328#issuecomment-1836826401 Refactoring includes being able to match the exact catalog, schema, and table names when wildcards are involved in the name identifiers of the `GetObjects` patterns. For exam

Re: [I] [R] Error on Table Merging in arrow for R [arrow]

2023-12-01 Thread via GitHub
amoeba commented on issue #39038: URL: https://github.com/apache/arrow/issues/39038#issuecomment-1836820307 Hi @TPDeramus as was asked in your [StackOverflow post](https://stackoverflow.com/questions/77586964/error-when-attempting-to-full-join-of-two-arrow-tables-in-r), a way for us to repr

Re: [I] Consider introducing unique expression IDs in Logical/Physical plan [arrow-datafusion]

2023-12-01 Thread via GitHub
Jefffrey commented on issue #8379: URL: https://github.com/apache/arrow-datafusion/issues/8379#issuecomment-1836817802 Actually I think I was off the mark on what `ExprId` is intended to do, it seems it would be more useful if there were a new LogicalExpr enum such as `AttributeReference`,

Re: [PR] fix(go/adbc/driver/snowflake): Made GetObjects case insensitive [arrow-adbc]

2023-12-01 Thread via GitHub
ryan-syed commented on code in PR #1328: URL: https://github.com/apache/arrow-adbc/pull/1328#discussion_r1412591687 ## go/adbc/driver/snowflake/connection.go: ## @@ -282,10 +282,10 @@ func (c *cnxn) getObjectsDbSchemas(ctx context.Context, depth adbc.ObjectDepth, cond

Re: [I] use pyarrow.parquet.read_schema on parquet file in cloud storage [arrow]

2023-12-01 Thread via GitHub
wirable23 commented on issue #39039: URL: https://github.com/apache/arrow/issues/39039#issuecomment-1836770619 This is more of a private cloud storage, is there some interface I can implement in python (similar to RandomAccessFile in c++) to provide my own abstraction of reading data from c

Re: [I] enable users to add support for object_store table formats of different types [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on issue #8345: URL: https://github.com/apache/arrow-datafusion/issues/8345#issuecomment-1836766922 Thank you for bringing this up @tychoish -- I agree the current state of FileType/FileFormat is not ideal as it means, as you have pointed out, that it is not possible to us

Re: [PR] Make filter selectivity for statistics configurable [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on code in PR #8243: URL: https://github.com/apache/arrow-datafusion/pull/8243#discussion_r1412563817 ## datafusion/physical-plan/src/filter.rs: ## @@ -994,4 +1014,22 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn test_validation_filt

Re: [I] Range/inequality joins are slow [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on issue #8393: URL: https://github.com/apache/arrow-datafusion/issues/8393#issuecomment-1836758860 I started trying to collect a list of various join improvements in https://github.com/apache/arrow-datafusion/issues/8398

Re: [PR] MINOR: [JS] Bump eslint-plugin-unicorn from 47.0.0 to 49.0.0 in /js [arrow]

2023-12-01 Thread via GitHub
kou merged PR #39035: URL: https://github.com/apache/arrow/pull/39035

Re: [PR] GH-39020: [CI][Release][JS] Use Node.js 18 instead of 16 [arrow]

2023-12-01 Thread via GitHub
kou merged PR #39021: URL: https://github.com/apache/arrow/pull/39021

Re: [PR] GH-38701: [C++][FS][Azure] Implement `DeleteDirContents()` [arrow]

2023-12-01 Thread via GitHub
kou merged PR #3: URL: https://github.com/apache/arrow/pull/3

[I] [Epic] A collection of Join Improvements [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb opened a new issue, #8398: URL: https://github.com/apache/arrow-datafusion/issues/8398 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [I] ASOF join support / Specialize Range Joins [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on issue #318: URL: https://github.com/apache/arrow-datafusion/issues/318#issuecomment-1836750650 There is a blog post about this from DuckDB: https://duckdb.org/2022/05/27/iejoin.html

Re: [I] Range/inequality joins are slow [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on issue #8393: URL: https://github.com/apache/arrow-datafusion/issues/8393#issuecomment-1836749967 I think `IEJoin` is a form of RangeJoin (https://duckdb.org/2022/05/27/iejoin.html) -- I agree it would be neat to make this fast in DataFusion, but I think it is a pretty ma

Re: [PR] Fix PartialOrd for ScalarValue::List/FixSizeList/LargeList [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on code in PR #8253: URL: https://github.com/apache/arrow-datafusion/pull/8253#discussion_r1412553900 ## datafusion/common/src/scalar.rs: ## @@ -3458,6 +3436,7 @@ impl ScalarType for TimestampNanosecondType { } #[cfg(test)] +#[cfg(feature = "parquet")] Revie

Re: [I] Consider introducing unique expression IDs in Logical/Physical plan [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on issue #8379: URL: https://github.com/apache/arrow-datafusion/issues/8379#issuecomment-1836743922 I wonder what "unique" means? Like every newly created `Expr` gets some sort of id? Some examples: ```rust // would expr1 and expr2 have the same id? let

Re: [PR] Rewrite `array_ndims` to fix List(Null) handling [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on PR #8320: URL: https://github.com/apache/arrow-datafusion/pull/8320#issuecomment-1836741797 I merged this branch into `main` as it was fairly old and I want to make sure there are no logical conflicts

Re: [I] windows function don't support `partition by null` and `order by null` and cause error [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on issue #8386: URL: https://github.com/apache/arrow-datafusion/issues/8386#issuecomment-1836737533 This is not yet fixed (I had hoped it would be in https://github.com/apache/arrow-datafusion/pull/8371) ``` ❯ CREATE TABLE t1 (a int) AS VALUES (1), (2), (3

Re: [I] Parquet Statistics Pruning Ignores ColumnOrder, resulting in potentially incorrect statistics [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb closed issue #8342: Parquet Statistics Pruning Ignores ColumnOrder, resulting in potentially incorrect statistics URL: https://github.com/apache/arrow-datafusion/issues/8342

Re: [I] Parquet Statistics Pruning Ignores ColumnOrder, resulting in potentially incorrect statistics [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on issue #8342: URL: https://github.com/apache/arrow-datafusion/issues/8342#issuecomment-1836724417 I believe we fixed this in https://github.com/apache/arrow-datafusion/pull/8294

Re: [I] SQL: Ambiguous reference when aliasing in combination with ORDER BY [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on issue #8391: URL: https://github.com/apache/arrow-datafusion/issues/8391#issuecomment-1836721526 It might be good to verify the expected results in spark / postgres before making a fix

Re: [PR] Minor: Refactor function argument handling in `ScalarFunctionDefinition` [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb merged PR #8387: URL: https://github.com/apache/arrow-datafusion/pull/8387

Re: [PR] Rename expr::window_function::WindowFunction to WindowFunctionDefinition for consistency [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on code in PR #8382: URL: https://github.com/apache/arrow-datafusion/pull/8382#discussion_r1412534980 ## datafusion/expr/src/expr.rs: ## @@ -393,6 +398,258 @@ impl ScalarFunction { } } +/// WindowFunction +#[derive(Debug, Clone, PartialEq, Eq, Hash)] +pub

Re: [PR] Minor: Refactor array_union function to use a generic union_arrays function [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on code in PR #8381: URL: https://github.com/apache/arrow-datafusion/pull/8381#discussion_r1412530157 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -1515,32 +1515,33 @@ pub fn array_union(args: &[ArrayRef]) -> Result { } let array1 = &arg

Re: [PR] Minor: Refactor array_union function to use a generic union_arrays function [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb merged PR #8381: URL: https://github.com/apache/arrow-datafusion/pull/8381

Re: [I] Transforming LogicalPlan::Explain using Treenode::transform fails unexpectedly [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on issue #8396: URL: https://github.com/apache/arrow-datafusion/issues/8396#issuecomment-1836708129 Thank you for the report -- I agree explain plan may need special handling

Re: [PR] Make filter selectivity for statistics configurable [arrow-datafusion]

2023-12-01 Thread via GitHub
edmondop commented on code in PR #8243: URL: https://github.com/apache/arrow-datafusion/pull/8243#discussion_r1412526748 ## datafusion/physical-plan/src/filter.rs: ## @@ -994,4 +1014,22 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn test_validation_f

Re: [PR] Allow default value on lag/lead if they are coerceable to value [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on code in PR #8308: URL: https://github.com/apache/arrow-datafusion/pull/8308#discussion_r1412522516 ## datafusion/physical-expr/src/window/lead_lag.rs: ## @@ -238,9 +238,8 @@ fn get_default_value( ) -> Result { if let Some(default_value) = default_value {

Re: [PR] Allow default value on lag/lead if they are coerceable to value [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on PR #8308: URL: https://github.com/apache/arrow-datafusion/pull/8308#issuecomment-1836701890 (p.s sorry for the late review)

Re: [PR] feat: add projection to FilterExec [arrow-datafusion]

2023-12-01 Thread via GitHub
Dandandan commented on PR #7932: URL: https://github.com/apache/arrow-datafusion/pull/7932#issuecomment-1836695287 @junjunjd if you are able to work on this, it would be good to fix the remaining tests (either the tests need to be changed or the expected output needs to be changed) and see why we h

Re: [I] Miscellaneous ntile function bugs, possible incorrect results [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb closed issue #8284: Miscellaneous ntile function bugs, possible incorrect results URL: https://github.com/apache/arrow-datafusion/issues/8284

Re: [PR] fix: make `ntile` work in some corner cases [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb merged PR #8371: URL: https://github.com/apache/arrow-datafusion/pull/8371

Re: [PR] Document timestamp input limits [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb merged PR #8369: URL: https://github.com/apache/arrow-datafusion/pull/8369

Re: [I] Timestamp overflows for extreme low/high values [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb closed issue #8336: Timestamp overflows for extreme low/high values URL: https://github.com/apache/arrow-datafusion/issues/8336

Re: [I] Inconsistent Signedness Of Legacy Parquet Timestamps Written By Spark [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb closed issue #7958: Inconsistent Signedness Of Legacy Parquet Timestamps Written By Spark URL: https://github.com/apache/arrow-datafusion/issues/7958

Re: [PR] Document timestamp input limits [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on PR #8369: URL: https://github.com/apache/arrow-datafusion/pull/8369#issuecomment-1836693090 🚀

Re: [PR] MINOR: [JS] Bump @swc/helpers from 0.5.2 to 0.5.3 in /js [arrow]

2023-12-01 Thread via GitHub
kou merged PR #39036: URL: https://github.com/apache/arrow/pull/39036

Re: [PR] MINOR: [JS] Bump typescript from 5.1.3 to 5.1.6 in /js [arrow]

2023-12-01 Thread via GitHub
kou merged PR #39034: URL: https://github.com/apache/arrow/pull/39034

Re: [PR] MINOR: [JS] Bump webpack-bundle-analyzer from 4.9.1 to 4.10.1 in /js [arrow]

2023-12-01 Thread via GitHub
kou merged PR #39033: URL: https://github.com/apache/arrow/pull/39033

Re: [PR] Make filter selectivity for statistics configurable [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on code in PR #8243: URL: https://github.com/apache/arrow-datafusion/pull/8243#discussion_r1412505974 ## datafusion/physical-plan/src/filter.rs: ## @@ -994,4 +1014,22 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn test_validation_filt

Re: [I] DataFusion weekly project plan (Andrew Lamb) - Nov 27, 2023 [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb commented on issue #8329: URL: https://github.com/apache/arrow-datafusion/issues/8329#issuecomment-1836668998 DataFusion - [ ] https://github.com/apache/arrow-datafusion/pull/8331 - [ ] https://github.com/apache/arrow-datafusion/pull/8356 DataFusion: As time permits (l

[PR] Removing ahash [arrow-rs]

2023-12-01 Thread via GitHub
psvri opened a new pull request, #5156: URL: https://github.com/apache/arrow-rs/pull/5156 # Which issue does this PR close? Closes #. # Rationale for this change Ahash is not being used anywhere in crate arrow. Hence I am removing it. # What changes are in

[PR] GH-39041:[R] Improve `update-checksum.R` output [arrow]

2023-12-01 Thread via GitHub
assignUser opened a new pull request, #39042: URL: https://github.com/apache/arrow/pull/39042 ### Rationale for this change The script was too quiet. ### What changes are included in this PR? Fix regex and add some output: ``` Rscript tools/update-checksums.R 14

Re: [PR] GH-37484: [Python] Add a FixedSizeTensorScalar class [arrow]

2023-12-01 Thread via GitHub
jorisvandenbossche commented on code in PR #37533: URL: https://github.com/apache/arrow/pull/37533#discussion_r1412460696 ## cpp/src/arrow/extension/fixed_shape_tensor.cc: ## @@ -82,6 +82,45 @@ Status ComputeStrides(const FixedWidthType& type, const std::vector& sh } // nam

Re: [PR] GH-38779: [R][CI] Use devtools on self-hosted machines and use macos-11 for intel package build [arrow]

2023-12-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38974: URL: https://github.com/apache/arrow/pull/38974#issuecomment-1836631952 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 810aa4b6055a460129bd4141f522c6a755389666. There were no

Re: [PR] GH-38857: [Python] Add append mode for pyarrow.OsFile [arrow]

2023-12-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38820: URL: https://github.com/apache/arrow/pull/38820#issuecomment-1836630028 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 92fe831c900bdffcd5517cf6af61aabb962bef3f. There were no

Re: [PR] Materialize dictionaries in group keys [arrow-datafusion]

2023-12-01 Thread via GitHub
qrilka commented on PR #8291: URL: https://github.com/apache/arrow-datafusion/pull/8291#issuecomment-1836629630 @alamb I was proposing a new test (which is what is in my quote) now I added it into the PR. Please take a look

Re: [PR] GH-39037: [Java] Remove (Contrib/Experimental) mention in Flight SQL [arrow]

2023-12-01 Thread via GitHub
laurentgo commented on PR #39040: URL: https://github.com/apache/arrow/pull/39040#issuecomment-1836630148 Do we need to go into some formal discussion on the mailing list first? Or is it okay to change the format mention as well?

Re: [PR] GH-35901: [C++][Python] pyarrow.csv.write_csv crashes when writing tables containing FixedSizeBinaryArray [arrow]

2023-12-01 Thread via GitHub
anjakefala commented on code in PR #36266: URL: https://github.com/apache/arrow/pull/36266#discussion_r1412468131 ## cpp/src/arrow/compute/kernels/scalar_cast_string.cc: ## @@ -338,10 +338,12 @@ BinaryToBinaryCastExec(KernelContext* ctx, const ExecSpan& batch, ExecResult* ou

Re: [PR] GH-35901: [C++][Python] pyarrow.csv.write_csv crashes when writing tables containing FixedSizeBinaryArray [arrow]

2023-12-01 Thread via GitHub
anjakefala commented on code in PR #36266: URL: https://github.com/apache/arrow/pull/36266#discussion_r1412464754 ## cpp/src/arrow/compute/kernels/scalar_cast_test.cc: ## @@ -2104,6 +2104,11 @@ TEST(Cast, BinaryToString) { // ARROW-16757: we no longer zero copy, but the con

[PR] POC Make BloomFilter application general, add `PruningPredicate::contains` [arrow-datafusion]

2023-12-01 Thread via GitHub
alamb opened a new pull request, #8397: URL: https://github.com/apache/arrow-datafusion/pull/8397 ## Which issue does this PR close? POC for https://github.com/apache/arrow-datafusion/issues/8376 ## Rationale for this change See https://github.com/apache/arrow-datafusion/

Re: [PR] Make filter selectivity for statistics configurable [arrow-datafusion]

2023-12-01 Thread via GitHub
Dandandan commented on PR #8243: URL: https://github.com/apache/arrow-datafusion/pull/8243#issuecomment-1836602774 FYI @alamb @andygrove
