[GitHub] [arrow-rs] tustvold commented on pull request #3139: Cast: should get the round result for decimal to a decimal with smaller scale

2022-11-26 Thread GitBox
tustvold commented on PR #3139: URL: https://github.com/apache/arrow-rs/pull/3139#issuecomment-1328191081 Apologies I misread your example, if the integer value was `1230` casting would yield an integer value of `123`, with the same string value. Casting an integer value of `123` with a cor

[GitHub] [arrow-rs] tustvold commented on issue #2986: Decimal Casts are Unchecked

2022-11-26 Thread GitBox
tustvold commented on issue #2986: URL: https://github.com/apache/arrow-rs/issues/2986#issuecomment-1328190875 Not sure if this should be reopened or a new issue created, but https://github.com/apache/arrow-rs/pull/3203 would suggest that truncation is not properly checked yet -- This is

[GitHub] [arrow] rtpsw commented on a diff in pull request #14485: ARROW-17980: [C++] As-of-Join Substrait extension

2022-11-26 Thread GitBox
rtpsw commented on code in PR #14485: URL: https://github.com/apache/arrow/pull/14485#discussion_r1032883655 ## cpp/cmake_modules/ThirdpartyToolchain.cmake: ## @@ -657,6 +657,13 @@ else() "${THIRDPARTY_MIRROR_URL}/snappy-${ARROW_SNAPPY_BUILD_VERSION}.tar.gz") endif

[GitHub] [arrow] github-actions[bot] commented on pull request #14742: ARROW-10158: [C++][Parquet] Expose page index info from ColumnChunkMetaData

2022-11-26 Thread GitBox
github-actions[bot] commented on PR #14742: URL: https://github.com/apache/arrow/pull/14742#issuecomment-1328190383 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'. -- This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [arrow] github-actions[bot] commented on pull request #14742: ARROW-10158: [C++][Parquet] Expose page index info from ColumnChunkMetaData

2022-11-26 Thread GitBox
github-actions[bot] commented on PR #14742: URL: https://github.com/apache/arrow/pull/14742#issuecomment-1328190381 https://issues.apache.org/jira/browse/ARROW-10158 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow] wgtmac opened a new pull request, #14742: ARROW-10158: [C++][Parquet] Expose page index info from ColumnChunkMetaData

2022-11-26 Thread GitBox
wgtmac opened a new pull request, #14742: URL: https://github.com/apache/arrow/pull/14742 This is the first step to support page index of parquet. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] westonpace merged pull request #14735: ARROW-18406: [C++] Can't build Arrow with Substrait on Ubuntu 20.04

2022-11-26 Thread GitBox
westonpace merged PR #14735: URL: https://github.com/apache/arrow/pull/14735 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.ap

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4377: Refactor the Hash Join

2022-11-26 Thread GitBox
liukun4515 commented on code in PR #4377: URL: https://github.com/apache/arrow-datafusion/pull/4377#discussion_r1032860953 ## datafusion/core/tests/sql/joins.rs: ## @@ -2211,8 +2211,6 @@ async fn null_aware_left_anti_join() -> Result<()> { } #[tokio::test] -#[ignore = "Test

[GitHub] [arrow-rs] avantgardnerio commented on issue #3142: AppendableRecordBatch

2022-11-26 Thread GitBox
avantgardnerio commented on issue #3142: URL: https://github.com/apache/arrow-rs/issues/3142#issuecomment-1328161372 Others might still find it useful, but I've convinced myself with how we are planning to handle MVCC we'll need to make copies anyway, so the builder approach is probably fin

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4333: API-break: Support `SubqueryAlias` and remove `Alias in Projection`

2022-11-26 Thread GitBox
jackwener commented on code in PR #4333: URL: https://github.com/apache/arrow-datafusion/pull/4333#discussion_r1032798149 ## datafusion/core/tests/sql/joins.rs: ## @@ -1635,16 +1635,16 @@ async fn reduce_left_join_3() -> Result<()> { let expected = vec![ "E

[GitHub] [arrow-datafusion] liukun4515 commented on issue #4363: Should not convert a normal non-inner join to Cross Join when there are non-equal Join conditions

2022-11-26 Thread GitBox
liukun4515 commented on issue #4363: URL: https://github.com/apache/arrow-datafusion/issues/4363#issuecomment-1328158438 > I think it is also possible to extend the HashJoin implementation with the new semantics rather than adding an entirely new physical join implementation hash joi

[GitHub] [arrow-datafusion] liukun4515 closed issue #3442: support more data type in prune for cast/try_cast

2022-11-26 Thread GitBox
liukun4515 closed issue #3442: support more data type in prune for cast/try_cast URL: https://github.com/apache/arrow-datafusion/issues/3442 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [arrow-datafusion] liukun4515 commented on pull request #4353: Support type coercion for join on columns

2022-11-26 Thread GitBox
liukun4515 commented on PR #4353: URL: https://github.com/apache/arrow-datafusion/pull/4353#issuecomment-1328154903 > @alamb @mingmwang @liukun4515 PTAL. Thanks @ygf11 I will try to find time to review it for the next few days. -- This is an automated message from the Apache Git Se

[GitHub] [arrow-rs] viirya opened a new pull request, #3203: Add a cast test case for decimal negative scale

2022-11-26 Thread GitBox
viirya opened a new pull request, #3203: URL: https://github.com/apache/arrow-rs/pull/3203 # Which issue does this PR close? Closes #. # Rationale for this change Clarifying the question at https://github.com/apache/arrow-rs/pull/3139#issuecomment-132794

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4377: Refactor the Hash Join

2022-11-26 Thread GitBox
liukun4515 commented on code in PR #4377: URL: https://github.com/apache/arrow-datafusion/pull/4377#discussion_r1032854691 ## datafusion/core/src/physical_plan/joins/hash_join.rs: ## @@ -1482,105 +1325,115 @@ impl HashJoinStream { let visited_left_side = self.visited_

[GitHub] [arrow-rs] liukun4515 commented on pull request #3139: Cast: should get the round result for decimal to a decimal with smaller scale

2022-11-26 Thread GitBox
liukun4515 commented on PR #3139: URL: https://github.com/apache/arrow-rs/pull/3139#issuecomment-1328150981 > 123 I am confused about this: if the data type is decimal(10,-2) and the 128-bit integer is `123`, it represents the value `12300`, and the value has been changed after cas
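
The representation described in this comment can be sketched in a few lines. This is only an illustration of the general convention that a decimal's logical value is its stored integer times 10 to the power of minus the scale. It is not arrow-rs code, and the helper names `decimal_value` and `rescale` are hypothetical.

```python
from decimal import Decimal


def decimal_value(unscaled: int, scale: int) -> Decimal:
    """Logical value of a decimal stored as an integer plus a scale (hypothetical helper)."""
    # value = unscaled * 10^(-scale); a negative scale therefore multiplies.
    return Decimal(unscaled).scaleb(-scale)


def rescale(unscaled: int, from_scale: int, to_scale: int) -> int:
    """Re-express the same logical value at another scale (hypothetical helper).

    Raises ValueError instead of rounding or truncating when digits would be lost.
    """
    diff = to_scale - from_scale
    if diff >= 0:
        return unscaled * 10 ** diff
    quotient, remainder = divmod(unscaled, 10 ** (-diff))
    if remainder != 0:
        raise ValueError("rescale would lose digits")
    return quotient


# Stored integer 123 at scale -2 stands for 12300.
print(decimal_value(123, -2) == Decimal(12300))
# Dropping one digit of scale turns the stored integer 1230 into 123 without changing the value.
print(rescale(1230, 1, 0), decimal_value(1230, 1) == decimal_value(123, 0))
```

Raising an error on lost digits is a choice made for this sketch only; whether a real cast should round, truncate, or fail is the question the thread is debating.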

[GitHub] [arrow-datafusion] comphead commented on pull request #4385: `date_part` support fractions of second

2022-11-26 Thread GitBox
comphead commented on PR #4385: URL: https://github.com/apache/arrow-datafusion/pull/4385#issuecomment-1328149457 @waitingkuo finally we can get back to #3997 resolution. Please check the PR. Now date_part supports second fraction, but it's still not the same as in PSQL like ``

[GitHub] [arrow-datafusion] comphead opened a new pull request, #4385: `date_part` support fractions of second

2022-11-26 Thread GitBox
comphead opened a new pull request, #4385: URL: https://github.com/apache/arrow-datafusion/pull/4385 # Which issue does this PR close? Closes #3997 # Rationale for this change See #3997 # What changes are included in this PR? Extending `date_part` to suppor

[GitHub] [arrow-rs] viirya commented on pull request #3198: Use SlicesIterator for ArrayData Equality

2022-11-26 Thread GitBox
viirya commented on PR #3198: URL: https://github.com/apache/arrow-rs/pull/3198#issuecomment-1328138638 > Perhaps this could be simplified by using try_for_each_valid_idx? Tried with `try_for_each_valid_idx`, but the benchmarks do not look good: ``` equal_nulls_512 time:

[GitHub] [arrow] ursabot commented on pull request #14671: ARROW-18361: [CI][Conan] Merge upstream changes

2022-11-26 Thread GitBox
ursabot commented on PR #14671: URL: https://github.com/apache/arrow/pull/14671#issuecomment-1328128102 Benchmark runs are scheduled for baseline = 8a9374134926d60483cdbf1c3060a88e5a3a5adc and contender = c0b311ee83c6ef8d9cf43d9e67af1f5d61dbfdd5. c0b311ee83c6ef8d9cf43d9e67af1f5d61dbfdd5 is

[GitHub] [arrow] ursabot commented on pull request #14715: ARROW-18390: [CI][Python] Update spark test modules to match spark master

2022-11-26 Thread GitBox
ursabot commented on PR #14715: URL: https://github.com/apache/arrow/pull/14715#issuecomment-1328110941 Benchmark runs are scheduled for baseline = 2078af7c710d688c14313b9486b99c981550a7b7 and contender = 8a9374134926d60483cdbf1c3060a88e5a3a5adc. 8a9374134926d60483cdbf1c3060a88e5a3a5adc is

[GitHub] [arrow-rs] ursabot commented on pull request #3200: Deprecate limit kernel

2022-11-26 Thread GitBox
ursabot commented on PR #3200: URL: https://github.com/apache/arrow-rs/pull/3200#issuecomment-1328110932 Benchmark runs are scheduled for baseline = 2ea47e436d59a576d58d895d5805de1f2fe4c399 and contender = 0ef18481bd44a08fe041aa23c7b97b0c4695a024. 0ef18481bd44a08fe041aa23c7b97b0c4695a024 i

[GitHub] [arrow-rs] ursabot commented on pull request #3201: Move zip and shift kernels to arrow-select

2022-11-26 Thread GitBox
ursabot commented on PR #3201: URL: https://github.com/apache/arrow-rs/pull/3201#issuecomment-1328110930 Benchmark runs are scheduled for baseline = 0b12828ddc75112c92541c612e7a75e5dbe44081 and contender = 2ea47e436d59a576d58d895d5805de1f2fe4c399. 2ea47e436d59a576d58d895d5805de1f2fe4c399 i

[GitHub] [arrow-rs] tustvold merged pull request #3200: Deprecate limit kernel

2022-11-26 Thread GitBox
tustvold merged PR #3200: URL: https://github.com/apache/arrow-rs/pull/3200 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow-rs] tustvold merged pull request #3201: Move zip and shift kernels to arrow-select

2022-11-26 Thread GitBox
tustvold merged PR #3201: URL: https://github.com/apache/arrow-rs/pull/3201 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow-datafusion] ursabot commented on pull request #4345: minor: improve optimizer logging and do not repeat rule name

2022-11-26 Thread GitBox
ursabot commented on PR #4345: URL: https://github.com/apache/arrow-datafusion/pull/4345#issuecomment-1328106812 Benchmark runs are scheduled for baseline = 27ae14aeec840f404d5ebe44e341f5dbea4c6f63 and contender = 323fbb43a3359d59bd12daf6d2b38102b4671d6f. 323fbb43a3359d59bd12daf6d2b38102b

[GitHub] [arrow-datafusion] alamb merged pull request #4345: minor: improve optimizer logging and do not repeat rule name

2022-11-26 Thread GitBox
alamb merged PR #4345: URL: https://github.com/apache/arrow-datafusion/pull/4345 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arro

[GitHub] [arrow-rs] tustvold commented on issue #3199: Add a way to compare datatype for "semantic compatibility / equality"

2022-11-26 Thread GitBox
tustvold commented on issue #3199: URL: https://github.com/apache/arrow-rs/issues/3199#issuecomment-1328102949 Perhaps we could modify https://docs.rs/arrow-schema/27.0.0/arrow_schema/enum.DataType.html#method.equals_datatype to not take account of nullability? -- This is an automated me

[GitHub] [arrow-rs] viirya opened a new pull request, #3202: Hide _dict_scalar kernels behind _dyn kernels

2022-11-26 Thread GitBox
viirya opened a new pull request, #3202: URL: https://github.com/apache/arrow-rs/pull/3202 # Which issue does this PR close? Part of https://github.com/apache/arrow-rs/issues/1975. # Rationale for this change https://github.com/apache/arrow-rs/pull/3197#i

[GitHub] [arrow-rs] tustvold commented on pull request #3197: Add dictionary suppport to like, ilike, nlike, nilike kernels

2022-11-26 Thread GitBox
tustvold commented on PR #3197: URL: https://github.com/apache/arrow-rs/pull/3197#issuecomment-1328099017 That sounds good to me -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

[GitHub] [arrow-rs] viirya commented on pull request #3197: Add dictionary suppport to like, ilike, nlike, nilike kernels

2022-11-26 Thread GitBox
viirya commented on PR #3197: URL: https://github.com/apache/arrow-rs/pull/3197#issuecomment-1328098768 > We don't seem to provide `eq_dict` or `gt_dict` instead this is hidden within `eq_dyn`, perhaps we should do the same here? I think it is because there are `like_dict_scalar`, `nl

[GitHub] [arrow-datafusion-python] isidentical commented on a diff in pull request #83: Update release instructions

2022-11-26 Thread GitBox
isidentical commented on code in PR #83: URL: https://github.com/apache/arrow-datafusion-python/pull/83#discussion_r1032822354 ## dev/release/README.md: ## @@ -19,18 +19,121 @@ # DataFusion Python Release Process -This is a work-in-progress that will be updated as we work

[GitHub] [arrow-datafusion-python] isidentical commented on a diff in pull request #83: Update release instructions

2022-11-26 Thread GitBox
isidentical commented on code in PR #83: URL: https://github.com/apache/arrow-datafusion-python/pull/83#discussion_r1032821785 ## dev/release/README.md: ## @@ -19,18 +19,121 @@ # DataFusion Python Release Process -This is a work-in-progress that will be updated as we work

[GitHub] [arrow-datafusion] andygrove closed pull request #3696: Stop skipping failing optimizer rules in tests

2022-11-26 Thread GitBox
andygrove closed pull request #3696: Stop skipping failing optimizer rules in tests URL: https://github.com/apache/arrow-datafusion/pull/3696 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

[GitHub] [arrow] ursabot commented on pull request #14225: ARROW-17836: [C++] Allow specifying alignment of buffers

2022-11-26 Thread GitBox
ursabot commented on PR #14225: URL: https://github.com/apache/arrow/pull/14225#issuecomment-1328091019 Benchmark runs are scheduled for baseline = 63f013cdb36d05f6f96a145aff3c6232470f2d02 and contender = 2078af7c710d688c14313b9486b99c981550a7b7. 2078af7c710d688c14313b9486b99c981550a7b7 is

[GitHub] [arrow-datafusion-python] isidentical commented on issue #82: Cannot install on Mac M1 from source tarball from testpypi

2022-11-26 Thread GitBox
isidentical commented on issue #82: URL: https://github.com/apache/arrow-datafusion-python/issues/82#issuecomment-1328088989 Pretty cool to hear, I'll try to verify the last release artifacts and go through the release notes. Thanks so much for pushing this release, I think it is going to

[GitHub] [arrow-datafusion-python] andygrove closed issue #82: Cannot install on Mac M1 from source tarball from testpypi

2022-11-26 Thread GitBox
andygrove closed issue #82: Cannot install on Mac M1 from source tarball from testpypi URL: https://github.com/apache/arrow-datafusion-python/issues/82 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [arrow-datafusion-python] andygrove commented on issue #82: Cannot install on Mac M1 from source tarball from testpypi

2022-11-26 Thread GitBox
andygrove commented on issue #82: URL: https://github.com/apache/arrow-datafusion-python/issues/82#issuecomment-1328088816 Thanks @isidentical ... that fixed it, and I have included this in the instructions in https://github.com/apache/arrow-datafusion-python/pull/83 -- This is an automa

[GitHub] [arrow-datafusion-python] andygrove commented on pull request #83: Update release instructions

2022-11-26 Thread GitBox
andygrove commented on PR #83: URL: https://github.com/apache/arrow-datafusion-python/pull/83#issuecomment-1328088674 @isidentical @francis-du fyi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow-datafusion-python] andygrove opened a new pull request, #83: Update release instructions

2022-11-26 Thread GitBox
andygrove opened a new pull request, #83: URL: https://github.com/apache/arrow-datafusion-python/pull/83 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing cha

[GitHub] [arrow-datafusion-python] isidentical commented on issue #82: Cannot install on Mac M1 from source tarball from testpypi

2022-11-26 Thread GitBox
isidentical commented on issue #82: URL: https://github.com/apache/arrow-datafusion-python/issues/82#issuecomment-1328086031 @andygrove just as a note (I didn't test it yet), but it might be worth giving the `--extra-index-url` flag a shot, which allows installing stuff from both regular Py

[GitHub] [arrow-datafusion-python] isidentical commented on issue #82: Cannot install on Mac M1 from source tarball from testpypi

2022-11-26 Thread GitBox
isidentical commented on issue #82: URL: https://github.com/apache/arrow-datafusion-python/issues/82#issuecomment-1328085639 Will look into it in a couple of hours. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow-datafusion-python] andygrove commented on issue #82: Cannot install on Mac M1 from source tarball from testpypi

2022-11-26 Thread GitBox
andygrove commented on issue #82: URL: https://github.com/apache/arrow-datafusion-python/issues/82#issuecomment-1328084577 Perhaps it is because there is no recent version of `maturin` available in testpypi? -- This is an automated message from the Apache Git Service. To respond to the m

[GitHub] [arrow-datafusion-python] andygrove commented on issue #82: Cannot install on Mac M1 from source tarball from testpypi

2022-11-26 Thread GitBox
andygrove commented on issue #82: URL: https://github.com/apache/arrow-datafusion-python/issues/82#issuecomment-1328084456 @isidentical Any idea why this fails? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [arrow-datafusion-python] andygrove opened a new issue, #82: Cannot install on Mac M1 from source tarball from testpypi

2022-11-26 Thread GitBox
andygrove opened a new issue, #82: URL: https://github.com/apache/arrow-datafusion-python/issues/82 **Describe the bug** ``` pip3 install -i https://test.pypi.org/simple/ datafusion==0.7.0 Looking in indexes: https://test.pypi.org/simple/ Collecting datafusion==0.7.0 Using ca

[GitHub] [arrow-datafusion] kesavkolla commented on issue #212: Add support for SQL explode / unnest function

2022-11-26 Thread GitBox
kesavkolla commented on issue #212: URL: https://github.com/apache/arrow-datafusion/issues/212#issuecomment-1328082244 Yes, really looking forward to this feature. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [arrow-datafusion-python] andygrove opened a new issue, #81: Build Python source distribution

2022-11-26 Thread GitBox
andygrove opened a new issue, #81: URL: https://github.com/apache/arrow-datafusion-python/issues/81 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** We do not currently build a Python source distribution that we can release on PyPi

[GitHub] [arrow-rs] psvri commented on pull request #3197: Add dictionary suppport to like, ilike, nlike, nilike kernels

2022-11-26 Thread GitBox
psvri commented on PR #3197: URL: https://github.com/apache/arrow-rs/pull/3197#issuecomment-1328076197 Thanks @viirya for implementing the rest. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [arrow-datafusion-python] andygrove merged pull request #80: Fix project urls

2022-11-26 Thread GitBox
andygrove merged PR #80: URL: https://github.com/apache/arrow-datafusion-python/pull/80 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr.

[GitHub] [arrow-datafusion-python] andygrove opened a new pull request, #80: Fix project urls

2022-11-26 Thread GitBox
andygrove opened a new pull request, #80: URL: https://github.com/apache/arrow-datafusion-python/pull/80 # Which issue does this PR close? Part of https://github.com/apache/arrow-datafusion-python/issues/7 # Rationale for this change I could not upload the wh

[GitHub] [arrow-datafusion-python] andygrove commented on issue #7: Release version 0.7.0

2022-11-26 Thread GitBox
andygrove commented on issue #7: URL: https://github.com/apache/arrow-datafusion-python/issues/7#issuecomment-1328071856 I think the issue may be that in https://github.com/apache/arrow-datafusion-python/commit/f0d565912cd1cb86e5f268ff41bf1118e9743690#diff-50c86b7ed8ac2cf95bd48334961bf0530c

[GitHub] [arrow-datafusion-python] andygrove commented on issue #7: Release version 0.7.0

2022-11-26 Thread GitBox
andygrove commented on issue #7: URL: https://github.com/apache/arrow-datafusion-python/issues/7#issuecomment-1328070630 I am now trying to upload to testpypi using twine (based on the instructions at https://packaging.python.org/en/latest/tutorials/packaging-projects/) but am running into

[GitHub] [arrow-datafusion] jackwener commented on pull request #4333: API-break: Support `SubqueryAlias` and remove `Alias in Projection`

2022-11-26 Thread GitBox
jackwener commented on PR #4333: URL: https://github.com/apache/arrow-datafusion/pull/4333#issuecomment-1328068834 > The only thing I am concerned about is the regression in supporting limit pushdown through subquery. Otherwise I think this PR could be merged. regression is resolved

[GitHub] [arrow-datafusion] jackwener commented on pull request #4384: rule support subquery alias

2022-11-26 Thread GitBox
jackwener commented on PR #4384: URL: https://github.com/apache/arrow-datafusion/pull/4384#issuecomment-1328068334 wait for #4333 #4324 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [arrow-datafusion] jackwener opened a new pull request, #4384: rule support subquery alias

2022-11-26 Thread GitBox
jackwener opened a new pull request, #4384: URL: https://github.com/apache/arrow-datafusion/pull/4384 # Which issue does this PR close? Closes #4381 . # Rationale for this change # What changes are included in this PR? # Are these changes te

[GitHub] [arrow] ursabot commented on pull request #14415: ARROW-17966: [C++] Adjust to new format for Substrait optional arguments

2022-11-26 Thread GitBox
ursabot commented on PR #14415: URL: https://github.com/apache/arrow/pull/14415#issuecomment-1328068119 Benchmark runs are scheduled for baseline = 7276c359e8be9b16ce5f122b81ec0bb89417224c and contender = 63f013cdb36d05f6f96a145aff3c6232470f2d02. 63f013cdb36d05f6f96a145aff3c6232470f2d02 is

[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #3828: Enable Parquet Row and Page Filtering by default (WIP)

2022-11-26 Thread GitBox
Ted-Jiang commented on PR #3828: URL: https://github.com/apache/arrow-datafusion/pull/3828#issuecomment-1328062248 > Specifically made the parquet files like this: > > ``` > RUSTFLAGS="-C target-cpu=native" cargo run --release --bin tpch -- convert --input ~/tpch_data/data_SF1 --o

[GitHub] [arrow-datafusion] Ted-Jiang closed issue #3833: Support parquet page filtering for more types: String, Binary(Decimal), Int96

2022-11-26 Thread GitBox
Ted-Jiang closed issue #3833: Support parquet page filtering for more types: String, Binary(Decimal), Int96 URL: https://github.com/apache/arrow-datafusion/issues/3833 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #3833: Support parquet page filtering for more types: String, Binary(Decimal), Int96

2022-11-26 Thread GitBox
Ted-Jiang commented on issue #3833: URL: https://github.com/apache/arrow-datafusion/issues/3833#issuecomment-1328061289 > Int96 appears to be the only unsupported type left > > https://github.com/apache/arrow-datafusion/blob/ba73c8180ebd874614cabc33be8cbb0d1db52518/datafusion/core/sr

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4333: API-break: Support `SubqueryAlias` and remove `Alias in Projection`

2022-11-26 Thread GitBox
jackwener commented on code in PR #4333: URL: https://github.com/apache/arrow-datafusion/pull/4333#discussion_r1032793752 ## datafusion/core/tests/sql/window.rs: ## @@ -335,18 +335,19 @@ async fn window_expr_eliminate() -> Result<()> { " Sort: d.b ASC NULLS LAST [b:Utf

[GitHub] [arrow-datafusion] jackwener opened a new issue, #4383: Add `MergeSubqueryAlias` rule

2022-11-26 Thread GitBox
jackwener opened a new issue, #4383: URL: https://github.com/apache/arrow-datafusion/issues/4383 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrated whe

[GitHub] [arrow] github-actions[bot] commented on pull request #14741: ARROW-18106: [C++] JSON reader ignores explicit schema with default unexpected_field_behavior="infer"

2022-11-26 Thread GitBox
github-actions[bot] commented on PR #14741: URL: https://github.com/apache/arrow/pull/14741#issuecomment-1328054167 https://issues.apache.org/jira/browse/ARROW-18106 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow] benibus opened a new pull request, #14741: ARROW-18106: [C++] JSON reader ignores explicit schema with default unexpected_field_behavior="infer"

2022-11-26 Thread GitBox
benibus opened a new pull request, #14741: URL: https://github.com/apache/arrow/pull/14741 See: [ARROW-18106](https://issues.apache.org/jira/browse/ARROW-18106) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-26 Thread GitBox
jackwener commented on code in PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#discussion_r1032786851 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -403,94 +294,90 @@ fn extract_or_clause(expr: &Expr, schema_columns: &HashSet) -> Option, plan:

[GitHub] [arrow-datafusion] alamb commented on pull request #3885: Consolidate remaining parquet config options into ConfigOptions

2022-11-26 Thread GitBox
alamb commented on PR #3885: URL: https://github.com/apache/arrow-datafusion/pull/3885#issuecomment-1328050035 FWIW My plan for this PR is: 1. To leave the current places to configure parquet settings per ParquetExec as overrides (`Option`) 2. Expose remaining options via `Confi

[GitHub] [arrow-nanoarrow] lidavidm commented on issue #73: [C] Support for unions

2022-11-26 Thread GitBox
lidavidm commented on issue #73: URL: https://github.com/apache/arrow-nanoarrow/issues/73#issuecomment-1328049318 I like your idea more than my original idea, it composes better. I suppose for unions where type IDs != child indices, we can add a separate function (taking `int8_t*` or someth

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4382: Config Cleanup: Remove TaskProperties and KV structure, keep key=value serialization

2022-11-26 Thread GitBox
alamb commented on code in PR #4382: URL: https://github.com/apache/arrow-datafusion/pull/4382#discussion_r1032789766 ## datafusion/core/src/execution/context.rs: ## @@ -1844,55 +1836,52 @@ impl TaskContext { aggregate_functions: HashMap>, runtime: Arc, )

[GitHub] [arrow-datafusion] alamb opened a new pull request, #4382: Config Cleanup: Remove TaskProperties and KV structure, keep key=value serialization

2022-11-26 Thread GitBox
alamb opened a new pull request, #4382: URL: https://github.com/apache/arrow-datafusion/pull/4382 # Which issue does this PR close? re https://github.com/apache/arrow-datafusion/issues/4349 # Rationale for this change Step 1 of N in unraveling the gordian knot of datafusi

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4333: API-break: Support `SubqueryAlias` and remove `Alias in Projection`

2022-11-26 Thread GitBox
jackwener commented on code in PR #4333: URL: https://github.com/apache/arrow-datafusion/pull/4333#discussion_r1032788324 ## datafusion/core/src/datasource/view.rs: ## @@ -474,7 +474,8 @@ mod tests { let formatted = arrow::util::pretty::pretty_format_batches(&plan)

[GitHub] [arrow-rs] tustvold opened a new pull request, #3201: Move zip and shift kernels to arrow-select

2022-11-26 Thread GitBox
tustvold opened a new pull request, #3201: URL: https://github.com/apache/arrow-rs/pull/3201 # Which issue does this PR close? Part of #2594 # Rationale for this change # What changes are included in this PR? # Are there any user-facing ch

[GitHub] [arrow-datafusion] jackwener opened a new issue, #4381: Optimizer rule support `subqueryAlias`

2022-11-26 Thread GitBox
jackwener opened a new issue, #4381: URL: https://github.com/apache/arrow-datafusion/issues/4381 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** After we use `subqueryAlias` to replace `alias in projection`, we can find some optim

[GitHub] [arrow-rs] tustvold opened a new pull request, #3200: Deprecate limit kernel

2022-11-26 Thread GitBox
tustvold opened a new pull request, #3200: URL: https://github.com/apache/arrow-rs/pull/3200 # Which issue does this PR close? Closes #. # Rationale for this change This kernel doesn't appear to serve a meaningful purpose # What changes are include

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4333: API-break: Support `SubqueryAlias` and remove `Alias in Projection`

2022-11-26 Thread GitBox
alamb commented on code in PR #4333: URL: https://github.com/apache/arrow-datafusion/pull/4333#discussion_r1032786474 ## datafusion/core/src/datasource/view.rs: ## @@ -474,7 +474,8 @@ mod tests { let formatted = arrow::util::pretty::pretty_format_batches(&plan)

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-26 Thread GitBox
jackwener commented on code in PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#discussion_r1032786851 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -403,94 +294,90 @@ fn extract_or_clause(expr: &Expr, schema_columns: &HashSet) -> Option, plan:

[GitHub] [arrow-rs] ursabot commented on pull request #3183: Support Duration in array_value_to_string

2022-11-26 Thread GitBox
ursabot commented on PR #3183: URL: https://github.com/apache/arrow-rs/pull/3183#issuecomment-1328045681 Benchmark runs are scheduled for baseline = befea02c2f277a95d1f80f00aa0e9591942bd723 and contender = 8c6e57960f92c0fad9982caba32f226e318313d9. 8c6e57960f92c0fad9982caba32f226e318313d9 i

[GitHub] [arrow-rs] ursabot commented on pull request #3195: Adding scalar nlike_dyn, ilike_dyn, nilike_dyn kernels

2022-11-26 Thread GitBox
ursabot commented on PR #3195: URL: https://github.com/apache/arrow-rs/pull/3195#issuecomment-1328045682 Benchmark runs are scheduled for baseline = 8c6e57960f92c0fad9982caba32f226e318313d9 and contender = 0b12828ddc75112c92541c612e7a75e5dbe44081. 0b12828ddc75112c92541c612e7a75e5dbe44081 i

[GitHub] [arrow-rs] ursabot commented on pull request #3188: To pyarrow with schema

2022-11-26 Thread GitBox
ursabot commented on PR #3188: URL: https://github.com/apache/arrow-rs/pull/3188#issuecomment-1328045677 Benchmark runs are scheduled for baseline = fd08c31a2cd37342d261f67e999b2be2d5a4ba6b and contender = befea02c2f277a95d1f80f00aa0e9591942bd723. befea02c2f277a95d1f80f00aa0e9591942bd723 i

[GitHub] [arrow-rs] tustvold merged pull request #3195: Adding scalar nlike_dyn, ilike_dyn, nilike_dyn kernels

2022-11-26 Thread GitBox
tustvold merged PR #3195: URL: https://github.com/apache/arrow-rs/pull/3195 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow-rs] tustvold closed issue #1168: Duration array values cannot be pretty printed

2022-11-26 Thread GitBox
tustvold closed issue #1168: Duration array values cannot be pretty printed URL: https://github.com/apache/arrow-rs/issues/1168 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [arrow-rs] tustvold merged pull request #3183: Support Duration in array_value_to_string

2022-11-26 Thread GitBox
tustvold merged PR #3183: URL: https://github.com/apache/arrow-rs/pull/3183 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3183: Duration display

2022-11-26 Thread GitBox
tustvold commented on code in PR #3183: URL: https://github.com/apache/arrow-rs/pull/3183#discussion_r1032786484 ## arrow-cast/src/display.rs: ## @@ -549,3 +553,34 @@ pub fn lexical_to_string(n: N) -> String { String::from_utf8_unchecked(buf) } } + +#[cfg(test)]

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-26 Thread GitBox
jackwener commented on code in PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#discussion_r1032786229 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -403,94 +294,90 @@ fn extract_or_clause(expr: &Expr, schema_columns: &HashSet) -> Option, plan:

[GitHub] [arrow-rs] tustvold commented on pull request #3188: To pyarrow with schema

2022-11-26 Thread GitBox
tustvold commented on PR #3188: URL: https://github.com/apache/arrow-rs/pull/3188#issuecomment-1328044645 Thank you :+1: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [arrow-rs] tustvold merged pull request #3188: To pyarrow with schema

2022-11-26 Thread GitBox
tustvold merged PR #3188: URL: https://github.com/apache/arrow-rs/pull/3188 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow-rs] tustvold closed issue #3136: arrow to and from pyarrow conversion results in changes in schema

2022-11-26 Thread GitBox
tustvold closed issue #3136: arrow to and from pyarrow conversion results in changes in schema URL: https://github.com/apache/arrow-rs/issues/3136 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4357: Clean the `expr_fn`

2022-11-26 Thread GitBox
alamb commented on code in PR #4357: URL: https://github.com/apache/arrow-datafusion/pull/4357#discussion_r1032786084 ## datafusion/expr/src/expr_fn.rs: ## @@ -345,136 +332,181 @@ macro_rules! nary_scalar_expr { // generate methods for creating the supported unary/binary expres

[GitHub] [arrow] ursabot commented on pull request #14709: ARROW-18384: [Release][MSYS2] Show pull request title

2022-11-26 Thread GitBox
ursabot commented on PR #14709: URL: https://github.com/apache/arrow/pull/14709#issuecomment-1328044359 Benchmark runs are scheduled for baseline = 405b54ee3533f3db5099ce24d0864d34ba5a3b78 and contender = 7276c359e8be9b16ce5f122b81ec0bb89417224c. 7276c359e8be9b16ce5f122b81ec0bb89417224c is

[GitHub] [arrow-datafusion] ursabot commented on pull request #4373: HashJoin should return Err when the right side input stream produce Err, add more join UTs to cover different join types

2022-11-26 Thread GitBox
ursabot commented on PR #4373: URL: https://github.com/apache/arrow-datafusion/pull/4373#issuecomment-1328044350 Benchmark runs are scheduled for baseline = 3b7f76714c72a0488f141d2869ada737e0a0df39 and contender = 27ae14aeec840f404d5ebe44e341f5dbea4c6f63. 27ae14aeec840f404d5ebe44e341f5dbe

[GitHub] [arrow-datafusion] DataPsycho commented on a diff in pull request #4360: Adding more dataframe example to read csv files

2022-11-26 Thread GitBox
DataPsycho commented on code in PR #4360: URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1032785785 ## datafusion-examples/examples/dataframe.rs: ## @@ -41,3 +44,47 @@ async fn main() -> Result<()> { Ok(()) } + +// Example to read data from a csv f

[GitHub] [arrow-rs] doki23 commented on pull request #3188: To pyarrow with schema

2022-11-26 Thread GitBox
doki23 commented on PR #3188: URL: https://github.com/apache/arrow-rs/pull/3188#issuecomment-1328044032 Great! 👍🏻 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4373: HashJoin should return Err when the right side input stream produce Err, add more join UTs to cover different join types

2022-11-26 Thread GitBox
alamb commented on code in PR #4373: URL: https://github.com/apache/arrow-datafusion/pull/4373#discussion_r1032785613 ## datafusion/sql/src/planner.rs: ## @@ -767,7 +767,8 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { let join_filter = filter.into_iter().red

[GitHub] [arrow-datafusion] alamb merged pull request #4373: HashJoin should return Err when the right side input stream produce Err, add more join UTs to cover different join types

2022-11-26 Thread GitBox
alamb merged PR #4373: URL: https://github.com/apache/arrow-datafusion/pull/4373 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arro

[GitHub] [arrow-datafusion] alamb closed issue #4362: HashJoin should return Err when the right side input stream produce Err

2022-11-26 Thread GitBox
alamb closed issue #4362: HashJoin should return Err when the right side input stream produce Err URL: https://github.com/apache/arrow-datafusion/issues/4362 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

[GitHub] [arrow-datafusion] jackwener commented on pull request #4377: Refactor the Hash Join

2022-11-26 Thread GitBox
jackwener commented on PR #4377: URL: https://github.com/apache/arrow-datafusion/pull/4377#issuecomment-1328043453 I will review this PR carefully tomorrow, thanks @liukun4515 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [arrow-datafusion] alamb commented on issue #4363: Should not convert a normal non-inner join to Cross Join when there are non-equal Join conditions

2022-11-26 Thread GitBox
alamb commented on issue #4363: URL: https://github.com/apache/arrow-datafusion/issues/4363#issuecomment-1328043376 I think it is also possible to extend the HashJoin implementation with the new semantics rather than adding an entirely new physical join implementation -- This is an autom
