Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-07 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2643315586

   The release has been approved and published to crates.io: 
https://lists.apache.org/thread/73plk4nhdjby58knd5rvwsmsjzpbcg7s
   
   > With 4 +1 votes (3 binding) the release is approved!
   > 
   > The release is available here:
   >   https://dist.apache.org/repos/dist/release/datafusion/datafusion-45.0.0
   > 
   > I have also published the release on [crates.io](http://crates.io/): 
https://crates.io/crates/datafusion/45.0.0
   > 
   > Thanks everyone for your help / support -- I hope this one is epic (and a 
low drama release)
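
For readers unfamiliar with the tally quoted above, here is a minimal sketch of the rule as I understand it from ASF release-vote policy: at least three binding +1 votes, and more binding +1 than binding -1. The `Vote` struct and `release_approved` function are hypothetical names used only for illustration, and the four votes in `main` mirror the counts quoted above.

```rust
/// Hypothetical record of one vote on a release thread (illustration only).
struct Vote {
    /// +1, 0, or -1.
    value: i8,
    /// True when the vote is binding (cast by a PMC member).
    binding: bool,
}

/// Assumed rule: at least three binding +1 votes and more binding +1 than
/// binding -1. Non-binding votes are advisory and are not counted here.
fn release_approved(votes: &[Vote]) -> bool {
    let plus = votes.iter().filter(|v| v.binding && v.value > 0).count();
    let minus = votes.iter().filter(|v| v.binding && v.value < 0).count();
    plus >= 3 && plus > minus
}

fn main() {
    // Four +1 votes, three of them binding.
    let votes = vec![
        Vote { value: 1, binding: true },
        Vote { value: 1, binding: true },
        Vote { value: 1, binding: true },
        Vote { value: 1, binding: false },
    ];
    println!("approved = {}", release_approved(&votes));
}
```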
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-07 Thread via GitHub


alamb closed issue #14008: Release DataFusion `45.0.0`
URL: https://github.com/apache/datafusion/issues/14008





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-05 Thread via GitHub


wiedld commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2637421384

   @alamb -- release candidate re-tested on InfluxData, and is good.





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-05 Thread via GitHub


Omega359 commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2636935066

   Beyond the type coercion issues, which I can work around, my testing is 
working ... as long as I don't compile with the 'release' target. That seems to 
segfault, I think somewhere in DF code, but I haven't yet been able to get a 
core dump from the crashing nodes to investigate further. I thought it was 
Rust 1.84.1 dependent, but I retested with 1.83.0 and had the same issue. 
   
   I don't think it's a blocker but is something I'm going to continue to try 
and narrow down.





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-04 Thread via GitHub


Omega359 commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2633412221

   I don't think I'll have the bandwidth to test a type coercion fix for UDFs 
myself this week to be honest. I'm about to fire off a full run of my 
application against the 45 branch but I likely can only do that once this week.





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-03 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2632898299

   > It would also be really nice to sort out this issue from 
[@shehabgamin](https://github.com/shehabgamin) but it isn't clear to me we can 
fix that with a small non-risky patch
   > 
   > * [DataFusion Regression (Starting in v43): Type Coercion for UDF 
Arguments (Int --> String) 
#14230](https://github.com/apache/datafusion/issues/14230)
   
   I think we came to a temporary solution here 
https://github.com/apache/datafusion/pull/14268#issuecomment-2632894156
   CC @alamb @Omega359 @jayzhan211 





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-03 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2632865717

   I updated 
https://github.com/apache/datafusion/issues/14408#issue-2825591642, but same 
issues as before with:
   Test 2: Commit `26058ac` (DataFusion 
[`45.0.0-rc1`](https://github.com/apache/datafusion/tree/45.0.0-rc1)).





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-03 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2632246972

   @alamb I'll test on Sail sometime today!





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-03 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2632204435

   Ok, I think we are ready with a release candidate:
   
   tag here: https://github.com/apache/datafusion/releases/tag/45.0.0-rc1
   
   Release voting thread: 
https://lists.apache.org/thread/g20ywc9yto8xp07lcllmvgyn8g5z4420
   
   
   Content
   
   
   
   
   This release candidate is based on commit: 
26058ac024095ad8852eb3a8ab707ac09a02e8d7 [1]
   The proposed release tarball and signatures are hosted at [2].
   The changelog is located at [3].
   
   The standard verification procedure is documented at 
https://github.com/apache/datafusion/blob/main/dev/release/README.md#verifying-release-candidates.
   
   [1]: 
https://github.com/apache/datafusion/tree/26058ac024095ad8852eb3a8ab707ac09a02e8d7
   [2]: 
https://dist.apache.org/repos/dist/dev/datafusion/apache-datafusion-45.0.0-rc1
   [3]: 
https://github.com/apache/datafusion/blob/26058ac024095ad8852eb3a8ab707ac09a02e8d7/CHANGELOG.md
   
   





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-03 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2631959164

   Backport PRs (for my own OCD sanity)
   - [x] https://github.com/apache/datafusion/pull/14453
   - [ ] https://github.com/apache/datafusion/pull/14454
   - [ ] https://github.com/apache/datafusion/pull/14455
   - [ ] https://github.com/apache/datafusion/pull/14456
   - [ ] https://github.com/apache/datafusion/pull/14457
   





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-03 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2631773065

   @Omega359 has a fix for https://github.com/apache/datafusion/issues/11911
   - https://github.com/apache/datafusion/pull/14449
   
   This afternoon I will make some backport PRs and hopefully get a release 
candidate created 





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-03 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2631357227

   Update on the release:
   
   # `main` is open for new code (will be `46.0.0`)
   
   I have created a 45.0.0 branch from which to make the 45.0.0 release
   - https://github.com/apache/datafusion/tree/branch-45
   
   I would very much like to get fixes for the following issues into the 45 
branch as I think they are regressions that prevented some people from 
upgrading from 43 --> 44 and the fixes are fairly small:
   - [ ] https://github.com/apache/datafusion/pull/14377 (Utf8View regression 
in 43)
   - [ ] https://github.com/apache/datafusion/pull/14385 (regression from 43)
   - [ ] https://github.com/apache/datafusion/pull/14387 (regression from 43)
   - [ ] https://github.com/apache/datafusion/issues/11911 (Utf8View regression 
in 43) (mentioned by @Omega359  above)
   
   It would also be really nice to sort out this issue from @shehabgamin but it 
isn't clear to me we can fix that with a small non-risky patch
   - https://github.com/apache/datafusion/issues/14230





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-03 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2631116639

   > > Hi [@kevinjqliu](https://github.com/kevinjqliu) -- I am not sure what 
the plan for datafusion-python is
   > 
   > I am currently working on building the 44 release. I hope to get that done 
this morning or tomorrow. It's generally run about a month behind this upstream 
repo
   
   In case anyone is interested, here is the release thread:
   - https://lists.apache.org/thread/gkk4lq6k19gcq9xw64mcmbxrnf68o95s





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-02 Thread via GitHub


timsaucer commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629425413

   > Hi [@kevinjqliu](https://github.com/kevinjqliu) -- I am not sure what the 
plan for datafusion-python is
   
   I am currently working on building the 44 release. I hope to get that done 
this morning or tomorrow. It's generally run about a month behind this upstream 
repo





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-02 Thread via GitHub


jayzhan211 commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629384465

   > Digging into the code, I see that it's deprecated (it should still work 
even though it's deprecated). What's strange, however, is that the deprecation 
warning is not propagating to me as a downstream user. I only found this out 
due to a failed test.
   
   I found that we can't simply deprecate this; it has to be a breaking change 
if the user overrides the `is_nullable` function, because it takes 
`schema: &dyn ExprSchema`, which is not the case for `return_type_from_exprs`





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-02 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629381090

   Ok, given the feedback what I think we should do is:
   1. Complete the testing on our release branch (`branch-45`) and backport any 
fixes needed
   2. Continue development on main as previously planned. 
   
   Thanks @shehabgamin for the testing and reports. I'll take a more careful 
look tomorrow





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-02 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629379592

   > hey @alamb quick question about the release process, are the releases for 
datafusion and datafusion-python in lockstep? I see the next release for 
datafusion is 45.0.0 meanwhile [datafusion-python's main branch is currently on 
43.0.0](https://github.com/apache/datafusion-python/blob/8b513906315a0749b9f5cd6f34bf259ab4dd1add/Cargo.toml#L19-L20)
   
   Hi @kevinjqliu  -- I am not sure what the plan for datafusion-python is 
   
   The main branch seems to be on 44 to me 🤔 
   
https://github.com/apache/datafusion-python/blob/8b513906315a0749b9f5cd6f34bf259ab4dd1add/Cargo.toml#L41-L44
   
   
   @timsaucer  updated it  a few weeks ago
   - https://github.com/apache/datafusion-python/pull/973
   
   
   Any chance you can make a PR to test upgrading datafusion-python to 45?





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-02 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629314116

   Another regression is that implementing the `is_nullable` function in the 
`ScalarUDFImpl` trait no longer works. For example:
   ```
   impl ScalarUDFImpl for SparkArray {
       ...
       fn is_nullable(&self, _args: &[Expr], _schema: &dyn ExprSchema) -> bool {
           false
       }
       ...
   }
   ```
   Digging into the code, I see that it's deprecated (it should still work even 
though it's deprecated). What's strange, however, is that the deprecation 
warning is not propagating to me as a downstream user. I only found this out 
due to a failed test.
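
One plausible way an override can be silently ignored without any warning is sketched below. This is a simplified model under stated assumptions, not DataFusion's actual trait: `UdfImpl`, `nullable_from_args`, and `planner_nullability` are hypothetical names. As far as I know, the Rust deprecation lint fires where a deprecated item is used (called), not where a trait method is implemented, which would match the missing warning described above.

```rust
/// Hypothetical, simplified stand-in for a UDF trait (not the real API).
trait UdfImpl {
    /// Old hook, kept with a default body and marked deprecated.
    #[deprecated(note = "use `nullable_from_args` instead")]
    fn is_nullable(&self) -> bool {
        true
    }

    /// New hook. In this sketch its default does NOT consult `is_nullable`.
    fn nullable_from_args(&self) -> bool {
        true
    }
}

struct SparkArray;

impl UdfImpl for SparkArray {
    // Overriding the deprecated method compiles without a deprecation
    // warning, so the implementor gets no hint that it is no longer called.
    fn is_nullable(&self) -> bool {
        false
    }
}

/// Stand-in for a planner that only calls the new hook.
fn planner_nullability(udf: &dyn UdfImpl) -> bool {
    udf.nullable_from_args()
}

/// Calls the old hook directly; a caller needs `allow(deprecated)` here.
#[allow(deprecated)]
fn legacy_nullability(udf: &dyn UdfImpl) -> bool {
    udf.is_nullable()
}

fn main() {
    let udf = SparkArray;
    println!("old hook says nullable = {}", legacy_nullability(&udf));
    println!("planner sees nullable = {}", planner_nullability(&udf));
}
```

In this model the user's `false` never reaches the planner, so a schema check that relies on the override would only fail at test time.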





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629235506

   > Derived TPC-DS Query 66 
(https://github.com/lakehq/sail/blob/main/python/pysail/data/tpcds/queries/q66.sql)
 fails on Sail, when it previously did not.
   
   Error is:
   ```
   Exception: Error executing query q66 with error: Internal error:
   Physical input schema should be the same as the one converted from logical input schema.
   Differences:
       - field nullability at index 7 [#98]: (physical) false vs (logical) true.
   ```
   
   @alamb I created another PR that's dedicated only to the DF 45. Could you 
please update the issue with this PR: https://github.com/lakehq/sail/pull/365
   
   Additionally, I created a tracking issue: 
https://github.com/apache/datafusion/issues/14408
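
To make the error above concrete, here is a minimal sketch of the kind of check that produces such a message: walk the physical and logical fields in parallel and report each index where nullability disagrees. `Field` and `nullability_differences` are hypothetical stand-ins written for this illustration; they are not the Arrow or DataFusion types, and the real check compares more than nullability.

```rust
/// Minimal stand-in for a schema field (hypothetical, not Arrow's `Field`).
#[derive(Debug)]
struct Field {
    name: String,
    nullable: bool,
}

fn field(name: &str, nullable: bool) -> Field {
    Field { name: name.to_string(), nullable }
}

/// Return one message per index where the two schemas disagree on
/// nullability. Fields are paired by position.
fn nullability_differences(physical: &[Field], logical: &[Field]) -> Vec<String> {
    physical
        .iter()
        .zip(logical.iter())
        .enumerate()
        .filter(|(_, (p, l))| p.nullable != l.nullable)
        .map(|(i, (p, l))| {
            format!(
                "field nullability at index {} [{}]: (physical) {} vs (logical) {}",
                i, p.name, p.nullable, l.nullable
            )
        })
        .collect()
}

fn main() {
    let physical = vec![field("a", true), field("#98", false)];
    let logical = vec![field("a", true), field("#98", true)];
    for diff in nullability_differences(&physical, &logical) {
        println!("{diff}");
    }
}
```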
   
   





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629154530

   Derived TPC-DS Query 66 
(https://github.com/lakehq/sail/blob/main/python/pysail/data/tpcds/queries/q66.sql)
 fails on Sail, when it previously did not.





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub


kevinjqliu commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629109322

   hey @alamb quick question about the release process, are the releases for 
datafusion and datafusion-python in lockstep? I see the next release for 
datafusion is 45.0.0 meanwhile [datafusion-python's main branch is currently on 
43.0.0](https://github.com/apache/datafusion-python/blob/8b513906315a0749b9f5cd6f34bf259ab4dd1add/Cargo.toml#L19-L20)
   
   is the plan to release datafusion 45.0.0 and then upgrade datafusion-python 
to 45.0.0 too and take on the new datafusion library version?





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629106680

   Thanks @Omega359 
   
   The delta-rs upgrade seems to have gone pretty smoothly: 
https://github.com/delta-io/delta-rs/pull/3175
   
   In terms of releasing DataFusion 45 I think it is better than 44 so I was 
planning to make the RC tomorrow, but I can wait a day or two if we want to 
make a push to finish up a few of those issues
   
   > utf8view, i32 comparison no longer worked
   
   I believe this is fixed in the following PR (I am waiting on another 
committer to approve and if they do I can backport it)
   - https://github.com/apache/datafusion/pull/14377
   
   > i32 -> utf8view coercion in regexp_like udf stopped working. 
   
   I believe this is tracked by 
   - https://github.com/apache/datafusion/issues/11911
   
   But is waiting on the next arrow release (which I just need to scrounge up 
one more PMC vote to approve and release). It might "just work" after that
   - https://github.com/apache/arrow-rs/issues/6929





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub


Omega359 commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629085175

   I just upgraded my project to latest main from DF 42. The primary 
compilation and test suite issues I encountered after setting 
`datafusion.execution.parquet.schema_force_view_types` -> false (I'm going to 
fix all my .as_string_array(..) and utf8 datatype assumptions later):
   
   utf8view, i32 comparison no longer worked
   ```
   .with_column(
       STRING_FIELD,
       when(col(STRING_FIELD).eq(lit(83)), lit(82)).otherwise(col(STRING_FIELD))?,
   )?
   ```
   switching to the obvious change below worked
   ```
   .with_column(
       STRING_FIELD,
       when(col(STRING_FIELD).eq(lit("83")), lit("82")).otherwise(col(STRING_FIELD))?,
   )?
   ```
   
   i32 -> utf8view coercion in UDFs stopped working. For example: 
`regexp_like(field_a, '[^0-9]')` where field_a is an int field. I changed it to 
`regexp_like(field_a::varchar, '[^0-9]')`
   
   Auto coercion of date/timestamp to utf8 no longer worked. I had to update my 
code to use `to_char` function to properly write out the date/timestamp. Not a 
bad thing really.
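
The pattern in the three changes above is the same: an implicit conversion disappeared, and the fix was to make the types agree explicitly. A toy model of that behaviour is sketched below. `Scalar`, `strict_eq`, and `cast_to_utf8` are hypothetical names for this illustration and say nothing about how DataFusion's coercion rules are actually implemented.

```rust
/// Toy scalar value with two types (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int32(i32),
    Utf8(String),
}

/// Strict equality: refuses to compare values of different types instead of
/// silently coercing one side.
fn strict_eq(left: &Scalar, right: &Scalar) -> Result<bool, String> {
    match (left, right) {
        (Scalar::Int32(a), Scalar::Int32(b)) => Ok(a == b),
        (Scalar::Utf8(a), Scalar::Utf8(b)) => Ok(a == b),
        _ => Err(format!("cannot compare {left:?} with {right:?} without a cast")),
    }
}

/// Explicit cast to a string, the analogue of writing `field_a::varchar`.
fn cast_to_utf8(value: &Scalar) -> Scalar {
    match value {
        Scalar::Int32(v) => Scalar::Utf8(v.to_string()),
        Scalar::Utf8(s) => Scalar::Utf8(s.clone()),
    }
}

fn main() {
    let column_value = Scalar::Utf8("83".to_string());
    let literal = Scalar::Int32(83);

    // Mixed types are rejected rather than coerced.
    println!("{:?}", strict_eq(&column_value, &literal));
    // After an explicit cast the comparison is well typed.
    println!("{:?}", strict_eq(&column_value, &cast_to_utf8(&literal)));
}
```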
   
   I'll likely do a full data run with this early next week.
   





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629031385

   I have created a branch:
   - https://github.com/apache/datafusion/tree/branch-45
   
   Let's start merging stuff to main again. 
   
   Before I make the RC I want to verify against delta.rs
   
   If we are able to get the fixes for any of the following issues reviewed / 
merged to main we can port them to `branch-45`
   - https://github.com/apache/datafusion/pull/14377 (Utf8View papercut)
   - https://github.com/apache/datafusion/pull/14385 (regression from 43)
   - https://github.com/apache/datafusion/pull/14387 (regression from 43)





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2628949060

   @wiedld  tested with InfluxDB and this upgrade works for us





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2628923232

   I have created a PR for a version increase and 
   - https://github.com/apache/datafusion/pull/14397
   
   Once that is approved / merged I'll create a release-45 branch to make RCs 
from 
   
   We can do final testing / backporting any needed fixes there. 
   
   In my mind the following PRs would be great to get into 45, but since the 
issues they fix all existed in 44 as well, they shouldn't block the 45 release
   - https://github.com/apache/datafusion/pull/14377 (Utf8View papercut)
   - https://github.com/apache/datafusion/pull/14385 (regression from 43)
   - https://github.com/apache/datafusion/pull/14387 (regression from 43)





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-31 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2628544349

   Release update:
   In order to avoid holding up merges to main too much longer I plan to make a 
release branch tomorrow and we can backport individual fixes as needed.
   
   





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-31 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2627806359

   Here is a proposed fix for https://github.com/apache/datafusion/issues/13510
   - https://github.com/apache/datafusion/pull/14387





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-31 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2627581883

   I have a proposed fix for https://github.com/apache/datafusion/issues/14154 
up now
   - https://github.com/apache/datafusion/pull/14385





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-31 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2627012274

   I went through the list of issues again and I think these are the only ones 
that should be fixed in my opinion before the release 
   - [ ] https://github.com/apache/datafusion/issues/13359
   - [ ] https://github.com/apache/datafusion/issues/14230
   - [ ] https://github.com/apache/datafusion/issues/13510
   - [ ] https://github.com/apache/datafusion/issues/14154
   
   We have PRs up for the first two and I will help / work on the second two 
today
   





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-27 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2616100769

   > Some people currently use ValuesExec::try_new_from_batches, so 
MemoryExec::try_new_as_values wouldn't necessarily be a suitable substitute.
   
   @shehabgamin -- makes sense. I made this PR to try and clarify:
   - https://github.com/apache/datafusion/pull/14322
   
   Thank you again for the testing
   
   





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-27 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2616074861

   I am starting to do some ticket triage and prepare for the release
   
   > IMO this should go on the "must fix" list too. I'll make sure to have the 
PR ready by the end of the weekend.
   
   Done
   





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-26 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2614301913

   @alamb @jayzhan211 https://github.com/apache/datafusion/pull/14268 is ready 
for review!





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-24 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2613799483

   > Most of the regressions are related to this issue: 
[#14230](https://github.com/apache/datafusion/issues/14230). I should be able 
to resolve them well before the `45` release.
   
   It turns out that type coercion for UDF arguments (TypeSignature::Coercible) 
was not being applied to the majority of types. I adjusted the scope of 
https://github.com/apache/datafusion/issues/14230 and 
https://github.com/apache/datafusion/pull/14268 to reflect this.
   
   IMO this should go on the "must fix" list too. I'll make sure to have the PR 
ready by the end of the weekend.
   





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-24 Thread via GitHub


andygrove commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2613057921

   @alamb I took the liberty of adding 
https://github.com/apache/datafusion/issues/14277 to the "must fix" list





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-24 Thread via GitHub


andygrove commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2612692120

   I created an issue to track our progress with upgrading Comet to use 
DataFusion 45 and linked to it from the PR description: 
https://github.com/apache/datafusion/issues/14274





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-23 Thread via GitHub


jayzhan211 commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2611311173

   I see.





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-23 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2610743660

   > 
   > To replace `ValuesExec`, `try_new_as_values` is the right one to use, not 
`try_new_from_batches`
   
   Some people currently use `ValuesExec::try_new_from_batches`, so 
`MemoryExec::try_new_as_values` wouldn't necessarily be a suitable substitute.





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-23 Thread via GitHub


jayzhan211 commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2609768983

   > ValuesExec is now deprecated. The deprecation message is a bit confusing 
though. It currently states: "Use MemoryExec::try_new_as_values instead", but I 
think it should say: "Use MemoryExec::try_new_as_values or 
MemoryExec::try_new_from_batches instead". Or, just simply: "Use MemoryExec 
instead".
   
   To replace `ValuesExec`, `try_new_as_values` is the right one to use, not 
`try_new_from_batches`





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-22 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2609034408

   My apologies @alamb, the DataFusion upgrade from the latest main branch 
commit is smoother than I initially thought. After investigating the flood of 
errors, I discovered that many were resolved by simply updating Sail's 
`serde-arrow` dependency to Arrow `54`. Projects without PyO3 or the `pyarrow` 
feature in DataFusion should experience a seamless upgrade (as of writing). 
Projects using PyO3 with the `pyarrow` feature enabled will have varying 
experiences based on their usage of PyO3.
   
   **PyO3 `0.23.3`**
   DataFusion `45` upgrades from PyO3 `0.22` to `0.23.3`. This is an exciting 
change, but may introduce significant breaking changes for PyO3 users. Since 
these changes vary based on PyO3 usage, I'm not listing Sail's specific changes 
here. Users can refer to the PyO3 migration guide: 
https://pyo3.rs/v0.23.0/migration
   
   **DataFusion**
   `ValuesExec` is now deprecated. The deprecation message is a bit confusing 
though. It currently states: "Use `MemoryExec::try_new_as_values` instead", but 
I think it should say: "Use `MemoryExec::try_new_as_values` or 
`MemoryExec::try_new_from_batches` instead". Or, just simply: "Use `MemoryExec` 
instead".
   
   If you'd like to see these changes, they're in my PR that's testing the 
regression fixes: https://github.com/lakehq/sail/pull/355





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-22 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2608409140

   > Thanks [@shehabgamin](https://github.com/shehabgamin) -- Can you enumerate 
these changes (or point me at a PR) so we can see if there is some way to make 
the upgrade less jarring
   
   Yeah I'll work on that right now!





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-22 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2607619413

   > While testing my local Sail code with the latest commit on DataFusion's 
main branch, I encountered several breaking changes that may make DataFusion 45 
a jarring upgrade for some users
   
   Thanks @shehabgamin -- Can you enumerate these changes (or point me at a 
PR) so we can see if there is some way to make the upgrade less jarring?





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-22 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2606934116

   > 
   > I'll take a deeper look into the issue after the weekend. Hope you have a 
great rest of your weekend!
   
   Most of the regressions are related to this issue: 
https://github.com/apache/datafusion/issues/14230. I should be able to resolve 
them well before the `45` release.
   
   While testing my local Sail code with the latest commit on DataFusion's main 
branch, I encountered several breaking changes that may make DataFusion 45 a 
jarring upgrade for some users. Given the previous discussion about wanting to 
make releases less jarring 
(https://github.com/apache/datafusion/issues/13334#issuecomment-2552465244), I 
wanted to bring this to your attention, @alamb.
   
   Aside from that, there is one remaining regression I haven't investigated 
yet, which seems to be related to Parquet.





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-17 Thread via GitHub


shehabgamin commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2599492511

   As promised, Sail is working on porting relevant tests into DataFusion.
   
   A good starting point is a regression our tests caught in DataFusion 43, 
which still seems to persist in DataFusion 44. The regression, introduced in 
DataFusion 43.0.0, relates to casting to UTF8 in various places. Upgrading to 
DataFusion 43.0.0 required adding explicit casts in several areas as a 
workaround. This PR (https://github.com/lakehq/sail/pull/355) comments out 
those changes to expose the regression through the 12 additional failed tests 
compared to the main branch.
   
   Once I’ve pinpointed the root cause(s) of the regression, I’ll create an 
issue in DataFusion to track the work. I want to ensure the issue accurately 
reflects the problem before filing it. I’m happy to address these regressions 
and port over the tests that cover them in the same PR. Hopefully, we can get 
this resolved in time for the DataFusion 45 release!





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-15 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2593564670

   I plan to start assembling the release candidate and testing the week of Jan 
27 (in about 2 weeks' time)





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-14 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2589707968

   
   > > 
   > > I am happy to do it again for 45 if no one else would like the 
opportunity (see what I did there 😆 )
   > 
   > Thanks, alamb, I booked 46 in advance!
   
   Awesome -- I filed https://github.com/apache/datafusion/issues/14123 to 
track 46


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-13 Thread via GitHub


xudong963 commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2589148761

   > > I don't have a preference. I will be traveling around this time though, so 
perhaps it would make sense for someone else to be release manager for this one.
   > 
   > I am happy to do it again for 45 if no one else would like the opportunity 
(see what I did there 😆 )
   
   Thanks, alamb! I booked 46 in advance!





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-13 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2588105358

   > I don't have a preference. I will be traveling around this time though, so 
perhaps it would make sense for someone else to be release manager for this one.
   
   
   I am happy to do it again for 45 if no one else would like the opportunity 
(see what I did there 😆 )





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-13 Thread via GitHub


andygrove commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2588095820

   > [@andygrove](https://github.com/andygrove) would you like to coordinate 
this release or would you like me to? (or does anyone else want to do so?)
   
   I don't have a preference. I will be traveling around this time though, so 
perhaps it would make sense for someone else to be release manager for this one.





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-13 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2587937242

   I also added some issues to the description above that I think would be 
worth fixing





Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-13 Thread via GitHub


alamb commented on issue #14008:
URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2587875376

   @andygrove would you like to coordinate this release or would you like me 
to? (or does anyone else want to do so?)

