Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]

2025-04-17 Thread via GitHub


jayzhan211 closed issue #15686: Allow parsing byte literals as FixedSizeBinary
URL: https://github.com/apache/datafusion/issues/15686


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]

2025-04-15 Thread via GitHub


leoyvens commented on issue #15686:
URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2805353246

   Ok I hear you, I'll try to do this without touching the parser and its 
configs.
   
   A concern with an LHS coercion is efficient pushdown. `col = x'lit'` can 
make use of statistics and parquet-level pruning, while `col::bytea = x'lit'` 
is a full-column scan.
   
   I will try to pursue the following approach:
   1. We do the LHS coercion.
   2. The `ExprSimplifier` is extended to expressions of the form:
   ```
   Cast(fixed_size_binary_col, 'Binary') = x'lit'
   ```
   If the lengths don't match, it is simplified to `false`, if they do match, 
the cast is moved to the RHS.
  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]

2025-04-14 Thread via GitHub


alamb commented on issue #15686:
URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2802965975

   Here is a sql reproducer of what happens today
   
   ```sql
   > create table t as values (arrow_cast(x'deadbeef', 'FixedSizeBinary(4)'));
   0 row(s) fetched.
   Elapsed 0.006 seconds.
   
   > select column1 = x'deadbeef' from t;
   Error during planning: Cannot infer common argument type for comparison 
operation FixedSizeBinary(4) = Binary
   ```
   
   > Coercing the LHS to Binary is less performant. Unconditionally parsing 
literals as FixedSizeBytes would be a breaking change.
   
   I wonder how much less performant this in in practice 🤔  In theory we can 
have custom kernels with custom code for different array sizes, but I think in 
practice it actually ends up always checking the length
   
   I really think we should avoid a new config option if possible
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]

2025-04-12 Thread via GitHub


leoyvens commented on issue #15686:
URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2799013192

   @alamb thank you for looking at this. Avoiding a config flag would be nice. 
But I'm skeptical of the proposed coercion.
   
   If we coerce `binary` to `fixed(N)` when encountering any expression of 
type: `fixed(N) = binary`, this would be a fallible coercion. While the 
infallible coercion of `fixed(N)` to `binary` would allow the expression to be 
evaluated for all values. DataFusion has already chosen this more lenient 
coercion for lists. The following:
   ```
   select arrow_cast([1], 'FixedSizeList(1, Int64)') = [1, 2];
   ```
   Evaluates today to `false` instead of giving a coercion error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]

2025-04-12 Thread via GitHub


alamb commented on issue #15686:
URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2798808874

   > Coercing the LHS to Binary is less performant. Unconditionally parsing 
literals as FixedSizeBytes would be a breaking change.
   
   What about coercing the RHS to FixedSizeBinary?
   
   So like
   
   ```sql
   where byte_column = x'deadbeef'
   ```
   
   Becomes
   
   ```sql
   where byte_column = arrow_cast(x'deadbeef', FixedSizeBinary)
   ```
   
   This would have the benefit of:
   1. Easier to use (no config flag)
   2. Could handle non exactly sizes things like casting `0xdead` to 
`0xdead` for comparisoon with 4 byte integers for example


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]