Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]
jayzhan211 closed issue #15686: Allow parsing byte literals as FixedSizeBinary URL: https://github.com/apache/datafusion/issues/15686 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]
leoyvens commented on issue #15686: URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2805353246 Ok I hear you, I'll try to do this without touching the parser and its configs. A concern with an LHS coercion is efficient pushdown. `col = x'lit'` can make use of statistics and parquet-level pruning, while `col::bytea = x'lit'` is a full-column scan. I will try to pursue the following approach: 1. We do the LHS coercion. 2. The `ExprSimplifier` is extended to expressions of the form: ``` Cast(fixed_size_binary_col, 'Binary') = x'lit' ``` If the lengths don't match, it is simplified to `false`, if they do match, the cast is moved to the RHS. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]
alamb commented on issue #15686: URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2802965975 Here is a sql reproducer of what happens today ```sql > create table t as values (arrow_cast(x'deadbeef', 'FixedSizeBinary(4)')); 0 row(s) fetched. Elapsed 0.006 seconds. > select column1 = x'deadbeef' from t; Error during planning: Cannot infer common argument type for comparison operation FixedSizeBinary(4) = Binary ``` > Coercing the LHS to Binary is less performant. Unconditionally parsing literals as FixedSizeBytes would be a breaking change. I wonder how much less performant this in in practice 🤔 In theory we can have custom kernels with custom code for different array sizes, but I think in practice it actually ends up always checking the length I really think we should avoid a new config option if possible -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]
leoyvens commented on issue #15686: URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2799013192 @alamb thank you for looking at this. Avoiding a config flag would be nice. But I'm skeptical of the proposed coercion. If we coerce `binary` to `fixed(N)` when encountering any expression of type: `fixed(N) = binary`, this would be a fallible coercion. While the infallible coercion of `fixed(N)` to `binary` would allow the expression to be evaluated for all values. DataFusion has already chosen this more lenient coercion for lists. The following: ``` select arrow_cast([1], 'FixedSizeList(1, Int64)') = [1, 2]; ``` Evaluates today to `false` instead of giving a coercion error. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]
alamb commented on issue #15686: URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2798808874 > Coercing the LHS to Binary is less performant. Unconditionally parsing literals as FixedSizeBytes would be a breaking change. What about coercing the RHS to FixedSizeBinary? So like ```sql where byte_column = x'deadbeef' ``` Becomes ```sql where byte_column = arrow_cast(x'deadbeef', FixedSizeBinary) ``` This would have the benefit of: 1. Easier to use (no config flag) 2. Could handle non exactly sizes things like casting `0xdead` to `0xdead` for comparisoon with 4 byte integers for example -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
