[ https://issues.apache.org/jira/browse/ARROW-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230223#comment-17230223 ]
Yordan Pavlov edited comment on ARROW-10173 at 11/12/20, 9:42 PM:
------------------------------------------------------------------

I have an initial implementation of direct comparison operations to scalar values in DataFusion which, for the simple query used in the benchmark ("select f32, f64 from t where f32 >= 250 and f64 > 250"), shows an approximately 10x performance improvement:

before:
filter_scalar time: [35.733 ms 36.613 ms 37.924 ms]

after:
filter_scalar time: [3.5938 ms 3.6450 ms 3.7035 ms]
change: [-90.048% -89.846% -89.625%] (p = 0.00 < 0.05)

I have also added a benchmark to measure the change in performance when comparing two arrays (using the query "select f32, f64 from t where f32 >= f64"), and the change is negligible (about +3.6%):

before:
filter_array time: [11.601 ms 11.656 ms 11.718 ms]

after:
filter_array time: [11.854 ms 11.957 ms 12.070 ms]
change: [+1.8032% +3.6391% +5.5671%] (p = 0.00 < 0.05)

I will be submitting a PR for this change soon.


was (Author: yordan-pavlov):
I have an initial implementation of direct comparison operations to scalar values in DataFusion which shows promising results.

For a simple query like this: "select f32, f64 from t where f32 >= 250 and f64 > 250", the old, all-array implementation takes this long:
filter_20_12 time: [74.186 ms 76.356 ms 78.572 ms]

whereas the new implementation shows an approximately 10x performance improvement:
filter_20_12 time: [6.5256 ms 6.5701 ms 6.6171 ms]
change: [-91.173% -90.996% -90.820%] (p = 0.00 < 0.05)

I will be submitting a PR for this change soon.
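
For readers following the ticket, the difference between the two approaches can be sketched in plain Rust. This is only an illustration under simplifying assumptions: it uses `&[f32]` slices and `Vec<bool>` instead of Arrow arrays, it ignores null handling, and the function names (`gt_eq_array`, `gt_eq_via_constant_array`, `gt_eq_scalar`) are hypothetical and do not correspond to the Arrow or DataFusion APIs or to the implementation benchmarked above.

```rust
/// Array-vs-array comparison: checks `left[i] >= right[i]` for every index.
/// (Hypothetical stand-in for an array compute kernel; nulls are ignored.)
fn gt_eq_array(left: &[f32], right: &[f32]) -> Vec<bool> {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");
    left.iter().zip(right.iter()).map(|(l, r)| l >= r).collect()
}

/// All-array approach: expand the constant into an array as long as the
/// column, then reuse the array-vs-array comparison.
fn gt_eq_via_constant_array(column: &[f32], constant: f32) -> Vec<bool> {
    // Extra allocation and fill, proportional to the number of rows.
    let repeated = vec![constant; column.len()];
    gt_eq_array(column, &repeated)
}

/// Direct approach: compare each element against the scalar itself,
/// without materializing an intermediate array.
fn gt_eq_scalar(column: &[f32], constant: f32) -> Vec<bool> {
    column.iter().map(|v| *v >= constant).collect()
}
```

Both functions return the same mask for the same input, for example `gt_eq_scalar(&[100.0, 250.0, 300.5], 250.0)` and `gt_eq_via_constant_array(&[100.0, 250.0, 300.5], 250.0)`; the direct version simply skips building and reading the repeated-constant array.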

> [Rust][DataFusion] Improve performance of equality to a constant predicate support
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-10173
>                 URL: https://issues.apache.org/jira/browse/ARROW-10173
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Andrew Lamb
>            Priority: Major
>
> I noticed this behavior while working on support for DictionaryArrays and wanted to capture it in a ticket in case someone has time to work on it.
> In order to implement an equality predicate to a constant such as {{d1 = 'three'}}, DataFusion effectively creates an array with the same value {{'three'}} repeated over and over again and uses the equality compute kernel. This is ... suboptimal.
> Here is what the predicate looks like:
> {code}
> predicate: BinaryExpr {
>     left: CastExpr {
>         expr: Column {
>             name: "d1",
>         },
>         cast_type: Utf8,
>     },
>     op: Eq,
>     right: Literal {
>         value: Utf8("three"),
>     },
> },
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)