Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-06-07 Thread via GitHub


alamb commented on code in PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#discussion_r2133774454


##
datafusion/expr/src/expr.rs:
##
@@ -330,7 +331,7 @@ pub enum Expr {
 /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
 AggregateFunction(AggregateFunction),
 /// Call a window function with a set of arguments.
-WindowFunction(WindowFunction),
+WindowFunction(Box),

Review Comment:
   Turns out I was doing something else and got the size down to  `128`:
   - https://github.com/apache/datafusion/pull/16320
   
   Hopefully this will merge in a bit



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-06-02 Thread via GitHub


alamb merged PR #16207:
URL: https://github.com/apache/datafusion/pull/16207


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-06-02 Thread via GitHub


alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2930174694

   πŸš€  -- thanks again @hendrikmakait 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-06-01 Thread via GitHub


alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2927412038

   I plan to merge this tomorrow so it can be included in DataFusion 48.0.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-30 Thread via GitHub


Dandandan commented on code in PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#discussion_r2116490428


##
datafusion/expr/src/expr.rs:
##
@@ -330,7 +331,7 @@ pub enum Expr {
 /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
 AggregateFunction(AggregateFunction),
 /// Call a window function with a set of arguments.
-WindowFunction(WindowFunction),
+WindowFunction(Box),

Review Comment:
   Agree - although it seems the size of the changes will be lower than 272 to 
144 (I hoped maybe there would be a single one to reduce it from 144 -> 80). 
   Thanks for checking!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-30 Thread via GitHub


alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2922908904

   I'll leave this open for a few days to gather any remaining feedback


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-30 Thread via GitHub


alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2922659103

   πŸ€–: Benchmark completed
   
   Details
   
   
   
   ```
   group main   
reduce-expr-struct-size
   -    
---
   logical_aggregate_with_join   1.03749.1Β±2.47Β΅s? 
?/sec1.00728.8Β±3.55Β΅s? ?/sec
   logical_select_all_from_1000  1.00123.6Β±0.31ms? 
?/sec1.00123.5Β±0.23ms? ?/sec
   logical_select_one_from_700   1.01422.4Β±1.88Β΅s? 
?/sec1.00419.2Β±1.88Β΅s? ?/sec
   logical_trivial_join_high_numbered_columns1.02385.4Β±5.88Β΅s? 
?/sec1.00378.7Β±6.61Β΅s? ?/sec
   logical_trivial_join_low_numbered_columns 1.01370.3Β±1.93Β΅s? 
?/sec1.00365.2Β±3.99Β΅s? ?/sec
   physical_intersection 1.04   869.0Β±11.57Β΅s? 
?/sec1.00831.8Β±5.20Β΅s? ?/sec
   physical_join_consider_sort   1.03   1402.3Β±6.23Β΅s? 
?/sec1.00   1355.0Β±5.68Β΅s? ?/sec
   physical_join_distinct1.01359.1Β±2.18Β΅s? 
?/sec1.00354.8Β±1.38Β΅s? ?/sec
   physical_many_self_joins  1.02 10.4Β±0.05ms? 
?/sec1.00 10.2Β±0.05ms? ?/sec
   physical_plan_clickbench_all  1.03147.7Β±1.46ms? 
?/sec1.00143.3Β±1.44ms? ?/sec
   physical_plan_clickbench_q1   1.02  1730.9Β±17.67Β΅s? 
?/sec1.00  1691.0Β±22.98Β΅s? ?/sec
   physical_plan_clickbench_q10  1.05  2.5Β±0.02ms? 
?/sec1.00  2.4Β±0.04ms? ?/sec
   physical_plan_clickbench_q11  1.02  2.6Β±0.03ms? 
?/sec1.00  2.5Β±0.03ms? ?/sec
   physical_plan_clickbench_q12  1.03  2.7Β±0.03ms? 
?/sec1.00  2.6Β±0.05ms? ?/sec
   physical_plan_clickbench_q13  1.04  2.4Β±0.03ms? 
?/sec1.00  2.3Β±0.03ms? ?/sec
   physical_plan_clickbench_q14  1.02  2.6Β±0.02ms? 
?/sec1.00  2.5Β±0.04ms? ?/sec
   physical_plan_clickbench_q15  1.05  2.5Β±0.03ms? 
?/sec1.00  2.4Β±0.02ms? ?/sec
   physical_plan_clickbench_q16  1.03  2.3Β±0.03ms? 
?/sec1.00  2.3Β±0.02ms? ?/sec
   physical_plan_clickbench_q17  1.03  2.4Β±0.02ms? 
?/sec1.00  2.4Β±0.02ms? ?/sec
   physical_plan_clickbench_q18  1.02  1998.0Β±14.44Β΅s? 
?/sec1.00  1961.3Β±21.79Β΅s? ?/sec
   physical_plan_clickbench_q19  1.05  2.9Β±0.05ms? 
?/sec1.00  2.8Β±0.04ms? ?/sec
   physical_plan_clickbench_q2   1.02  1953.9Β±22.03Β΅s? 
?/sec1.00  1910.3Β±21.29Β΅s? ?/sec
   physical_plan_clickbench_q20  1.02  1690.4Β±24.95Β΅s? 
?/sec1.00  1657.7Β±19.62Β΅s? ?/sec
   physical_plan_clickbench_q21  1.04  2.0Β±0.06ms? 
?/sec1.00  1955.6Β±24.11Β΅s? ?/sec
   physical_plan_clickbench_q22  1.04  2.6Β±0.04ms? 
?/sec1.00  2.5Β±0.03ms? ?/sec
   physical_plan_clickbench_q23  1.07  3.0Β±0.05ms? 
?/sec1.00  2.8Β±0.03ms? ?/sec
   physical_plan_clickbench_q24  1.04  4.7Β±0.08ms? 
?/sec1.00  4.5Β±0.04ms? ?/sec
   physical_plan_clickbench_q25  1.03  2.0Β±0.02ms? 
?/sec1.00  1978.0Β±26.06Β΅s? ?/sec
   physical_plan_clickbench_q26  1.02  1838.4Β±25.40Β΅s? 
?/sec1.00  1801.0Β±23.81Β΅s? ?/sec
   physical_plan_clickbench_q27  1.03  2.1Β±0.02ms? 
?/sec1.00  1996.3Β±19.56Β΅s? ?/sec
   physical_plan_clickbench_q28  1.05  2.9Β±0.02ms? 
?/sec1.00  2.8Β±0.02ms? ?/sec
   physical_plan_clickbench_q29  1.05  3.6Β±0.04ms? 
?/sec1.00  3.5Β±0.02ms? ?/sec
   physical_plan_clickbench_q3   1.03  1943.5Β±29.11Β΅s? 
?/sec1.00  1884.8Β±20.97Β΅s? ?/sec
   physical_plan_clickbench_q30  1.06 15.2Β±0.16ms? 
?/sec1.00 14.3Β±0.12ms? ?/sec
   physical_plan_clickbench_q31  1.04  2.9Β±0.03ms? 
?/sec1.00  2.8Β±0.03ms? ?/sec
   physical_plan_clickbench_q32  1.04  2.9Β±0.02ms? 
?/sec1.00  2.8Β±0.03ms? ?/sec
   physical_plan_clickbench_q33  1.03  2.5Β±0.02ms? 
?/sec1.00  2.4Β±

Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-30 Thread via GitHub


alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2922755179

   πŸ€–: Benchmark completed
   
   Details
   
   
   
   ```
   Comparing HEAD and reduce-expr-struct-size
   
   Benchmark clickbench_extended.json
   
   ┏━━┳┳━┳━━━┓
   ┃ Query┃   HEAD ┃ reduce-expr-struct-size ┃Change ┃
   ┑━━╇╇━╇━━━┩
   β”‚ QQuery 0 β”‚  1942.63ms β”‚   1966.87ms β”‚ no change β”‚
   β”‚ QQuery 1 β”‚   717.15ms β”‚700.88ms β”‚ no change β”‚
   β”‚ QQuery 2 β”‚  1467.62ms β”‚   1429.88ms β”‚ no change β”‚
   β”‚ QQuery 3 β”‚   695.60ms β”‚685.65ms β”‚ no change β”‚
   β”‚ QQuery 4 β”‚  1462.18ms β”‚   1474.42ms β”‚ no change β”‚
   β”‚ QQuery 5 β”‚ 15721.51ms β”‚  15581.23ms β”‚ no change β”‚
   β”‚ QQuery 6 β”‚  2062.32ms β”‚   2036.21ms β”‚ no change β”‚
   β”‚ QQuery 7 β”‚  2270.27ms β”‚   2083.30ms β”‚ +1.09x faster β”‚
   β”‚ QQuery 8 β”‚   836.18ms β”‚845.40ms β”‚ no change β”‚
   β””β”€β”€β”΄β”΄β”€β”΄β”€β”€β”€β”˜
   ┏┳┓
   ┃ Benchmark Summary  ┃┃
   ┑╇┩
   β”‚ Total Time (HEAD)  β”‚ 27175.46ms β”‚
   β”‚ Total Time (reduce-expr-struct-size)   β”‚ 26803.84ms β”‚
   β”‚ Average Time (HEAD)β”‚  3019.50ms β”‚
   β”‚ Average Time (reduce-expr-struct-size) β”‚  2978.20ms β”‚
   β”‚ Queries Faster β”‚  1 β”‚
   β”‚ Queries Slower β”‚  0 β”‚
   β”‚ Queries with No Change β”‚  8 β”‚
   β””β”΄β”˜
   
   Benchmark clickbench_partitioned.json
   
   ┏━━┳┳━┳━━━┓
   ┃ Query┃   HEAD ┃ reduce-expr-struct-size ┃Change ┃
   ┑━━╇╇━╇━━━┩
   β”‚ QQuery 0 β”‚15.63ms β”‚ 15.00ms β”‚ no change β”‚
   β”‚ QQuery 1 β”‚33.27ms β”‚ 32.47ms β”‚ no change β”‚
   β”‚ QQuery 2 β”‚82.50ms β”‚ 81.25ms β”‚ no change β”‚
   β”‚ QQuery 3 β”‚94.25ms β”‚ 98.21ms β”‚ no change β”‚
   β”‚ QQuery 4 β”‚   587.93ms β”‚571.78ms β”‚ no change β”‚
   β”‚ QQuery 5 β”‚   833.99ms β”‚847.44ms β”‚ no change β”‚
   β”‚ QQuery 6 β”‚24.18ms β”‚ 23.83ms β”‚ no change β”‚
   β”‚ QQuery 7 β”‚39.01ms β”‚ 38.25ms β”‚ no change β”‚
   β”‚ QQuery 8 β”‚   905.00ms β”‚916.35ms β”‚ no change β”‚
   β”‚ QQuery 9 β”‚  1212.46ms β”‚   1222.81ms β”‚ no change β”‚
   β”‚ QQuery 10β”‚   260.41ms β”‚262.26ms β”‚ no change β”‚
   β”‚ QQuery 11β”‚   292.92ms β”‚298.17ms β”‚ no change β”‚
   β”‚ QQuery 12β”‚   895.98ms β”‚925.99ms β”‚ no change β”‚
   β”‚ QQuery 13β”‚  1364.30ms β”‚   1194.60ms β”‚ +1.14x faster β”‚
   β”‚ QQuery 14β”‚   842.70ms β”‚849.81ms β”‚ no change β”‚
   β”‚ QQuery 15β”‚   823.74ms β”‚816.12ms β”‚ no change β”‚
   β”‚ QQuery 16β”‚  1750.44ms β”‚   1737.46ms β”‚ no change β”‚
   β”‚ QQuery 17β”‚  1612.09ms β”‚   1618.81ms β”‚ no change β”‚
   β”‚ QQuery 18β”‚  3077.16ms β”‚   3088.23ms β”‚ no change β”‚
   β”‚ QQuery 19β”‚82.83ms β”‚ 85.59ms β”‚ no change β”‚
   β”‚ QQuery 20β”‚  1191.35ms β”‚   1162.37ms β”‚ no change β”‚
   β”‚ QQuery 21β”‚  1333.85ms β”‚   1333.28ms β”‚ no change β”‚
   β”‚ QQuery 22β”‚  2210.24ms β”‚   2214.84ms β”‚ no change β”‚
   β”‚ QQuery 23β”‚  8055.93ms β”‚   8149.44ms β”‚ no change β”‚
   β”‚ QQuery 24β”‚   475.26ms β”‚473.62ms β”‚ no change β”‚
   β”‚ QQuery 25β”‚   387.85ms β”‚392.97ms β”‚ no change β”‚
   β”‚ QQuery 26β”‚   524.75ms β”‚523.65ms β”‚ no change β”‚
   β”‚ QQuery 27β”‚  1607.14ms β”‚   1627.18ms β”‚ no change β”‚
   β”‚ QQuery 28β”‚ 12685.19ms β”‚  13601.69ms β”‚  1.07x slower β”‚
   β”‚ QQuery 29β”‚   535.98ms β”‚525.12ms β”‚ no change β”‚
   β”‚ QQuery 30β”‚   789.38ms β”‚807.62ms β”‚ no change β”‚
   β”‚ QQuery 31β”‚   851.48ms β”‚851.02ms β”‚ no change β”‚
   β”‚ QQuery 32β”‚  2663.64ms β”‚   2669.92ms β”‚ no change β”‚
   β”‚ QQuery 33β”‚  3352.71ms β”‚   3397.47ms β”‚ no change β”‚
   β”‚ QQuery 34β”‚  3365.36ms β”‚   3385.66ms β”‚ no change β”‚
   β”‚ QQuery 35β”‚  1276.16ms β”‚   1314.40ms β”‚ no change β”‚
   β”‚ QQuery 36β”‚   123.82ms β”‚126.31ms β”‚ 

Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-30 Thread via GitHub


alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2922659200

   πŸ€– `./gh_compare_branch.sh` [Benchmark 
Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
 Running
   Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr  2 16:34:16 UTC 
2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing reduce-expr-struct-size (3ad082c3306884d13c795e7b37c70bf79963d355) 
to 2c2f225926958b6abf06b01fcfb594017531043c 
[diff](https://github.com/apache/datafusion/compare/2c2f225926958b6abf06b01fcfb594017531043c..3ad082c3306884d13c795e7b37c70bf79963d355)
   Benchmarks: tpch_mem clickbench_partitioned clickbench_extended
   Results will be posted here when complete
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-30 Thread via GitHub


alamb commented on code in PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#discussion_r2116011961


##
datafusion/expr/src/expr.rs:
##
@@ -330,7 +331,7 @@ pub enum Expr {
 /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
 AggregateFunction(AggregateFunction),
 /// Call a window function with a set of arguments.
-WindowFunction(WindowFunction),
+WindowFunction(Box),

Review Comment:
   Maybe as a follow on experiment we can try to box Alias and 
OuterReferenceColumn too -- but I recommend not in this PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-30 Thread via GitHub


alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2922513729

   I kicked off some benchmarks for this PR -- thank you @hendrikmakait πŸ™ 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-30 Thread via GitHub


alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2922512365

   πŸ€– `./gh_compare_branch_bench.sh` [Benchmark 
Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh)
 Running
   Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr  2 16:34:16 UTC 
2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing reduce-expr-struct-size (3ad082c3306884d13c795e7b37c70bf79963d355) 
to 2c2f225926958b6abf06b01fcfb594017531043c 
[diff](https://github.com/apache/datafusion/compare/2c2f225926958b6abf06b01fcfb594017531043c..3ad082c3306884d13c795e7b37c70bf79963d355)
   BENCH_NAME=sql_planner
   BENCH_COMMAND=cargo bench --bench sql_planner
   BENCH_FILTER=
   BENCH_BRANCH_NAME=reduce-expr-struct-size
   Results will be posted here when complete
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-29 Thread via GitHub


hendrikmakait commented on code in PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#discussion_r2114253916


##
datafusion/expr/src/expr.rs:
##
@@ -330,7 +331,7 @@ pub enum Expr {
 /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
 AggregateFunction(AggregateFunction),
 /// Call a window function with a set of arguments.
-WindowFunction(WindowFunction),
+WindowFunction(Box),

Review Comment:
   It looks like `Alias` might be the culprit here (with some padding): 
   
   `size_of::()` is 136, followed by `size_of::<(DataType, Column)>()` 
(alias `OuterReferenceColumn`) with 128.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-29 Thread via GitHub


hendrikmakait commented on code in PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#discussion_r2114223556


##
datafusion/expr/src/expr.rs:
##
@@ -330,7 +331,7 @@ pub enum Expr {
 /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
 AggregateFunction(AggregateFunction),
 /// Call a window function with a set of arguments.
-WindowFunction(WindowFunction),
+WindowFunction(Box),

Review Comment:
   Good question! I'm fairly new to Rust; what's a good way to analyze the enum 
variant sizes?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-29 Thread via GitHub


Dandandan commented on code in PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#discussion_r2113426682


##
datafusion/expr/src/expr.rs:
##
@@ -330,7 +331,7 @@ pub enum Expr {
 /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
 AggregateFunction(AggregateFunction),
 /// Call a window function with a set of arguments.
-WindowFunction(WindowFunction),
+WindowFunction(Box),

Review Comment:
   I wonder, what is now 144 bytes?``



##
datafusion/expr/src/expr.rs:
##
@@ -330,7 +331,7 @@ pub enum Expr {
 /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
 AggregateFunction(AggregateFunction),
 /// Call a window function with a set of arguments.
-WindowFunction(WindowFunction),
+WindowFunction(Box),

Review Comment:
   I wonder, what is now 144 bytes?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-27 Thread via GitHub


alamb closed pull request #14366: Reduce size of `Expr` struct
URL: https://github.com/apache/datafusion/pull/14366


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-27 Thread via GitHub


alamb commented on PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#issuecomment-2913489237

   Filed https://github.com/apache/datafusion/issues/16199 to track


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-26 Thread via GitHub


alamb commented on PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#issuecomment-2909160571

   > If these benchmarks are reliable, then this would be a pretty neat 
improvement.
   
   Oh, man I missed the benchmark results (I forgot to check them). 
   
   I'll ressurect this PR over the next week or so and rerun benchmarks
   
   Thank you very much for the shout @crepererum 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-26 Thread via GitHub


crepererum commented on PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#issuecomment-2909035054

   If these benchmarks are reliable, then this would be a pretty neat 
improvement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-16 Thread via GitHub


alamb commented on PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#issuecomment-2887593105

   πŸ€–: Benchmark completed
   
   Details
   
   
   
   ```
   group alamb_make_expr_smaller
main
   - ---

   logical_aggregate_with_join   1.00  1486.0Β±16.67Β΅s? 
?/sec1.02  1519.2Β±14.28Β΅s? ?/sec
   logical_select_all_from_1000  1.00  4.5Β±0.02ms? 
?/sec1.01  4.6Β±0.03ms? ?/sec
   logical_select_one_from_700   1.00  1182.9Β±12.15Β΅s? 
?/sec1.03  1213.7Β±19.22Β΅s? ?/sec
   logical_trivial_join_high_numbered_columns1.00  1148.5Β±16.11Β΅s? 
?/sec1.02  1171.8Β±12.08Β΅s? ?/sec
   logical_trivial_join_low_numbered_columns 1.00  1140.3Β±19.70Β΅s? 
?/sec1.02  1164.6Β±17.11Β΅s? ?/sec
   physical_intersection 1.00  2.4Β±0.02ms? 
?/sec1.02  2.5Β±0.02ms? ?/sec
   physical_join_consider_sort   1.00  3.2Β±0.02ms? 
?/sec1.02  3.3Β±0.02ms? ?/sec
   physical_join_distinct1.00  1148.5Β±15.20Β΅s? 
?/sec1.00  1150.0Β±10.96Β΅s? ?/sec
   physical_many_self_joins  1.01 17.5Β±0.12ms? 
?/sec1.00 17.3Β±0.18ms? ?/sec
   physical_plan_clickbench_all  1.00285.4Β±4.98ms? 
?/sec1.14324.2Β±6.52ms? ?/sec
   physical_plan_clickbench_q1   1.00  4.6Β±0.12ms? 
?/sec1.10  5.0Β±0.13ms? ?/sec
   physical_plan_clickbench_q10  1.00  5.7Β±0.14ms? 
?/sec1.16  6.5Β±0.42ms? ?/sec
   physical_plan_clickbench_q11  1.00  5.8Β±0.22ms? 
?/sec1.13  6.6Β±0.21ms? ?/sec
   physical_plan_clickbench_q12  1.00  6.0Β±0.17ms? 
?/sec1.12  6.7Β±0.24ms? ?/sec
   physical_plan_clickbench_q13  1.00  5.5Β±0.11ms? 
?/sec1.12  6.1Β±0.17ms? ?/sec
   physical_plan_clickbench_q14  1.00  5.8Β±0.17ms? 
?/sec1.12  6.5Β±0.15ms? ?/sec
   physical_plan_clickbench_q15  1.00  5.6Β±0.14ms? 
?/sec1.13  6.3Β±0.18ms? ?/sec
   physical_plan_clickbench_q16  1.00  5.1Β±0.18ms? 
?/sec1.09  5.5Β±0.19ms? ?/sec
   physical_plan_clickbench_q17  1.00  5.0Β±0.13ms? 
?/sec1.14  5.8Β±0.15ms? ?/sec
   physical_plan_clickbench_q18  1.00  4.9Β±0.13ms? 
?/sec1.12  5.4Β±0.15ms? ?/sec
   physical_plan_clickbench_q19  1.00  5.7Β±0.16ms? 
?/sec1.12  6.4Β±0.14ms? ?/sec
   physical_plan_clickbench_q2   1.00  4.9Β±0.13ms? 
?/sec1.10  5.4Β±0.12ms? ?/sec
   physical_plan_clickbench_q20  1.00  4.5Β±0.10ms? 
?/sec1.11  5.0Β±0.14ms? ?/sec
   physical_plan_clickbench_q21  1.00  4.9Β±0.15ms? 
?/sec1.13  5.5Β±0.20ms? ?/sec
   physical_plan_clickbench_q22  1.00  5.7Β±0.17ms? 
?/sec1.15  6.6Β±0.17ms? ?/sec
   physical_plan_clickbench_q23  1.00  6.2Β±0.16ms? 
?/sec1.15  7.1Β±0.17ms? ?/sec
   physical_plan_clickbench_q24  1.00  6.9Β±0.18ms? 
?/sec1.12  7.8Β±0.18ms? ?/sec
   physical_plan_clickbench_q25  1.00  5.1Β±0.13ms? 
?/sec1.14  5.8Β±0.12ms? ?/sec
   physical_plan_clickbench_q26  1.00  4.8Β±0.13ms? 
?/sec1.12  5.4Β±0.13ms? ?/sec
   physical_plan_clickbench_q27  1.00  5.2Β±0.16ms? 
?/sec1.11  5.8Β±0.19ms? ?/sec
   physical_plan_clickbench_q28  1.00  5.9Β±0.18ms? 
?/sec1.14  6.7Β±0.20ms? ?/sec
   physical_plan_clickbench_q29  1.00  6.9Β±0.15ms? 
?/sec1.17  8.1Β±0.27ms? ?/sec
   physical_plan_clickbench_q3   1.00  4.9Β±0.11ms? 
?/sec1.10  5.3Β±0.12ms? ?/sec
   physical_plan_clickbench_q30  1.00 19.5Β±0.32ms? 
?/sec1.15 22.3Β±0.25ms? ?/sec
   physical_plan_clickbench_q31  1.00  6.1Β±0.17ms? 
?/sec1.15  7.0Β±0.20ms? ?/sec
   physical_plan_clickbench_q32  1.00  6.0Β±0.14ms? 
?/sec1.15  7.0Β±0.18ms? ?/sec
   physical_plan_clickbench_q33  1.00  5.6Β±0.17ms? 
?/sec1.13  6.3Β±0.17ms? ?/sec
   physical_plan

Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-16 Thread via GitHub


alamb commented on PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#issuecomment-2887495892

   πŸ€– `./gh_compare_branch_bench.sh` [Benchmark 
Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh)
 Running
   Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr  2 16:34:16 UTC 
2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing alamb/make_expr_smaller (279f074291ddb0ea01344a012c710aecef044e18) 
to cb5d42e13531f250f702c02fb4d8f5d710e863f5 
[diff](https://github.com/apache/datafusion/compare/cb5d42e13531f250f702c02fb4d8f5d710e863f5..279f074291ddb0ea01344a012c710aecef044e18)
   BENCH_NAME=sql_planner
   BENCH_COMMAND=cargo bench --bench sql_planner
   BENCH_FILTER=
   BENCH_BRANCH_NAME=alamb_make_expr_smaller
   Results will be posted here when complete
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-04-30 Thread via GitHub


github-actions[bot] closed pull request #14366: Reduce size of `Expr` struct
URL: https://github.com/apache/datafusion/pull/14366


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-04-21 Thread via GitHub


github-actions[bot] commented on PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#issuecomment-2819872416

   Thank you for your contribution. Unfortunately, this pull request is stale 
because it has been open 60 days with no activity. Please remove the stale 
label or comment or this will be closed in 7 days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-02-19 Thread via GitHub


alamb commented on PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#issuecomment-2668783186

   I'll try and find time to run some sql planning benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-02-17 Thread via GitHub


crepererum commented on PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#issuecomment-2662587013

   TBH I would be surprised if the build performance is affected by a few 
`memcpy` calls. I think the question here is rather if our runtime performance 
changes, potentially due to less stress on the memory allocator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-02-11 Thread via GitHub


alamb commented on PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#issuecomment-2652408678

   I ran some build benchmarks on a GPC machine and I conclude this change does 
not improve the build timings
   
   ```
   Building main
   + rm -rf target
   + cargo build --release --timings --lib --quiet
   
   real4m28.125s
   user36m43.568s
   sys 1m19.993s
   + rm -rf target
   + cargo build --release --timings --lib --quiet
   
   real4m32.743s
   user36m47.328s
   sys 1m19.786s
   + rm -rf target
   + cargo build --release --timings --lib --quiet
   
   real4m33.336s
   user36m50.926s
   sys 1m19.246s
   + git reset --hard
   HEAD is now at cb5d42e135 Disable extended tests (#14604)
   + git checkout alamb/make_expr_smaller
   Switched to branch 'alamb/make_expr_smaller'
   Your branch is up to date with 'alamb/alamb/make_expr_smaller'.
   + echo 'Building make_expr_smaller'
   Building make_expr_smaller
   + rm -rf target
   + cargo build --release --timings --lib --quiet
   
   real4m30.234s
   user37m15.027s
   sys 1m19.688s
   + rm -rf target
   + cargo build --release --timings --lib --quiet
   
   real4m30.580s
   user37m6.178s
   sys 1m19.885s
   + rm -rf target
   + cargo build --release --timings --lib --quiet
   
   real4m32.293s
   user36m54.063s
   sys 1m19.557s
   alamb@aal-dev:~/arrow-datafusion$
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-01-30 Thread via GitHub


felipecrv commented on code in PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#discussion_r1935630021


##
datafusion/expr/src/expr.rs:
##
@@ -297,7 +298,7 @@ pub enum Expr {
 /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
 AggregateFunction(AggregateFunction),
 /// Represents the call of a window function with arguments.
-WindowFunction(WindowFunction),
+WindowFunction(Box), // Boxed as it is large (272 bytes)

Review Comment:
   ```suggestion
   WindowFunction(Box), // Boxed as it is large (> 272 
bytes)
   ```



##
datafusion/expr/src/expr.rs:
##
@@ -297,7 +298,7 @@ pub enum Expr {
 /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
 AggregateFunction(AggregateFunction),
 /// Represents the call of a window function with arguments.
-WindowFunction(WindowFunction),
+WindowFunction(Box), // Boxed as it is large (272 bytes)

Review Comment:
   ```suggestion
   WindowFunction(Box), // Boxed as it is large (>= 272 
bytes)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-01-30 Thread via GitHub


crepererum commented on code in PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#discussion_r1935282817


##
datafusion/expr/src/expr.rs:
##
@@ -297,7 +298,7 @@ pub enum Expr {
 /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
 AggregateFunction(AggregateFunction),
 /// Represents the call of a window function with arguments.
-WindowFunction(WindowFunction),
+WindowFunction(Box), // Boxed as it is large (272 bytes)

Review Comment:
   the concrete byte count is likely out-of-date soon, so I would just exclude 
it from the comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-01-29 Thread via GitHub


alamb commented on code in PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#discussion_r1934699879


##
datafusion/expr/src/expr.rs:
##
@@ -297,7 +298,7 @@ pub enum Expr {
 /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
 AggregateFunction(AggregateFunction),
 /// Represents the call of a window function with arguments.
-WindowFunction(WindowFunction),
+WindowFunction(Box), // Boxed as it is large (272 bytes)

Review Comment:
   This is the API change in this PR: it moves a "large" structure (272 bytes) 
on to the heap
   
   the rest of the changes follow from this one change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-01-29 Thread via GitHub


findepi commented on code in PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#discussion_r1934669162


##
datafusion/expr/src/expr.rs:
##
@@ -3067,4 +3078,19 @@ mod test {
 rename: opt_rename,
 }
 }
+
+#[test]
+fn test_size_of_expr() {
+// because Expr is such a widely used struct in DataFusion
+// it is important to keep its size as small as possible
+//
+// If this test fails when you change `Expr`, please try
+// `Box`ing the fields to make `Expr` smaller
+// See https://github.com/apache/datafusion/issues/14256 for details
+assert_eq!(size_of::(), 112);
+assert_eq!(size_of::(), 64);
+assert_eq!(size_of::(), 24); // 3 ptrs

Review Comment:
   does this test pass in debug and release modes?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-01-29 Thread via GitHub


findepi commented on code in PR #14366:
URL: https://github.com/apache/datafusion/pull/14366#discussion_r1934670024


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -2420,19 +2420,24 @@ impl Window {
 .iter()
 .enumerate()
 .filter_map(|(idx, expr)| {
-if let Expr::WindowFunction(WindowFunction {
+let Expr::WindowFunction(window_func) = expr else {
+return None;
+};
+let WindowFunction {
 fun: WindowFunctionDefinition::WindowUDF(udwf),
 partition_by,
 ..
-}) = expr
-{
-// When there is no PARTITION BY, row number will be unique
-// across the entire table.
-if udwf.name() == "row_number" && partition_by.is_empty() {
-return Some(idx + input_len);
-}
+} = window_func.as_ref()
+else {
+return None;
+};
+// When there is no PARTITION BY, row number will be unique
+// across the entire table.
+if udwf.name() == "row_number" && partition_by.is_empty() {
+return Some(idx + input_len);
+} else {
+None

Review Comment:
   inconsistent return vs value



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]