Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]
ahmed-mez closed pull request #16506: fix: extend recursive protection to prevent stack overflows in additional functions URL: https://github.com/apache/datafusion/pull/16506 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]
ahmed-mez commented on PR #16506: URL: https://github.com/apache/datafusion/pull/16506#issuecomment-3073443155 Thank you @alamb ! Opened https://github.com/apache/datafusion/issues/16788 & https://github.com/apache/datafusion/pull/16787 and will close this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]
alamb commented on PR #16506: URL: https://github.com/apache/datafusion/pull/16506#issuecomment-3025182224 Thank you for working on this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]
alamb commented on PR #16506: URL: https://github.com/apache/datafusion/pull/16506#issuecomment-3025182092 > Perhaps I can close this PR and open an issue so the community can participate in the effort? I think that sounds like a great idea to me > Do we want to commit the reproducer test case only and mark it ignored (`#[ignore]`) ? That would also be a good idea in my opinion, with a link to the ticket / issue that was filed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]
ahmed-mez commented on PR #16506: URL: https://github.com/apache/datafusion/pull/16506#issuecomment-3019500939 Apologies for the delay. I added a commit with a [reproducer test case](https://github.com/apache/datafusion/pull/16506/commits/1a9584b64349acfea70b6e82dbaabbdeaa625758) as requested. It passes when the `recursive_protection` feature is enabled. The reproducer committed is inspired by an original reproducer I mentioned in the PR description but it's a smaller and simpler plan, without any custom Substrait extensions. Correct me if I'm wrong @alamb: fixing these stack overflows by restructuring the code seems like a massive amount of work as it'd involve rewriting the 30+ recursive functions that the current PR tried to protect with `recursive`. And we can't be 100% sure it'd solve all the stack overflows. Is my understanding of the scope correct? If so, what's the preferred path forward? Perhaps I can close this PR and open an issue so the community can participate in the effort? Do we want to commit the reproducer test case only and mark it ignored (`#[ignore]`) ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]
ahmed-mez commented on PR #16506: URL: https://github.com/apache/datafusion/pull/16506#issuecomment-317936 Thank you @alamb for reviewing and for running the benchmark. I will work on adding a test as a first step, that would also help us iterate and validate any changes later on. In general, performance vs reliability trade-offs depend heavily on the specific context and use cases. In my opinion, giving users the ability to make that decision themselves is a strong and flexible approach. In this case, the reliability concern isn’t minor. Avoiding stack overflows is critical, as they can bring down the entire process. Personally, I would gladly accept even more than a 1–2% slowdown in planning time to gain protection against such failures. I’d really appreciate it if DataFusion provided a "knob" that allowed users to express this kind of preference. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]
alamb commented on PR #16506: URL: https://github.com/apache/datafusion/pull/16506#issuecomment-3001108899 > I’d really appreciate it if DataFusion provided a "knob" that allowed users to express this kind of preference. Yeah this makes sense to me to I was thinking sometimes we don't have to choose between "panic" and slower planning -- by restructuring the code sometimes we can do much better -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]
alamb commented on PR #16506: URL: https://github.com/apache/datafusion/pull/16506#issuecomment-2999628483 🤔 the benchmarks seem to imply this has some non trivial overhead (slows down planning by 1-2%). I wonder if there is any way to restructure the affected codepaths to avoid the recursion (change them into an iterative version, for example, or `Box` more structures to reduce the stack size?) Also, while I appreciate you can't provide the exact reproducer, I think it is important that we have some tests for this change, otherwise it is likely that over time as the code is refactored we'll either lose the annotations or the paths will be different and not annotated -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]
alamb commented on PR #16506: URL: https://github.com/apache/datafusion/pull/16506#issuecomment-2997163119 🤖: Benchmark completed Details ``` group ahmed_recursion-protection main - -- logical_aggregate_with_join 1.01632.1±3.78µs? ?/sec1.00627.7±2.76µs? ?/sec logical_select_all_from_1000 1.02 11.2±0.03ms? ?/sec1.00 11.1±0.03ms? ?/sec logical_select_one_from_700 1.00414.0±1.70µs? ?/sec1.01417.4±7.10µs? ?/sec logical_trivial_join_high_numbered_columns1.00374.8±1.41µs? ?/sec1.00373.5±2.43µs? ?/sec logical_trivial_join_low_numbered_columns 1.01362.1±2.71µs? ?/sec1.00358.7±1.67µs? ?/sec physical_intersection 1.00837.1±3.59µs? ?/sec1.00835.1±6.04µs? ?/sec physical_join_consider_sort 1.01 1390.9±8.24µs? ?/sec1.00 1383.5±3.64µs? ?/sec physical_join_distinct1.00352.1±2.40µs? ?/sec1.00350.7±1.16µs? ?/sec physical_many_self_joins 1.00 10.3±0.05ms? ?/sec1.00 10.3±0.05ms? ?/sec physical_plan_clickbench_all 1.01187.7±0.97ms? ?/sec1.00186.6±1.20ms? ?/sec physical_plan_clickbench_q1 1.02 2.5±0.02ms? ?/sec1.00 2.5±0.02ms? ?/sec physical_plan_clickbench_q10 1.01 3.3±0.02ms? ?/sec1.00 3.3±0.03ms? ?/sec physical_plan_clickbench_q11 1.01 3.5±0.03ms? ?/sec1.00 3.5±0.04ms? ?/sec physical_plan_clickbench_q12 1.01 3.7±0.02ms? ?/sec1.00 3.6±0.04ms? ?/sec physical_plan_clickbench_q13 1.02 3.3±0.04ms? ?/sec1.00 3.3±0.03ms? ?/sec physical_plan_clickbench_q14 1.01 3.5±0.02ms? ?/sec1.00 3.5±0.02ms? ?/sec physical_plan_clickbench_q15 1.01 3.4±0.03ms? ?/sec1.00 3.4±0.03ms? ?/sec physical_plan_clickbench_q16 1.01 3.3±0.02ms? ?/sec1.00 3.2±0.03ms? ?/sec physical_plan_clickbench_q17 1.01 3.4±0.03ms? ?/sec1.00 3.3±0.03ms? ?/sec physical_plan_clickbench_q18 1.01 2.9±0.02ms? ?/sec1.00 2.9±0.02ms? ?/sec physical_plan_clickbench_q19 1.02 3.8±0.05ms? ?/sec1.00 3.8±0.04ms? ?/sec physical_plan_clickbench_q2 1.02 2.9±0.02ms? ?/sec1.00 2.9±0.02ms? ?/sec physical_plan_clickbench_q20 1.01 2.6±0.02ms? ?/sec1.00 2.6±0.02ms? ?/sec physical_plan_clickbench_q21 1.01 2.9±0.02ms? ?/sec1.00 2.9±0.03ms? ?/sec physical_plan_clickbench_q22 1.00 3.5±0.02ms? ?/sec1.00 3.5±0.04ms? ?/sec physical_plan_clickbench_q23 1.01 3.8±0.02ms? ?/sec1.00 3.7±0.02ms? ?/sec physical_plan_clickbench_q24 1.00 4.3±0.03ms? ?/sec1.00 4.3±0.02ms? ?/sec physical_plan_clickbench_q25 1.01 3.0±0.03ms? ?/sec1.00 3.0±0.02ms? ?/sec physical_plan_clickbench_q26 1.00 2.9±0.02ms? ?/sec1.00 2.9±0.06ms? ?/sec physical_plan_clickbench_q27 1.01 3.1±0.04ms? ?/sec1.00 3.0±0.03ms? ?/sec physical_plan_clickbench_q28 1.00 3.8±0.03ms? ?/sec1.00 3.8±0.03ms? ?/sec physical_plan_clickbench_q29 1.00 4.4±0.03ms? ?/sec1.00 4.4±0.03ms? ?/sec physical_plan_clickbench_q3 1.01 2.9±0.02ms? ?/sec1.00 2.8±0.02ms? ?/sec physical_plan_clickbench_q30 1.00 12.6±0.09ms? ?/sec1.00 12.7±0.22ms? ?/sec physical_plan_clickbench_q31 1.01 3.8±0.03ms? ?/sec1.00 3.8±0.03ms? ?/sec physical_plan_clickbench_q32 1.01 3.8±0.04ms? ?/sec1.00 3.8±0.03ms? ?/sec physical_plan_clickbench_q33 1.01 3.3±0.03ms? ?/sec1.00 3.3±0.02ms? ?/sec physical_plan
Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]
alamb commented on PR #16506: URL: https://github.com/apache/datafusion/pull/16506#issuecomment-2996991443 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing ahmed/recursion-protection (fe61624c7e1f1c34848a059962dae02a62e939d5) to aa1e6dac7c5dfdfb2c0f52282638886bff194a5d [diff](https://github.com/apache/datafusion/compare/aa1e6dac7c5dfdfb2c0f52282638886bff194a5d..fe61624c7e1f1c34848a059962dae02a62e939d5) BENCH_NAME=sql_planner BENCH_COMMAND=cargo bench --bench sql_planner BENCH_FILTER= BENCH_BRANCH_NAME=ahmed_recursion-protection Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
