Re: [I] GlobalLimitExec execution offset pagination query results in internal error [datafusion]
akurmustafa commented on issue #15665: URL: https://github.com/apache/datafusion/issues/15665#issuecomment-2803679589

Thanks @lalaorya. It may be worth checking:
- whether `output_partitioning` is implemented for your source operator;
- whether the `EnforceDistribution` rule runs after you construct your query (initial plan).

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
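The two checks suggested above can be pictured with a toy optimizer rule. This is a hypothetical, heavily simplified Python model of the idea behind `EnforceDistribution`, not DataFusion's implementation: `Plan`, `enforce_single_partition`, `check`, and `build` are invented names, and each leaf's `leaf_partitions` plays the role of what a source reports through `output_partitioning`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    """Hypothetical toy plan node; not DataFusion's ExecutionPlan trait."""
    name: str
    children: List["Plan"] = field(default_factory=list)
    leaf_partitions: int = 1              # what a leaf reports (cf. output_partitioning)
    needs_single_partition: bool = False  # e.g. GlobalLimitExec

    def partitions(self) -> int:
        if not self.children:
            return self.leaf_partitions
        if self.name == "UnionExec":
            # A union exposes all of its inputs' partitions side by side.
            return sum(child.partitions() for child in self.children)
        if self.name == "CoalescePartitionsExec":
            return 1
        return self.children[0].partitions()


def enforce_single_partition(plan: Plan) -> Plan:
    """Toy rule: wrap a child in CoalescePartitionsExec when its parent needs one partition."""
    children = [enforce_single_partition(child) for child in plan.children]
    if plan.needs_single_partition:
        children = [
            Plan("CoalescePartitionsExec", [child]) if child.partitions() > 1 else child
            for child in children
        ]
    return Plan(plan.name, children, plan.leaf_partitions, plan.needs_single_partition)


def check(plan: Plan) -> None:
    """Toy version of the check behind the error message quoted in this thread."""
    for child in plan.children:
        check(child)
    if plan.needs_single_partition and any(c.partitions() != 1 for c in plan.children):
        raise RuntimeError(f"{plan.name} requires a single input partition")


def build() -> Plan:
    """Hypothetical plan shape: a global limit on top of a two-input union."""
    union = Plan("UnionExec", [Plan("ParquetExec"), Plan("MemoryExec")])
    return Plan("GlobalLimitExec", [union], needs_single_partition=True)


try:
    check(build())  # rule not applied: the limit sits on a 2-partition union
except RuntimeError as err:
    print(err)  # GlobalLimitExec requires a single input partition

fixed = enforce_single_partition(build())
check(fixed)  # passes
print(fixed.children[0].name)  # CoalescePartitionsExec
```

Because a rule like this only sees the partition count each node reports, a custom source whose reported partitioning does not match what it actually produces could lead it to skip the coalescing step.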
lalaorya commented on issue #15665: URL: https://github.com/apache/datafusion/issues/15665#issuecomment-2803511123

Thank you @akurmustafa and @qstommyshu for your responses! I tried setting `datafusion.execution.target_partitions` to 1 as suggested, but unfortunately this didn't solve the issue; the error still persists.

Regarding reproduction steps, I'm sorry that I can't provide complete reproduction code. Our system is built on DataFusion with many custom implementations. Interestingly, I also cannot reproduce this issue in my local macOS environment; it only occurs in our production Linux environment.

Since I'm unable to provide reproduction steps or additional information that would help investigate this issue further, I'll close this issue temporarily. However, I plan to investigate further by reviewing the DataFusion source code to understand how GlobalLimitExec handles partitioning. If I discover the root cause or find a solution, I'll share my findings in this issue for future reference.
lalaorya closed issue #15665: GlobalLimitExec execution offset pagination query results in internal error URL: https://github.com/apache/datafusion/issues/15665
qstommyshu commented on issue #15665: URL: https://github.com/apache/datafusion/issues/15665#issuecomment-2801870297

Hi @lalaorya, I can't reproduce the error on DataFusion CLI v46.0.1: https://github.com/user-attachments/assets/a4f6af23-8e8d-4275-9da9-df53c8aeaa9f
akurmustafa commented on issue #15665: URL: https://github.com/apache/datafusion/issues/15665#issuecomment-2800060351

Hi @lalaorya, I didn't reproduce the plans you generated locally, so my thoughts might be wrong or misleading. However, here are my thoughts regarding your problem:

> What does "GlobalLimitExec requires a single input partition" mean in this context?

`GlobalLimitExec` requires a single partition at its input to operate successfully. However, in your failing plan its input has 2 partitions (one comes from the source `ParquetExec` and the other from the source `MemoryExec`), and after `UnionExec` this results in a total of 2 output partitions. Hence, the plan is wrong and cannot be executed in DataFusion. I would expect DataFusion not to produce this plan. Also, even if your first query doesn't fail, I think its plan is still wrong: it would produce more than 40 rows once executed (at least, this is what I expect).

> Is there a specific pattern I should follow when using LIMIT with an offset in DataFusion?

No, your query is correct. In this case, the generated plan is wrong, which probably stems from an optimization bug.

> Are there any configuration settings or query modifications that could help resolve this issue?

I think setting `datafusion.execution.target_partitions` to `1` might help. This way your plan won't have multiple partitions and will be correct.

If you can give a full reproducer (such as the creation of the tbl data source), this would help to reproduce the bug.
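To make the partition argument above concrete, here is a small pure-Python model of it. It is a toy sketch, not DataFusion code: `union`, `coalesce`, `limit_per_partition`, and `global_limit` are hypothetical stand-ins for `UnionExec`, `CoalescePartitionsExec`, a limit applied to each partition separately, and `GlobalLimitExec`, and the data, offset (20), and fetch (40) are made up for illustration.

```python
from itertools import chain, islice
from typing import List

# Toy model: a partition is a list of rows; an operator's output is a list of partitions.
Partitions = List[List[int]]


def union(*inputs: Partitions) -> Partitions:
    """Hypothetical stand-in for UnionExec: concatenates the inputs' partitions (1 + 1 -> 2)."""
    return [part for inp in inputs for part in inp]


def coalesce(parts: Partitions) -> Partitions:
    """Hypothetical stand-in for CoalescePartitionsExec: merges everything into one partition."""
    return [list(chain.from_iterable(parts))]


def limit_per_partition(parts: Partitions, skip: int, fetch: int) -> Partitions:
    """Applies OFFSET/LIMIT to every partition independently (the incorrect behaviour)."""
    return [part[skip:skip + fetch] for part in parts]


def global_limit(parts: Partitions, skip: int, fetch: int) -> List[int]:
    """Hypothetical stand-in for GlobalLimitExec: only accepts a single input partition."""
    if len(parts) != 1:
        raise RuntimeError("GlobalLimitExec requires a single input partition")
    return list(islice(parts[0], skip, skip + fetch))


parquet_side = [list(range(0, 100))]   # made-up data: 1 partition
memory_side = [list(range(100, 200))]  # made-up data: 1 partition
unioned = union(parquet_side, memory_side)
print(len(unioned))  # 2

try:
    global_limit(unioned, skip=20, fetch=40)
except RuntimeError as err:
    print(err)  # GlobalLimitExec requires a single input partition

per_partition = limit_per_partition(unioned, skip=20, fetch=40)
print(sum(len(p) for p in per_partition))  # 80 rows instead of 40

rows = global_limit(coalesce(unioned), skip=20, fetch=40)
print(len(rows), rows[0], rows[-1])  # 40 20 59
```

The per-partition variant is included only to show how a plan that skips the single-partition step could return more rows than the LIMIT asks for; merging first makes the offset and fetch apply to the combined stream.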