Re: [I] GlobalLimitExec execution offset pagination query results in internal error [datafusion]

2025-04-14 Thread via GitHub


akurmustafa commented on issue #15665:
URL: https://github.com/apache/datafusion/issues/15665#issuecomment-2803679589

   Thanks @lalaorya. Maybe it is worth
   - checking whether `output_partitioning` is implemented for your source 
operator.
   - checking whether the `EnforceDistribution` rule runs after you construct 
your query (initial plan).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] GlobalLimitExec execution offset pagination query results in internal error [datafusion]

2025-04-14 Thread via GitHub


lalaorya commented on issue #15665:
URL: https://github.com/apache/datafusion/issues/15665#issuecomment-2803511123

   Thank you @akurmustafa and @qstommyshu for your responses!
   I tried setting `datafusion.execution.target_partitions` to 1 as suggested, 
but unfortunately this didn't solve the issue: the error still persists.
   
   Regarding reproduction steps, I'm sorry that I can't provide complete 
reproduction code. Our system is built on DataFusion with many custom 
implementations. Interestingly, I also cannot reproduce this issue in my local 
macOS environment; it only occurs in our production Linux environment.
   
   Since I'm unable to provide reproduction steps or additional information 
that would help investigate this issue further, I'll close this issue 
temporarily. However, I plan to investigate further by reviewing the DataFusion 
source code to understand how GlobalLimitExec handles partitioning. If I 
discover the root cause or find a solution, I'll share my findings in this 
issue for future reference.





Re: [I] GlobalLimitExec execution offset pagination query results in internal error [datafusion]

2025-04-14 Thread via GitHub


lalaorya closed issue #15665: GlobalLimitExec execution offset pagination query 
results in internal error
URL: https://github.com/apache/datafusion/issues/15665





Re: [I] GlobalLimitExec execution offset pagination query results in internal error [datafusion]

2025-04-14 Thread via GitHub


qstommyshu commented on issue #15665:
URL: https://github.com/apache/datafusion/issues/15665#issuecomment-2801870297

   Hi @lalaorya ,
   
   I can't reproduce the error on DataFusion CLI v46.0.1:
   
   https://github.com/user-attachments/assets/a4f6af23-8e8d-4275-9da9-df53c8aeaa9f





Re: [I] GlobalLimitExec execution offset pagination query results in internal error [datafusion]

2025-04-13 Thread via GitHub


akurmustafa commented on issue #15665:
URL: https://github.com/apache/datafusion/issues/15665#issuecomment-2800060351

   Hi @lalaorya, I haven't reproduced the plans you generated locally, so my 
thoughts might be wrong or misleading. However, here are my thoughts regarding 
your problem:
   > What "GlobalLimitExec requires a single input partition" means in this 
context?
   
   `GlobalLimitExec` requires a single partition at its input to operate 
successfully. However, in your failing plan its input has 2 partitions (one 
comes from the source `ParquetExec` and the other from the source 
`MemoryExec`). After `UnionExec` this results in a total of 2 output 
partitions. Hence, the plan is wrong and cannot be executed in DataFusion. I 
would expect DataFusion not to produce this plan. Also, even if your first 
query doesn't fail, I think it is still wrong: that plan would produce more 
than 40 rows once it is executed (at least this is what I expect).
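
The partition-count reasoning above can be sketched with a toy model in plain Rust. This is only an illustration under stated assumptions: the `Plan` enum and its `output_partitions` and `validate` methods are hypothetical names made up for this sketch and are not DataFusion's API. It models a union whose output partitions are the sum of its children's, a coalescing step that merges everything into one partition, and a global limit that rejects any input with more than one partition.

```rust
/// Toy plan node. Hypothetical types for illustration only;
/// these are NOT DataFusion's `ExecutionPlan` implementations.
#[derive(Debug)]
enum Plan {
    /// A leaf source exposing a fixed number of output partitions.
    Source { partitions: usize },
    /// Exposes every partition of every child, so counts add up.
    Union(Vec<Plan>),
    /// Merges all input partitions into a single partition.
    Coalesce(Box<Plan>),
    /// A limit over the whole result; needs exactly one input partition.
    GlobalLimit(Box<Plan>),
}

impl Plan {
    /// Number of partitions this node produces.
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Source { partitions } => *partitions,
            Plan::Union(children) => children.iter().map(Plan::output_partitions).sum(),
            Plan::Coalesce(_) => 1,
            Plan::GlobalLimit(_) => 1,
        }
    }

    /// Checks the single-partition requirement of every global limit in the tree.
    fn validate(&self) -> Result<(), String> {
        match self {
            Plan::Source { .. } => Ok(()),
            Plan::Union(children) => children.iter().try_for_each(Plan::validate),
            Plan::Coalesce(input) => input.validate(),
            Plan::GlobalLimit(input) => {
                input.validate()?;
                let n = input.output_partitions();
                if n == 1 {
                    Ok(())
                } else {
                    Err(format!(
                        "global limit requires a single input partition, got {}",
                        n
                    ))
                }
            }
        }
    }
}
```

In this model, a global limit placed directly on top of a union of two single-partition sources fails validation, while the same plan with a coalescing step between the union and the limit passes. Inserting that merging step is the kind of repair a distribution-enforcing pass would be expected to make.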
   
   > Is there a specific pattern I should follow when using LIMIT with an 
offset in DataFusion?
   
   No, your query is correct. In this case, the generated plan is wrong, and 
that probably stems from an optimization bug.
   
   > Are there any configuration settings or query modifications that could 
help resolve this issue?
   
   I think setting `datafusion.execution.target_partitions` to `1` might help. 
This way your plan won't have multiple partitions and will be correct.
   
   If you can provide a full reproducer (including the creation of the tbl data 
source), that would help to reproduce the bug.
   

