Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]

2025-11-26 Thread via GitHub


alamb commented on issue #16676:
URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3582795991

   I think this issue should now be closed. Please reopen if you don't see 
any improvement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]

2025-11-26 Thread via GitHub


alamb closed issue #16676: 1000x slowdown opening parquet file due to partitions
URL: https://github.com/apache/datafusion/issues/16676





Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]

2025-08-14 Thread via GitHub


alamb commented on issue #16676:
URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3188301329

   We have added a new metadata cache (will ship in 50.0.0):
   - https://github.com/apache/datafusion/issues/17000
   
   I wonder if you could try your reproducer with what is on main and see if 
the situation improves





Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]

2025-07-08 Thread via GitHub


asayers commented on issue #16676:
URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3048396188

   I can't share the data I was hitting this case with.  I could try to make a 
synthetic reproducer, but work is very busy right now so I might not get to it 
for a few months.





Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]

2025-07-07 Thread via GitHub


alamb commented on issue #16676:
URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3044531341

   Perhaps this is also related to the default value of the statistics setting:
   
   https://github.com/apache/datafusion/pull/16447
   
   Do you have a reproducer you can share, @asayers?





Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]

2025-07-06 Thread via GitHub


asayers commented on issue #16676:
URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3041069977

   I _think_ the correct solution was for me to enable the metadata cache (I 
haven't confirmed this).  So perhaps the "bug" (if there is one) is just that 
the metadata cache is off by default?





Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]

2025-07-05 Thread via GitHub


jatin510 commented on issue #16676:
URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3039286131

   take





Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]

2025-07-03 Thread via GitHub


jatin510 commented on issue #16676:
URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3034536746

   take
   

