Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]
alamb commented on issue #16676: URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3582795991 I think this issue should now be closed. Please reopen if you don't see any improvement. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]
alamb closed issue #16676: 1000x slowdown opening parquet file due to partitions URL: https://github.com/apache/datafusion/issues/16676
Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]
alamb commented on issue #16676: URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3188301329 We have added a new metadata cache (will ship in 50.0.0): - https://github.com/apache/datafusion/issues/17000 I wonder if you could try your reproducer with what is on main and see if the solution gets better
Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]
asayers commented on issue #16676: URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3048396188 I can't share the data I was hitting this case with. I could try to make a synthetic reproducer, but work is very busy right now so I might not get to it for a few months.
Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]
alamb commented on issue #16676: URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3044531341 Perhaps this is related to the default value of statistics too: https://github.com/apache/datafusion/pull/16447 Do you have a reproducer you can share @asayers ?
Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]
asayers commented on issue #16676: URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3041069977 I _think_ the correct solution was for me to enable the metadata cache (I haven't confirmed this). So perhaps the "bug" (if there is one) is just that the metadata cache is off by default?
Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]
jatin510 commented on issue #16676: URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3039286131 take
Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]
jatin510 commented on issue #16676: URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3034536746 take
