Re: [Spark Core] Adaptive dynamic partition pruning

Jie Han Fri, 11 Nov 2022 08:21:19 -0800

Hmmm… Sorry, I don’t have a good idea for this. Maybe we could try a subquery? 
I’m not sure whether it would work :( . We need help from other members of the 
community.
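
To make the idea quoted below concrete, here is a minimal conceptual sketch in plain Python. It is not Spark code and not how Spark implements dynamic partition pruning. The table names come from the thread; the data, the "key" and "region" columns, and the helper functions are all made up for illustration. It only shows that the keys surviving (dimension_table JOIN fact_table1) could in principle decide which partitions of fact_table2 get read.

```python
# Conceptual model only: plain Python, NOT Spark's implementation.
# Table names follow the thread; data, columns and helpers are hypothetical.

dimension_table = [
    {"key": 1, "region": "EU"},
    {"key": 2, "region": "US"},
    {"key": 3, "region": "EU"},
]

# Each fact table is modelled as {partition_key: [rows]}.
fact_table1 = {
    1: [{"key": 1, "amount": 10}],
    2: [{"key": 2, "amount": 20}],
}
fact_table2 = {
    1: [{"key": 1, "qty": 5}],
    2: [{"key": 2, "qty": 6}],
    3: [{"key": 3, "qty": 7}],
    4: [{"key": 4, "qty": 8}],
}


def join_dim_fact1(dim_rows, fact1, region):
    """Filter the dimension table by region, then join it with fact1 on key."""
    wanted = {row["key"] for row in dim_rows if row["region"] == region}
    matching = sorted(wanted & set(fact1))
    return [row for key in matching for row in fact1[key]]


def pruned_scan(fact, keys):
    """Read only the partitions whose key is in `keys`.

    Returns the rows read and the list of partitions that were scanned.
    """
    scanned = sorted(k for k in fact if k in keys)
    rows = [row for k in scanned for row in fact[k]]
    return rows, scanned


# The small intermediate join result plays the role of a dimension table.
small_side = join_dim_fact1(dimension_table, fact_table1, "EU")
pruning_keys = {row["key"] for row in small_side}

rows, scanned = pruned_scan(fact_table2, pruning_keys)
print("partitions of fact_table2 scanned:", scanned)
```

In this toy data the filtered dimension table alone would keep keys 1 and 3, while the intermediate join keeps only key 1, so fact_table2 is read for one partition instead of two. Whether Spark can be made to do this automatically is exactly the open question in this thread.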


> On Nov 12, 2022, at 00:10, hajyoussef amine <hajyoussef.am...@gmail.com> wrote:
> 
> Hi Jie,
> Let's suppose we have ((dimension_table JOIN fact_table1) JOIN fact_table2). 
> In the case where (dimension_table JOIN fact_table1) is small enough, the 
> result could ideally be treated as another dimension table and thus used to 
> prune fact_table2. I haven't found an easy way to implement it, though.
> 
> 
> On Fri, Nov 11, 2022 at 4:32 PM Jie Han <tunyu...@gmail.com> wrote:
> FYI, 
> https://medium.com/@prabhakaran.electric/spark-3-0-feature-dynamic-partition-pruning-dpp-to-avoid-scanning-irrelevant-data-1a7bbd006a89
> 
> This blog post may be helpful. Dynamic partition pruning typically applies to 
> star-schema queries. In your query, the fact table is big_table, which is 
> joined with the others, so there is only one SubqueryBroadcast dynamic 
> pruning node before big_table’s scan and none for the other tables.
> 
> I’m not sure that I’m correct. Hope it’s helpful to you.
> 
>> On Nov 11, 2022, at 21:43, hajyoussef amine <hajyoussef.am...@gmail.com> wrote:
>> 
>> SubqueryBroadcast
>
