Multi-level partitions question

Boris Tyukin Thu, 11 Oct 2018 12:14:38 -0700

Hi guys,
I read this doc:
https://kudu.apache.org/docs/schema_design.html#multilevel-partitioning
and I have a question about this particular statement:
"Scans on multilevel partitioned tables can take advantage of partition
pruning on any of the levels independently"


Does it mean that both strategies below would be equivalent in terms of
performance (i.e. minimum scans)?

partition by hash(shop_id), hash(customer_id)
vs.
partition by hash(customer_id), hash(shop_id)

60% of the queries filter on both shop_id and customer_id, but 40% of
queries need to pull all customers for a specific shop_id. Queries almost
never filter by customer_id alone (customer_id is not unique across shops
and is assigned per shop).
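
To make the comparison concrete, here is a minimal sketch in Python of how
I picture the two layouts. It is a toy model, not Kudu code: the bucket
counts (4 for shop_id, 8 for customer_id), the modulo stand-in for the hash
function, and the helper names tablets/bucket/surviving are all assumptions
made up for illustration. It treats each tablet as one combination of
buckets, one bucket per level, and prunes each level on its own.

```python
from itertools import product

def tablets(levels):
    """One toy 'tablet' per combination of buckets, one bucket per level.

    levels is a list of (column, num_buckets) pairs in declaration order.
    """
    names = [column for column, _ in levels]
    ranges = [range(num_buckets) for _, num_buckets in levels]
    return [dict(zip(names, combo)) for combo in product(*ranges)]

def bucket(value, num_buckets):
    # Stand-in hash for integer keys (assumption): NOT the hash Kudu uses.
    return value % num_buckets

def surviving(levels, predicates):
    """Tablets that remain after pruning on equality predicates.

    predicates maps a column name to the value it must equal. Levels with
    no predicate are not pruned at all.
    """
    sizes = dict(levels)
    wanted = {col: bucket(val, sizes[col]) for col, val in predicates.items()}
    return [t for t in tablets(levels)
            if all(t[col] == b for col, b in wanted.items())]

# Hypothetical bucket counts, chosen only for the example.
layout_a = [("shop_id", 4), ("customer_id", 8)]   # hash(shop_id), hash(customer_id)
layout_b = [("customer_id", 8), ("shop_id", 4)]   # hash(customer_id), hash(shop_id)

shop_only = {"shop_id": 42}
shop_and_customer = {"shop_id": 42, "customer_id": 7}

for name, layout in (("A", layout_a), ("B", layout_b)):
    print(name, "shop only:", len(surviving(layout, shop_only)), "tablets")
    print(name, "shop + customer:", len(surviving(layout, shop_and_customer)), "tablets")
```

In this toy model the declaration order makes no difference to how many
tablets a query has to touch: a shop_id-only query keeps one shop_id bucket
times all customer_id buckets in both layouts. The model says nothing about
how evenly rows land in those tablets.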

At the same time, if I partition by customer_id first, the partitions will
be distributed more evenly.

Thanks!
Boris
