Hi guys,
I read this doc:
https://kudu.apache.org/docs/schema_design.html#multilevel-partitioning
and I have a question about this particular statement:
"Scans on multilevel partitioned tables can take advantage of partition
pruning on any of the levels independently"

Does this mean that both strategies below would be equivalent in terms of
performance (i.e. scanning as few partitions as possible)?

partition by hash(shop_id), hash(customer_id)
vs.
partition by hash(customer_id), hash(shop_id)

60% of the queries filter on both shop_id and customer_id, but 40% of the
queries need to pull all customers for a specific shop_id. We almost never
query by customer_id alone (customer_id is not unique across shops and is
assigned per shop).
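
To make the question concrete, here is a toy model of per-level pruning as I
read the quoted statement. It is only a sketch under assumptions: the bucket
counts (4 and 8), the sample ids (17 and 42), and the helper names toy_hash
and tablets_to_scan are made up for illustration, and crc32 is a stand-in
rather than Kudu's real hash function.

```python
import zlib
from itertools import product


def toy_hash(value):
    # Stand-in hash for illustration only; Kudu uses its own hash function.
    return zlib.crc32(str(value).encode("utf-8"))


def tablets_to_scan(levels, predicates):
    """Return the tablet ids (one bucket index per level) left after pruning.

    levels: list of (column, num_buckets), in partitioning order.
    predicates: dict of column -> value for equality predicates.
    """
    per_level = []
    for column, num_buckets in levels:
        if column in predicates:
            # An equality predicate pins this level to a single bucket.
            per_level.append([toy_hash(predicates[column]) % num_buckets])
        else:
            # No predicate on this level, so every bucket must be scanned.
            per_level.append(list(range(num_buckets)))
    return list(product(*per_level))


# Hypothetical bucket counts, chosen only for illustration.
shop_first = [("shop_id", 4), ("customer_id", 8)]
customer_first = [("customer_id", 8), ("shop_id", 4)]

queries = {
    "shop_id and customer_id": {"shop_id": 17, "customer_id": 42},
    "shop_id only": {"shop_id": 17},
}

for name, predicates in queries.items():
    a = len(tablets_to_scan(shop_first, predicates))
    b = len(tablets_to_scan(customer_first, predicates))
    print(f"{name}: shop first = {a} tablets, customer first = {b} tablets")
```

In this model the order of the hash levels does not change how many tablets
survive pruning for either query shape. Whether real Kudu behaves the same
way is exactly what I am asking.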

At the same time, if I partition by customer_id first, the partitions will be
distributed more evenly.

Thanks!
Boris
