> Ok we hope that partitioning improves performance where the predicate is on partitioned columns
Nope. Partitioning *only* improves performance if your queries run with

set hive.mapred.mode=strict;

That's the "use strict" easy way to make sure you're writing good queries.

Even then, schema design in hive is something you need to learn with the assumption that neither the storage layer nor the compute layer is part of "hive". It floats in an "access" layer above both. I'm not sure there's any legacy tech to draw parallels with.

If you haven't seen this before, here's an example of the problem:

http://www.slideshare.net/Hadoop_Summit/hive-at-yahoo-letters-from-the-trenches/24

Cheers,
Gopal
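
P.S. In case it helps, here is a minimal sketch of what strict mode checks for. The table and column names (page_views, dt, url, user_id) are made up for illustration and are not from the slides. The idea is only that, with strict mode on, a query against a partitioned table that has no filter on a partition column gets rejected at compile time instead of quietly scanning every partition.

```sql
-- Hypothetical table, partitioned by day (all names are illustrative).
CREATE TABLE page_views (user_id BIGINT, url STRING)
PARTITIONED BY (dt STRING);

SET hive.mapred.mode=strict;

-- Rejected in strict mode: there is no predicate on the partition
-- column dt, so this would have to read every partition.
SELECT count(*) FROM page_views
WHERE url LIKE '%/checkout';

-- Accepted: the filter on dt lets Hive prune to a single partition
-- before any data is read.
SELECT count(*) FROM page_views
WHERE dt = '2015-06-01' AND url LIKE '%/checkout';
```

So partitioning only pays off when the queries that actually run carry a predicate on the partition column, and strict mode is one way to enforce that.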