> Ok we hope that partitioning improves performance where the predicate is
> on partitioned columns
 

Nope.

Partitioning *only* improves performance if your queries run with

set hive.mapred.mode=strict;

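That's the easy "use strict" way to make sure you're writing good queries:
in strict mode, Hive refuses to run a query against a partitioned table
unless it filters on the partition columns.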
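
A minimal sketch of what strict mode enforces, assuming a hypothetical
table page_views partitioned by a date string dt (the table, columns and
values are made up for illustration, not taken from your schema):

```sql
-- Hypothetical table, partitioned by a date string dt.
CREATE TABLE page_views (user_id BIGINT, url STRING)
PARTITIONED BY (dt STRING);

SET hive.mapred.mode=strict;

-- Rejected at compile time in strict mode: there is no predicate on the
-- partition column, so this would have to scan every partition.
SELECT count(*) FROM page_views WHERE url = '/index.html';

-- Accepted: the filter on dt lets Hive prune down to a single partition.
SELECT count(*) FROM page_views
WHERE dt = '2015-08-01' AND url = '/index.html';
```

Without strict mode the first query still runs, it just reads the whole
table, which is the case where partitioning buys you nothing.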

Even then, schema design in Hive is something you need to learn with the
assumption that neither the storage layer nor the compute layer is part
of "Hive".

Hive sits in an "access" layer above both. I'm not sure there's any
legacy tech to draw parallels with.

If you haven't seen this before, here's an example of the problem:

http://www.slideshare.net/Hadoop_Summit/hive-at-yahoo-letters-from-the-trenches/24


Cheers,
Gopal
 

