Hello.
I'm running Hadoop 2.2.0 and Hive 0.12.0.
I have an ORC table partitioned by 'range', and sorted by 'time'. I want
to select the max(time) value from a table for a given set of partitions.
I begin with a query that looks like the following:
select max(time) from my_table where range > 1
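For what it's worth, one way to see exactly which partitions the planner will pull from the metastore for a query like that is EXPLAIN DEPENDENCY (available since Hive 0.10, so it applies to both versions in this thread). A sketch, assuming the table and column names from the original post:

```sql
-- Prints, as JSON, the input tables and input partitions this query
-- depends on -- i.e. what the planner fetched from the metastore.
-- `my_table`, `range`, and `time` are the names from the original post.
EXPLAIN DEPENDENCY
SELECT max(time) FROM my_table WHERE range > 1;
```

If the partition list printed there is much larger than expected, the predicate is not being pushed down to the metastore and the planner is enumerating everything.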
Yeah. That traceback pretty much spells it out - it's metastore-related, and
that's where the partitions are stored.
I'm with the others on this. HiveServer2 is still a little janky on memory
management. I bounce mine once a day at midnight just to play it safe (and
because I can).
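A nightly bounce like the one described can be as simple as a cron entry. This is only a sketch: the init-script path below is an assumption and varies by distribution and by how HiveServer2 was installed.

```shell
# /etc/crontab entry: restart HiveServer2 every night at midnight.
# /etc/init.d/hive-server2 is an assumed path; adjust for your install.
0 0 * * * root /etc/init.d/hive-server2 restart
```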
Again, for me,
Thanks all for the quick feedback.
I'm a bit surprised to learn 15k is considered too much, but we can work
around it. I guess I'm also curious why the query planner needs to know
about all partitions even in the case of simple select/limit queries, where
the query might target only a single partition.
Don't make tables with that many partitions. It is an anti-pattern. I have
tables with 2000 partitions a day and that is really too many. Hive needs to
load that information into memory to plan the query.
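A minimal sketch of the coarser layout being suggested here, with assumed column names: partition by day only, and let sorting/bucketing inside each partition do the fine-grained work instead of thousands of small partitions.

```sql
-- Hypothetical DDL: one partition per day instead of many per day,
-- keeping rows sorted by `time` inside each partition so queries like
-- max(time) still read little data. Column names are illustrative.
CREATE TABLE my_table_daily (
  time    BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (time) SORTED BY (time) INTO 32 BUCKETS
STORED AS ORC;
```

With this layout the planner only sees one partition object per day, so the metastore fetch during planning stays small even over long date ranges.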
On Saturday, February 22, 2014, Terje Marthinussen
wrote:
The query optimizer in Hive is awful on memory consumption. 15k partitions
sounds a bit early for it to fail, though.
What is your heap size?
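For reference, the heap for the Hive client and HiveServer2 is usually raised in hive-env.sh. The variable is standard; the 4 GB figure below is just an illustrative value, not a recommendation for this workload.

```shell
# hive-env.sh: raise the JVM heap used by Hive processes.
# 4096 MB is an illustrative value; size it to your partition counts.
export HADOOP_HEAPSIZE=4096
```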
Regards,
Terje
> On 22 Feb 2014, at 12:05, Norbert Burger wrote:
>
> Hi folks,
>
> We are running CDH 4.3.0 Hive (0.10.0+121) with a MySQL metastore.
>