ORC queries inefficient for sorted field

2014-02-22 Thread Bryan Jeffrey
Hello. I'm running Hadoop 2.2.0 and Hive 0.12.0. I have an ORC table partitioned by 'range', and sorted by 'time'. I want to select the max(time) value from a table for a given set of partitions. I begin with a query that looks like the following: select max(time) from my_table where range > 1

Re: Metastore performance on HDFS-backed table with 15000+ partitions

2014-02-22 Thread Stephen Sprague
Yeah. That traceback pretty much spells it out: it's metastore-related, and that's where the partitions are stored. I'm with the others on this. HiveServer2 is still a little janky on memory management. I bounce mine once a day at midnight just to play it safe (and because I can). Again, for me,

Re: Metastore performance on HDFS-backed table with 15000+ partitions

2014-02-22 Thread Norbert Burger
Thanks all for the quick feedback. I'm a bit surprised to learn 15k is considered too many, but we can work around it. I guess I'm also curious why the query planner needs to know about all partitions even in the case of simple select/limit queries, where the query might target only a single part

Re: Metastore performance on HDFS-backed table with 15000+ partitions

2014-02-22 Thread Edward Capriolo
Don't make tables with that many partitions. It is an anti-pattern. I have tables with 2000 partitions a day and that is really too many. Hive needs to load that information into memory to plan the query. On Saturday, February 22, 2014, Terje Marthinussen wrote: > Query optimizer in hive is awful o
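The planning-memory point above can be illustrated with partition pruning: a predicate on the partition column lets Hive discard non-matching partitions at plan time, so only their metadata is loaded. A hedged sketch with an assumed table `events` partitioned by `dt` (names are illustrative, not from the thread):

```sql
-- Good: filtering on the partition column (dt) lets the planner prune,
-- loading metadata only for the single matching partition.
SELECT count(*) FROM events WHERE dt = '2014-02-22';

-- Bad: filtering only on a non-partition column gives the planner no way
-- to prune, so it must load metadata for every partition of the table.
SELECT count(*) FROM events WHERE user_id = 42;
```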

Re: Metastore performance on HDFS-backed table with 15000+ partitions

2014-02-22 Thread Terje Marthinussen
Query optimizer in hive is awful on memory consumption. 15k partitions sounds a bit early for it to fail, though. What is your heap size? Regards, Terje > On 22 Feb 2014, at 12:05, Norbert Burger wrote: > > Hi folks, > > We are running CDH 4.3.0 Hive (0.10.0+121) with a MySQL metastore. >
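Regarding the heap-size question: the Hive CLI and HiveServer2 JVM heap is commonly raised via `HADOOP_HEAPSIZE` (a value in MB) in `hive-env.sh`. A minimal sketch, assuming 4 GB is a reasonable starting point for this partition count (the value is an assumption to tune, not a recommendation from the thread):

```shell
# Raise the JVM heap for the Hive CLI / HiveServer2 to 4 GB (value is in MB).
# Typically placed in hive-env.sh; 4096 is an assumed value, tune to your load.
export HADOOP_HEAPSIZE=4096
echo "HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE}"
```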