Re: Hive queries consuming 100% cpu

2011-02-03 Thread Vijay
Sorry, I should've given more details. The query was limited to a partition range; I just omitted the WHERE clause in the mail. The table is not that big. For each day, there is one gzipped file. The largest file is about 250 MB (close to 2 GB uncompressed). I did intend to count and that was just to

Re: Hive queries consuming 100% cpu

2011-02-03 Thread Viral Bajaria
Hey Vijay, you can go to the MapReduce web UI, which normally runs on port 50030 of the JobTracker host (often the same machine as the namenode), and see how many map tasks were created for the query you submitted. You said that the events table has daily partitions, but your example query does not prune the partitions by specifying a WHERE clause
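
For readers following the thread, here is a minimal HiveQL sketch of the partition pruning described above. The thread never names the partition column or its date format, so the column `dt` and the 'YYYY-MM-DD' values below are assumptions for illustration only.

```sql
-- List the partitions Hive knows about for the table.
SHOW PARTITIONS events;

-- Assumption: `events` is partitioned by a string column `dt`
-- (hypothetical name) holding dates as 'YYYY-MM-DD'.
-- Filtering on the partition column lets Hive read only the
-- matching daily partitions instead of scanning the whole table.
SELECT COUNT(1)
FROM events
WHERE dt >= '2011-02-01'
  AND dt <= '2011-02-02';

-- EXPLAIN EXTENDED prints the query plan, which can be checked
-- to confirm which partition paths the query will actually read.
EXPLAIN EXTENDED
SELECT COUNT(1)
FROM events
WHERE dt >= '2011-02-01'
  AND dt <= '2011-02-02';
```

Because gzip is not a splittable format for Hadoop text input, each daily gzipped file is typically read by a single map task, so the number of map tasks shown in the web UI should roughly match the number of files in the selected partitions.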

Hive queries consuming 100% cpu

2011-02-03 Thread Vijay
Hi, Even the simplest Hive queries seem to be consuming 100% CPU. This is on a small 4-node cluster. The machines are pretty beefy (16 cores per machine, tons of RAM, 16 M+R maximum tasks configured, 1 GB RAM for mapred.child.java.opts, etc). A simple query like "select count(1) from events" where