Don't make tables with that many partitions. It is an anti-pattern. I have
tables with 2000 partitions a day and that is really too many. Hive needs to
load that information into memory to plan the query.
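
For what it's worth, a rough sketch of keeping the planner's partition load
small (the table and column names below are made up for illustration):

  -- See how many partitions the metastore actually holds for the table:
  SHOW PARTITIONS my_events;

  -- Make partition pruning mandatory, so a query can only touch the
  -- partitions matched by a predicate on the partition columns:
  SET hive.mapred.mode=strict;

  SELECT *
  FROM my_events
  WHERE dt = '2014-02-21'   -- predicate on the partition key => pruning
  LIMIT 10;

With strict mode on, a query against a partitioned table that has no
partition predicate is rejected instead of dragging every partition's
metadata into the planner.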

On Saturday, February 22, 2014, Terje Marthinussen <tmarthinus...@gmail.com>
wrote:
> The query optimizer in Hive is awful on memory consumption. 15k partitions
sounds a bit early for it to fail, though.
>
> What is your heap size?
>
> Regards,
> Terje
>
>> On 22 Feb 2014, at 12:05, Norbert Burger <norbert.bur...@gmail.com>
wrote:
>>
>> Hi folks,
>>
>> We are running CDH 4.3.0 Hive (0.10.0+121) with a MySQL metastore.
>>
>> In Hive, we have an external table backed by HDFS which has a 3-level
partitioning scheme that currently has 15000+ partitions.
>>
>> Within the last day or so, queries against this table have started
failing.  A simple query which shouldn't take very long at all (select *
from ... limit 10) fails after several minutes with a client OOME.  I get
the same outcome on count(*) queries (which I thought wouldn't send any
data back to the client).  Increasing heap on both client and server JVMs
(via HADOOP_HEAPSIZE) doesn't have any impact.
>>
>> We were only able to work around the client OOMEs by reducing the number
of partitions in the table.
>>
>> Looking at the MySQL query log, my thought is that the Hive client is
quite busy making requests for partitions that don't contribute to the
query.  Has anyone else had a similar experience with tables this size?
>>
>> Thanks,
>> Norbert
>
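
For reference, one rough way to give the CLI more heap before launching hive
(the values are illustrative, and the exact mechanism depends on how the
wrapper scripts are set up; as noted above, raising HADOOP_HEAPSIZE did not
help in Norbert's case):

  export HADOOP_HEAPSIZE=4096          # MB, picked up by the hadoop launcher script
  export HADOOP_CLIENT_OPTS="-Xmx4g"   # or pass -Xmx straight to the client JVM
  hive -e "SELECT * FROM my_events WHERE dt = '2014-02-21' LIMIT 10;"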

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.
