A few of us have run into this. I think the bottleneck is the partition
creation calls to the metastore. My workaround was HIVE-10385, which
makes partition creation in the metastore optional, but that isn't a
solution for everyone. If you don't need actual partitions in the table,
only partitioned data in HDFS, give it a shot. It may be worthwhile
looking into optimizations for this use case.
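
For a rough sense of scale, here is a back-of-the-envelope sketch using only
the numbers from the original message below (about 32000 partitions, almost
16 hours of commit time so far). The function name is just illustrative, and
it assumes the commit time is spread evenly across partitions, which may not
be true. Since the commit had not finished, the result is only a lower bound
on the average cost per partition.

```python
def min_seconds_per_partition(elapsed_hours: float, partitions: int) -> float:
    """Lower bound on average commit time per partition.

    Assumes the elapsed time is spread evenly across all partitions
    (an assumption, not something measured).
    """
    return elapsed_hours * 3600 / partitions


# Numbers taken from the original message: ~16 hours, ~32000 partitions.
print(min_seconds_per_partition(16, 32000))
```

More than a second per partition would be consistent with each partition
paying for its own round trips to the metastore, though that is still a guess
on my part.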

-Slava

On Thu, Jun 11, 2015 at 11:56 AM, Pradeep Gollakota <pradeep...@gmail.com>
wrote:

> Hi All,
>
> I have a table which is partitioned on two columns (customer, date). I'm
> loading some data into the table using a Hive query. The MapReduce job
> completed within a few minutes and needs to "commit" the data to the
> appropriate partitions. There were about 32000 partitions generated. The
> commit phase has been running for almost 16 hours and has not finished yet.
> I've been monitoring jmap, and don't believe it's a memory or gc issue.
> I've also been looking at jstack and not sure why it's so slow. I'm not
> sure what the problem is, but seems to be a Hive performance issue when it
> comes to "highly partitioned" tables.
>
> Any thoughts on this issue would be greatly appreciated.
>
> Thanks in advance,
> Pradeep
>



-- 

Slava Markeyev | Engineering | Upsight

Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>
