Re: How to run big queries in an optimized way?

MiaoMiao Thu, 20 Sep 2012 19:41:45 -0700

Hive implements a columnar file format named RCFile, which can give better
performance, but in my project it performed no better than the plain-text
format.
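For anyone who wants to try RCFile themselves, this is a minimal sketch of converting plain-text data. The tables `events_text` and `events_rc` and their columns are hypothetical. RCFile stores data column by column, so it tends to help most when queries read only a few of a table's columns. As noted above, the gain is not guaranteed.

```sql
-- Hypothetical target table stored as RCFile.
CREATE TABLE events_rc (
  user_id BIGINT,
  event   STRING,
  dt      STRING
)
STORED AS RCFILE;

-- Compress the files written by this INSERT (optional).
SET hive.exec.compress.output=true;

-- Rewrite the hypothetical plain-text table into the RCFile table.
INSERT OVERWRITE TABLE events_rc
SELECT user_id, event, dt FROM events_text;
```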


Hive also has an index feature, but it is not very convenient or practical.
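For reference, a sketch of what using the index feature looked like. It assumes a hypothetical `events` table with a `user_id` column. The index must be rebuilt explicitly after the data changes, which is part of why it is awkward in practice. Note that indexing was removed entirely in Hive 3.0.

```sql
-- Compact index on a hypothetical column; built later, not at creation time.
CREATE INDEX events_user_idx
ON TABLE events (user_id)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- Populate (and, after every data change, re-populate) the index.
ALTER INDEX events_user_idx ON events REBUILD;

-- Let the optimizer use indexes automatically for filters.
SET hive.optimize.index.filter=true;
```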

I think the best way to optimize is still to reuse the same source
tables, avoid sub-queries, and merge HiveQL statements as much as possible.
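One way to merge statements is Hive's multi-insert syntax. It scans a source table once and writes several outputs, rather than running one full scan per query. This is a minimal sketch that assumes a hypothetical `events(user_id, event, dt)` table and two hypothetical target tables that already exist.

```sql
-- One pass over the hypothetical source table feeds both inserts.
FROM events e
INSERT OVERWRITE TABLE daily_counts
  SELECT e.dt, COUNT(*)
  GROUP BY e.dt
INSERT OVERWRITE TABLE user_counts
  SELECT e.user_id, COUNT(*)
  GROUP BY e.user_id;
```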
On Fri, Sep 21, 2012 at 10:30 AM, Mapred Learn <[email protected]> wrote:
> Hi,
> We have datasets which are about 10-15 TB in size.
>
> We want to run hive queries on top of this input data.
>
> What are ways to reduce the load on our cluster when running many such big
> queries (including joins) in parallel?
> How do we enable compression, etc., for intermediate Hive output?
> How do we keep the job cache from growing too large?
> In short, what are the best practices for huge queries on Hive?
>
> Any input is really appreciated!
>
> Thanks,
> JJ
>
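
On the compression question, here is a sketch of session-level settings. It uses the Hadoop 1 property names that were current at the time. Hadoop 2 and later renamed them to `mapreduce.*`, but the old names are still accepted as deprecated aliases. The Snappy codec is an assumption and requires the native Snappy library on every node. If it is not installed, substitute another codec such as `org.apache.hadoop.io.compress.GzipCodec`.

```sql
-- Compress intermediate files Hive writes between the MapReduce jobs of a multi-job query.
SET hive.exec.compress.intermediate=true;

-- Compress map output before the shuffle (Hadoop-level setting).
SET mapred.compress.map.output=true;
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Compress the final output written to tables.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
```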
