Hi Xiaobin,
There is functionality like (from: 
https://cwiki.apache.org/Hive/languagemanual-dml.html):
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...

This will do one pass through your table's partitions but group and filter them 
according to different criteria. As you become more proficient with Hive, you 
could consider creating an index on some columns or bucketing on those columns 
(for optimization in joins, for example) if you end up 
filtering/grouping/joining on them over and over again.

Good luck!

Mark Grover, Business Intelligence Analyst
OANDA Corporation

www: oanda.com www: fxtrade.com
e: mgro...@oanda.com

"Best Trading Platform" - World Finance's Forex Awards 2009.
"The One to Watch" - Treasury Today's Adam Smith Awards 2009.


----- Original Message -----
From: "Martin Kuhn" <martin.k...@affinitas.de>
To: user@hive.apache.org
Sent: Thursday, January 12, 2012 9:20:00 AM
Subject: Re: how to avoid scan the same table multi times?

Hi there,

> Select count(*), count(distinct u), type from t group by type where plat=1 
> and dt=”2012-1-12-02”
> Select count(*), count(distinct u), type from t where (type =2 or type =6) 
> and dt=”2012-1-12-02” group by type;

> Is there a better way to do these queries?

You could try something like this:

SELECT
    type
  , count(*)
  , count(DISTINCT u)
  , count(CASE WHEN plat=1 THEN u ELSE NULL)
  , count(DISTINCT CASE WHEN plat=1 THEN u ELSE NULL)
  , count(CASE WHEN (type=2 OR type=6) THEN u ELSE NULL)
  , count(DISTINCT CASE WHEN (type=2 OR type=6) THEN u ELSE NULL)
FROM
    t
WHERE
    dt in ("2012-1-12-02", "2012-1-12-03")
GROUP BY
    type
ORDER BY
    type
;

Good luck :)
Martin Kuhn


P.S.  You'ge got a strange date format there. For sorting purposes it would be 
more appropriate to use something like "2012-01-12-02".

Reply via email to