Yeah I figure it out, this query will create 3 jobs, it seems to do
sub-queries without any optimization.
SELECT
A.userType
, A.userType_count/B.global_count
FROM
(
SELECT
userType
, COUNT(1) as userType_count
FROM
some_table
GROUP BY
userType
) A
JOIN
(
SELECT
COUNT(1) as global_count
FROM
some_table
) B
ON (1=1);
But if all userType are enumerable, I can get their ratio in one job,
which seems like a more optimized way.
SELECT
SUM(IF(userType='A',1,0))/count(1) as A_ratio
, SUM(IF(userType='B',1,0))/count(1) as B_ratio
, SUM(IF(userType='C',1,0))/count(1) as C_ratio
FROM
some_table;
On Fri, Sep 7, 2012 at 2:59 PM, Bejoy KS <[email protected]> wrote:
> Hi
>
> CROSS JOIN is same as giving JOIN keyword. CROSS JOIN just a new notation in
> later releases of hive. JOIN without ON is same as CROSS JOIN
>
> Regards,
> Bejoy KS
>
> ________________________________
> From: MiaoMiao <[email protected]>
> To: [email protected]
> Sent: Friday, September 7, 2012 11:46 AM
> Subject: Re: How to get percentage of each group?
>
> You mean this one? Hive 0.8 doesn't have CROSS JOIN, so I don't know
> if this query works.
> SELECT
> A.userType
> , A.userType_count/B.global_count
> FROM
> (
> SELECT
> userType
> , COUNT(1) as userType_count
> FROM
> some_table
> GROUP BY
> userType
> ) A
> CROSS JOIN
> (
> SELECT
> COUNT(1) as global_count
> FROM
> some_table
> ) B;
> On Fri, Sep 7, 2012 at 1:34 PM, Bertrand Dechoux <[email protected]> wrote:
>> Hi,
>>
>> You could use a cross join.
>> You basically have one table
>>
>> select
>> userType
>> , count(1)
>> from
>> some_table
>> group by
>> userType
>>
>> and a second one
>>
>> select count(1) from some_table
>>
>> With a cross join you can add the global count to every results in the
>> first
>> table and compute a ratio.
>>
>> Regards
>>
>> Bertrand
>>
>>
>> On Fri, Sep 7, 2012 at 7:23 AM, MiaoMiao <[email protected]> wrote:
>>>
>>> I have a table, containing userId and userType.
>>> userId userType
>>> 1 A
>>> 2 B
>>> 3 C
>>> 4 A
>>> 5 B
>>> 6 B
>>>
>>> I want to get percentage of each userType.
>>> My current solution:
>>> 1. Get count of each group via THRIFT
>>> select
>>> userType
>>> , count(1)
>>> from
>>> some_table
>>> group by
>>> userType
>>>
>>> 2. Calculate each userType using other programming language like PHP.
>>>
>>> This solution is fine, but I'm just curious, is there a way to do it
>>> in one query?
>>> I know this query works in mysql, but not hive.
>>> select
>>> userType
>>> , count(1)/(select count(1) from some_table)
>>> from
>>> some_table
>>> group by
>>> userType
>>
>>
>>
>>
>> --
>> Bertrand Dechoux
>
>