Re: SQL help

Gesli, Nicole Thu, 24 May 2012 15:11:07 -0700

Try this:

SELECT a.a_id, b.path
FROM ( SELECT a_id, MIN(t_timestamp) t_timestamp
       FROM   web_data
       GROUP BY a_id
     ) a JOIN
       web_data b ON ( b.a_id = a.a_id AND b.t_timestamp = a.t_timestamp )

-Nicole

From: Roberto Sanabria <[email protected]<mailto:[email protected]>>
Reply-To: <[email protected]<mailto:[email protected]>>
Date: Thu, 24 May 2012 15:06:29 -0700
To: <[email protected]<mailto:[email protected]>>
Subject: Re: SQL help

"I guess do some kind of group by and store it in intermediate file and run 
another select on it?"

Yes, that is my recommendation.

On Thu, May 24, 2012 at 2:57 PM, Mohit Anchlia 
<[email protected]<mailto:[email protected]>> wrote:

On Thu, May 24, 2012 at 2:19 PM, Edward Capriolo 
<[email protected]<mailto:[email protected]>> wrote:
Hive is not SQL 92 compliant or whatever.

https://cwiki.apache.org/Hive/languagemanual.html

in particular you can not do subselects inside the in or the where
clause. Hive usually have other formulations like left semi join that
makes things 'like in' and 'not in' possible.

Thanks. But what I am looking for is to select only those rows that are of 
min(t_timestamp) for a given a_id. What would be the best way? I guess do some 
kind of group by and store it in intermediate file and run another select on it?

Edward
On Thu, May 24, 2012 at 5:13 PM, Mohit Anchlia 
<[email protected]<mailto:[email protected]>> wrote:
> I am now trying to do it this way but doesn't work in hive. I think I am
> missing something here, can someone please help?
>
> select a_id from web_data t1 where a_id = (select min(a_id) from web_data t2
> where t2.t_timestamp = t1.t_timestamp)
>
> I get:
>
>
> FAILED: Parse Error: line 1:69 cannot recognize input near 'select' 'min'
> '(' in expression specification
>
>
>
> On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia 
> <[email protected]<mailto:[email protected]>>
> wrote:
>>
>> I am new to Hive. I have several SQL from RDBMS database that I need to
>> convert to hive. What's the best reference for HIVEQL? For now I am trying
>> to figure out how to do this in hive:
>>
>> Select  distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY
>> A_ID ORDER BY t_timestamp) From WEB_DATA
>>
>> Any help would be appreciated.
>
>
>

Re: SQL help

Reply via email to