Try this:
SELECT a.a_id, b.path
FROM ( SELECT a_id, MIN(t_timestamp) t_timestamp
FROM web_data
GROUP BY a_id
) a JOIN
web_data b ON ( b.a_id = a.a_id AND b.t_timestamp = a.t_timestamp )
-Nicole
From: Roberto Sanabria <[email protected]<mailto:[email protected]>>
Reply-To: <[email protected]<mailto:[email protected]>>
Date: Thu, 24 May 2012 15:06:29 -0700
To: <[email protected]<mailto:[email protected]>>
Subject: Re: SQL help
"I guess do some kind of group by and store it in intermediate file and run
another select on it?"
Yes, that is my recommendation.
On Thu, May 24, 2012 at 2:57 PM, Mohit Anchlia
<[email protected]<mailto:[email protected]>> wrote:
On Thu, May 24, 2012 at 2:19 PM, Edward Capriolo
<[email protected]<mailto:[email protected]>> wrote:
Hive is not SQL 92 compliant or whatever.
https://cwiki.apache.org/Hive/languagemanual.html
in particular you can not do subselects inside the in or the where
clause. Hive usually have other formulations like left semi join that
makes things 'like in' and 'not in' possible.
Thanks. But what I am looking for is to select only those rows that are of
min(t_timestamp) for a given a_id. What would be the best way? I guess do some
kind of group by and store it in intermediate file and run another select on it?
Edward
On Thu, May 24, 2012 at 5:13 PM, Mohit Anchlia
<[email protected]<mailto:[email protected]>> wrote:
> I am now trying to do it this way but doesn't work in hive. I think I am
> missing something here, can someone please help?
>
> select a_id from web_data t1 where a_id = (select min(a_id) from web_data t2
> where t2.t_timestamp = t1.t_timestamp)
>
> I get:
>
>
> FAILED: Parse Error: line 1:69 cannot recognize input near 'select' 'min'
> '(' in expression specification
>
>
>
> On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia
> <[email protected]<mailto:[email protected]>>
> wrote:
>>
>> I am new to Hive. I have several SQL from RDBMS database that I need to
>> convert to hive. What's the best reference for HIVEQL? For now I am trying
>> to figure out how to do this in hive:
>>
>> Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY
>> A_ID ORDER BY t_timestamp) From WEB_DATA
>>
>> Any help would be appreciated.
>
>
>