This is a little tricky but this is how it works:

SELECT buyer_id, item_id, rank(item_id), created_time
FROM (
    SELECT buyer_id, item_id, created_time
    FROM testingtable1
    DISTRIBUTE BY buyer_id, item_id
    SORT BY buyer_id, item_id, created_time desc
) a
WHERE rank(item_id) < 10;


On Tue, Jul 10, 2012 at 12:31 PM, Raihan Jamal <jamalrai...@gmail.com> wrote:
> Still it's not working with the use of my rank UDF. Below is the query I am
> using
>
> Can anyone help me, what changes I need to make in my below sql query?
>
>
> CREATE TABLE IF NOT EXISTS TestingTable1
>
>
> (
>
>
> BUYER_ID BIGINT,
>
>
> ITEM_ID BIGINT,
>
>
> CREATED_TIME STRING
> )
>
>
> Find TOP 10 latest data (ITEM_ID, CREATED_TIME) for each BUYER_ID for
> yesterday's date by sorting the created_time in descending order.
>
>
> So what I was thinking is that, with the use of this below query, I will be
> getting-
>
>
> SELECT * FROM TestingTable1 WHERE ORDER BY buyer_id, created_time DESC;
>
>
> All the BUYER_ID and its ITEM_ID corresponding to CREATED_TIME in Descending
> order, But how I can pick TOP 10 for each BUYER_ID.?
>
> This is my SQL Query that I am using with the use of rank UDF.
>
> SELECT buyer_id, item_id, created_time, rk
> FROM (
>     SELECT buyer_id, item_id, rank(item_id) as rk, created_time
>     FROM testingtable1
>     DISTRIBUTE BY buyer_id, item_id
>     SORT BY buyer_id, item_id, created_time desc
> ) a
> WHERE rk < 10
> ORDER BY buyer_id, created_time, rk;
>
>
>
>
>
> Raihan Jamal
>
>
>
> On Tue, Jul 10, 2012 at 12:16 AM, Jasper Knulst <jasper.knu...@incentro.com>
> wrote:
>>
>> Hi Raihan,
>>
>> You should use 'rank(buyer_id)' in the order by clause on line 9 in stead
>> of the alias 'rk'. I had the same problem, strangely, the alias is not
>> resolved when it is in the order by clause.
>>
>> Other thing, I had some issues when I used this exact same set up for
>> ranking results, that the rank UDF was already implemented at the map phase.
>> Then you get very strange results. You have to introduce an extra subquery
>> for it to work.
>>
>> Jasper
>>
>> Op 10 jul. 2012 09:01 schreef "Raihan Jamal" <jamalrai...@gmail.com> het
>> volgende:
>>
>>> I was not able to understand, This is my below qiuery that I am using
>>> currently-
>>>
>>> SELECT buyer_id, item_id, created_time
>>> FROM (
>>>     SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time
>>>     FROM testingtable1
>>>     DISTRIBUTE BY buyer_id, item_id
>>>     SORT BY buyer_id, item_id, created_time desc
>>> ) a
>>> WHERE rank < 10
>>> ORDER BY buyer_id, created_time, rank;
>>>
>>>
>>> What changes I need to make?
>>>
>>>
>>>
>>>
>>> Raihan Jamal
>>>
>>>
>>>
>>> On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar <nitinpawar...@gmail.com>
>>> wrote:
>>>>
>>>> try rk in upper select statement as well
>>>>
>>>>
>>>> On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal <jamalrai...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Thanks for commenting out. Yes I figured that out, its a UDF. So now I
>>>>> have created a new UDF Rank and added to classpath also. But when I am 
>>>>> again
>>>>> running the below query-
>>>>>
>>>>> SELECT buyer_id, item_id, created_time
>>>>> FROM (
>>>>>     SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time
>>>>>     FROM testingtable1
>>>>>     DISTRIBUTE BY buyer_id, item_id
>>>>>     SORT BY buyer_id, item_id, created_time desc
>>>>> ) a
>>>>> WHERE rk < 10
>>>>> ORDER BY buyer_id, rk;
>>>>>
>>>>> I am getting this below error-
>>>>>
>>>>> FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias or
>>>>> Column Reference rk
>>>>>
>>>>>
>>>>> Why is it so? Any suggestions?
>>>>>
>>>>>
>>>>> Raihan Jamal
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jul 9, 2012 at 10:51 PM, Vijay <tec...@gmail.com> wrote:
>>>>>>
>>>>>> hive has no built-in rank function. you'd need to use a user-defined
>>>>>> function (UDF) to simulate it. there are a few custom implementations
>>>>>> on the net that you can leverage.
>>>>>>
>>>>>> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal <jamalrai...@gmail.com>
>>>>>> wrote:
>>>>>> > What's wrong with the below query.
>>>>>> >
>>>>>> >
>>>>>> > SELECT buyer_id, item_id, created_time
>>>>>> > FROM (
>>>>>> >     SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time
>>>>>> >     FROM testingtable1
>>>>>> >     DISTRIBUTE BY buyer_id, item_id
>>>>>> >     SORT BY buyer_id, item_id, created_time desc
>>>>>> > ) a
>>>>>> > WHERE rank < 10
>>>>>> > ORDER BY buyer_id, rank
>>>>>> >
>>>>>> >
>>>>>> > I am always getting as -
>>>>>> >
>>>>>> > FAILED: Error in semantic analysis: line 3:30 Invalid Function rank
>>>>>> >
>>>>>> >
>>>>>> > I am using Hive 0.6 I guess.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > Raihan Jamal
>>>>>> >
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>

Reply via email to