Can anyone help me with this? I have tried other options by tweaking the query also. I am not able to achieve my expected output.
On Thu, Jul 19, 2012 at 1:45 PM, comptech geeky <comptechge...@gmail.com>wrote: > Modified Query that I wrote and its not working as expected output is. > > * > * > *SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time)* > *FROM (* > * SELECT bid, pid, time* > * FROM table1* > * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) = > '2012-07-09'* > * DISTRIBUTE BY bid,pid,time* > * SORT BY bid, time desc* > *) a* > *WHERE rank(bid) < 3;* > > > > > > On Thu, Jul 19, 2012 at 1:41 PM, comptech geeky > <comptechge...@gmail.com>wrote: > >> I wrote this query after modifying it- >> >> *SELECT buyer_id, item_id, rank(buyer_id), created_time, >> UNIX_TIMESTAMP(created_time)* >> *FROM (* >> * SELECT buyer_id, item_id, created_time* >> * FROM testingtable1* >> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as >> int))) = '2012-07-09'* >> * DISTRIBUTE BY buyer_id,item_id* >> * SORT BY buyer_id, created_time desc* >> *) a* >> *WHERE rank(buyer_id) < 3;* >> >> And the output I got is which is sligtly wrong as it is missing two rows- >> >> *1345653 330760137950 2012-07-09 21:43:29* >> *1345653 330760137950 2012-07-09 21:42:29* >> *1345653 330760137950 2012-07-09 21:41:29* >> >> These two rows are missing- >> >> 1345653 110909316904 2012-07-09 21:29:06 >> 1345653 221065796761 2012-07-09 19:31:48 >> >> So full output should be like this- >> >> *1345653 330760137950 2012-07-09 21:43:29* >> *1345653 330760137950 2012-07-09 21:42:29* >> *1345653 330760137950 2012-07-09 21:41:29* >> 1345653 110909316904 2012-07-09 21:29:06 >> 1345653 221065796761 2012-07-09 19:31:48 >> >> >> On Thu, Jul 19, 2012 at 1:29 PM, comptech geeky >> <comptechge...@gmail.com>wrote: >> >>> Can you show me the exact query that I need to do for this particular >>> problem consideing my scenario? It will be of great help to me. As I am new >>> to HiveQL. >>> >>> I need TOP 3 for those if BID and PID gets matched but with different >>> timestamp. >>> >>> >>> >>> On Thu, Jul 19, 2012 at 1:15 PM, Philip Tromans < >>> philip.j.trom...@gmail.com> wrote: >>> >>>> Your rank() is being evaluated map side. Put your distribute by and >>>> sort by in an inner query, and then evaluate your rank() in an outer query. >>>> >>>> Phil. >>>> On Jul 19, 2012 9:00 PM, "comptech geeky" <comptechge...@gmail.com> >>>> wrote: >>>> >>>>> This is the below data in my Table1 >>>>> >>>>> >>>>> BID PID TIME >>>>> >>>>> --------------+-------------------------+-------------------------------- >>>>> 1345653 330760137950 2012-07-09 21:42:29 >>>>> 1345653 330760137950 2012-07-09 21:43:29 >>>>> 1345653 330760137950 2012-07-09 21:40:29 >>>>> 1345653 330760137950 2012-07-09 21:41:29 >>>>> 1345653 110909316904 2012-07-09 21:29:06 >>>>> 1345653 221065796761 2012-07-09 19:31:48 >>>>> >>>>> So If I need to clarify the above scenario- I have data in above table >>>>> like this- >>>>> For USER *`1345653` *I have this PID `*330760137950` *four times but >>>>> with different timestamps in red color. So I need the output something >>>>> like >>>>> this- >>>>> >>>>> Output that I need:- >>>>> >>>>> *1345653 330760137950 2012-07-09 21:43:29 * >>>>> *1345653 330760137950 2012-07-09 21:42:29 * >>>>> *1345653 330760137950 2012-07-09 21:41:29* >>>>> 1345653 110909316904 2012-07-09 21:29:06 >>>>> 1345653 221065796761 2012-07-09 19:31:48 >>>>> >>>>> So Basically If BID and PID are same but with different timestamps, >>>>> then I need TOP 3 sorted with TIME in descending order >>>>> >>>>> And for this I created rank UDF (User Defined Function). And I wrote >>>>> the below query but its not working for me. Can anyone help me on this? >>>>> * >>>>> * >>>>> * >>>>> * >>>>> *SELECT buyer_id, item_id, created_time* >>>>> * FROM table1* >>>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as >>>>> int))) = '2012-07-09'* >>>>> * DISTRIBUTE BY buyer_id* >>>>> * SORT BY buyer_id, created_time desc* >>>>> >>>>> >>> >> >