Raihan, Andes suggests you use 'limit' after 'desc' ie Hive with sort your query results in descending order and then chop off any more than first 10 records. However, Hive will still run the full scan on data since it has to find the sorted list of records for you.
On Tue, Jul 10, 2012 at 8:36 AM, Raihan Jamal <[email protected]> wrote: > This is my below requirement. I need- *Find `TOP 10` data for each > `BUYER_ID, *So I cannot use LIMIT 10 here in this case. > > This is the below table > > CREATE TABLE IF NOT EXISTS TestingTable1 > ( > BUYER_ID BIGINT, > ITEM_ID BIGINT, > CREATED_TIME STRING > ) > > And this is the below data in the above table- > > BUYER_ID | ITEM_ID | CREATED_TIME > ------------+------------------+----------------------- > 1015826235 220003038067 2012-07-09 19:40:21, > 1015826235 300003861266 2012-07-09 18:19:59, > 1015826235 140002997245 2012-07-09 09:23:17, > 1015826235 210002448035 2012-07-09 22:21:11, > 1015826235 260003553381 2012-07-09 07:09:56, > 1015826235 260003553382 2012-07-09 19:40:39, > 1015826235 260003553383 2012-07-09 06:58:47, > 1015826235 260003553384 2012-07-09 07:28:47, > 1015826235 260003553385 2012-07-09 08:48:47, > 1015826235 260003553386 2012-07-09 06:38:47, > 1015826235 260003553387 2012-07-09 05:38:47, > 1015826235 260003553388 2012-07-09 04:55:47, > 1015826235 260003553389 2012-07-09 06:54:37, > 34512201 597245693 2012-07-09 16:20:21, > 34512201 8071787728 2012-07-09 15:19:59, > 34512201 5868222883 2012-07-09 08:23:17, > 34512201 2412180494 2012-07-09 22:21:11, > 34512201 2422054205 2012-07-09 06:09:56, > 34512201 1875744030 2012-07-09 19:40:39, > 34512201 5639158173 2012-07-09 06:58:47, > 34512201 5656232360 2012-07-09 07:28:47, > 34512201 959188449 2012-07-09 08:48:47, > 34512201 4645350592 2012-07-09 06:38:47, > 34512201 5657320532 2012-07-09 05:38:47, > 34512201 290419656539 2012-07-09 04:55:47, > > If you see the above data in the table, there are only two UNIQUE > `BUYER_ID` and corresponding to those I have `ITEM_ID` AND `CREATED_TIME`. > I need only 10 latest record basis on the time for that given day for each > `BUYER_ID`. > > So for this `BUYER_ID` - `34512201` I need 10 latest record basis on > `CREATED_TIME` for that given day only, it means for today's date I need 10 > latest record for each `BUYER_ID`. > > And each `BUYER_ID` can have any day's data. But I am specifically > interested for day before today's data(means yesterday's date always) by > checking at the `CREATED_TIME` > > **Find `TOP 10` data for each `BUYER_ID`. Below is the sample output.** > > Sample Output. > > BUYER_ID | ITEM_ID | CREATED_TIME > ------------+------------------+----------------------- > 34512201 2412180494 2012-07-09 22:21:11 > 34512201 1875744030 2012-07-09 19:40:39 > 34512201 597245693 2012-07-09 16:20:21 > 34512201 8071787728 2012-07-09 15:19:59 > 34512201 959188449 2012-07-09 08:48:47 > 34512201 5868222883 2012-07-09 08:23:17 > 34512201 5656232360 2012-07-09 07:28:47 > 34512201 5639158173 2012-07-09 06:58:47 > 34512201 4645350592 2012-07-09 06:38:47 > 34512201 2422054205 2012-07-09 06:09:56 > 1015826235 210002448035 2012-07-09 22:21:11 > 1015826235 260003553382 2012-07-09 19:40:39 > 1015826235 220003038067 2012-07-09 19:40:21 > 1015826235 300003861266 2012-07-09 18:19:59 > 1015826235 140002997245 2012-07-09 09:23:17 > 1015826235 260003553385 2012-07-09 08:48:47 > 1015826235 260003553384 2012-07-09 07:28:47 > 1015826235 260003553381 2012-07-09 07:09:56 > 1015826235 260003553383 2012-07-09 06:58:47 > 1015826235 260003553389 2012-07-09 06:54:37 > > > *Raihan Jamal* > > > > On Mon, Jul 9, 2012 at 7:56 PM, Andes <[email protected]> wrote: > >> ** >> hello, you can use "desc" and "limit 10" to filter the top 10. >> >> 2012-07-10 >> ------------------------------ >> ** >> Best Regards >> Andes >> >> ** >> ------------------------------ >> *发件人:*Raihan Jamal >> *发送时间:*2012-07-10 10:31 >> *主题:*Find TOP 10 using HiveQL >> *收件人:*"user"<[email protected]> >> *抄送:* >> >> When I run this query, >> >> SELECT TOP 10 FROM TestingTable1 WHERE ORDER BY buyer_id, created_time >> DESC; >> >> >> I always get error as- >> >> *FAILED: Parse Error: line 1:7 cannot recognize input 'TOP' in select >> expression* >> >> Is there any way around to use TOP 10 or something similar that will work >> in HiveQL? >> >> >> >> *Raihan Jamal* >> >> >
