On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
Is there any possibility there is a bug in Ritesh Agrawal's query statement
from that article? I created a test table with the exact column names from the
example in the article and used a minimally altered version of the command (I
removed the where clause to simplify things a bit) and got an error which
suggests there is something slightly wrong with the command (or perhaps the
table has to be configured a special way). Here's what I get when I almost
perfectly duplicate that example:
hive> describe test;
OK
user string
category string
value int
Time taken: 0.082 seconds
==================================================
hive> select * from test;
OK
user1 cat1 1
user1 cat1 2
user1 cat1 3
user1 cat2 10
user1 cat2 20
user1 cat2 30
user2 cat1 11
user2 cat1 21
user2 cat1 31
user2 cat2 5
user2 cat2 6
user2 cat2 7
Time taken: 0.202 seconds
==================================================
hive> SELECT user, category, value
> FROM (
> SELECT user, category, rank(user) as rank, value
> FROM test
> DISTRIBUTE BY user
> SORT BY user, value desc
> ) a
> WHERE rank > 1
> ORDER BY user, rank;
FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column
reference 'rank': (possible column names are: _col0, _col1, _col2)
hive>
________________________________________________________________________________
Keith Wiley [email protected] keithwiley.com music.keithwiley.com
"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio than
when I entered."
-- Keith Wiley
________________________________________________________________________________