Yep, the original article is definitely erroneous in this regard. I figured out that eventually. I'm not sure how much I can trust that resource now. I may have to look elsewhere. I agree that Edward's description is pretty good, but as I said earlier, I can't actually use his code, so I'm trying to cobble a workable solution together from the various resources available. Ritesh's article, despite the error in the Hive syntax, is still useful in that it enables one to quickly compile a simple rank jar without relying on git, maven, or other project dependencies -- problems which have plagued me with Edward's approach. So, if I can use Ritesh's method to write a simple rank function, and Edward's accurate description of how to construct the query, then I can put all the pieces together into a workable solution.
I'll let you know if I get it. On Apr 2, 2013, at 10:56 , Igor Tatarinov wrote: > You are getting the error because you are ORDERing BY rank but rank is not in > the top SELECT > > Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a > subquery: > SELECT ..., rank(user) > FROM (SELECT ... DISTRIBUTE BY ... SORT BY) > > igor > decide.com > > > On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley <kwi...@keithwiley.com> wrote: > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > > > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ > > Is there any possibility there is a bug in Ritesh Agrawal's query statement > from that article? I created a test table with the exact column names from > the example in the article and used a minimally altered version of the > command (I removed the where clause to simplify things a bit) and got an > error which suggests there is something slightly wrong with the command (or > perhaps the table has to be configured a special way). Here's what I get > when I almost perfectly duplicate that example: > > hive> describe test; > OK > user string > category string > value int > Time taken: 0.082 seconds > ================================================== > hive> select * from test; > OK > user1 cat1 1 > user1 cat1 2 > user1 cat1 3 > user1 cat2 10 > user1 cat2 20 > user1 cat2 30 > user2 cat1 11 > user2 cat1 21 > user2 cat1 31 > user2 cat2 5 > user2 cat2 6 > user2 cat2 7 > Time taken: 0.202 seconds > ================================================== > hive> SELECT user, category, value > > FROM ( > > SELECT user, category, rank(user) as rank, value > > FROM test > > DISTRIBUTE BY user > > SORT BY user, value desc > > ) a > > WHERE rank > 1 > > ORDER BY user, rank; > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column > reference 'rank': (possible column names are: _col0, _col1, _col2) > hive> > > ________________________________________________________________________________ > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > "What I primarily learned in grad school is how much I *don't* know. > Consequently, I left grad school with a higher ignorance to knowledge ratio > than > when I entered." > -- Keith Wiley > ________________________________________________________________________________ > > ________________________________________________________________________________ Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com "Luminous beings are we, not this crude matter." -- Yoda ________________________________________________________________________________