Thanks, I've added a note about case-insensitivity to the UDF doc and the Tutorial:
- https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF - https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Builtinoperatorsandfunctions -- Lefty On Tue, Sep 3, 2013 at 7:38 AM, Edward Capriolo <[email protected]>wrote: > Beyond that answer please note: > Udf names are always case insensitive. RaNk = RANK = rank > For rank this is a special case that was fixed. > > > On Tue, Sep 3, 2013 at 9:15 AM, j.barrett Strausser < > [email protected]> wrote: > >> I filed HIVE-5038. My issue was that it seemed incorrect that rank took >> an arg. I haven't verified that the issue is fixed in trunk as claimed in >> the JIRA.However, between the reponse in JIRA and in this thread it seems >> resolved. >> >> >> On Tue, Sep 3, 2013 at 4:23 AM, Lefty Leverenz >> <[email protected]>wrote: >> >>> Another email thread led me to >>> HIVE-5038<https://issues.apache.org/jira/browse/HIVE-5038>("rank operator >>> is case-sensitive and has odd semantics") -- it's resolved >>> as invalid, but is that only for the odd semantics? >>> >>> Perhaps this issue is clarified in more recent emails. I'm catching up >>> on a huge backlog. >>> >>> -- Lefty >>> >>> >>> On Tue, Sep 3, 2013 at 4:03 AM, Lefty Leverenz >>> <[email protected]>wrote: >>> >>>> What's the answer -- does the "rank" keyword have to be lowercase? >>>> >>>> If lowercase is obligatory we need to revise the wiki, which shows all >>>> uppercase ( >>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics >>>> ). >>>> >>>> In the test files it's lowercase (windowing_rank.q, >>>> ptf_negative_WhereWithRankCond.q). >>>> The patch for HIVE-896 shows a lowercase name in GenericUDAFRank.java but >>>> I don't know if that means lowercase is required: >>>> >>>> @WindowFunctionDescription >>>>> >>>>> ( >>>>> >>>>> description = @Description( >>>>> >>>>> name = "rank", >>>>> >>>>> value = "_FUNC_(x)" >>>>> >>>>> ), >>>>> >>>>> supportsWindow = false, >>>>> >>>>> pivotResult = true >>>>> >>>>> ) >>>>> >>>> >>>> >>>> And what about the other keywords in the wikidoc? Same lowercase >>>> requirement? >>>> >>>> -- Lefty >>>> >>>> >>>> On Fri, Jul 26, 2013 at 5:30 PM, saurabh <[email protected]>wrote: >>>> >>>>> Hi all, >>>>> >>>>> Below are some of observations based on the on-going rank function >>>>> discussion. >>>>> >>>>> 1. I executed below mentioned queries and only the query with "rank" >>>>> (lowercase) executed successfully, rest were throwing exceptions "FAILED: >>>>> SemanticException Failed to breakup Windowing invocations into Groups." >>>>> >>>>> - select cust_id, ord_dt, RANK() w from cust_ord window w as >>>>> (partition by cust_id order by ord_dt); >>>>> >>>>> - select cust_id, ord_dt, Rank() w from cust_ord window w as >>>>> (partition by cust_id order by ord_dt); >>>>> >>>>> - select cust_id, ord_dt, rank() w from cust_ord window w as >>>>> (partition by cust_id order by ord_dt); >>>>> >>>>> It seems "rank" keyword is case-sensitive. Attached is the screenshot >>>>> for reference. >>>>> >>>>> 2. I created a dummy table with the data provided in the below mail >>>>> trail and achieved the expected output, using the below mentioned query. >>>>> >>>>> *select cust_id, ord_dt, rank() over (partition by cust_id order by >>>>> ord_dt) from cust_ord;* >>>>> >>>>> Request all to kindly review these details and suggest if it was of >>>>> any help! >>>>> >>>>> Thanks. >>>>> >>>>> >>>>> On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser < >>>>> [email protected]> wrote: >>>>> >>>>>> Any further help on this, otherwise I'll file a jira. >>>>>> >>>>>> >>>>>> On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> As an example : If I run my query above removing the arg the >>>>>>> following is thrown. >>>>>>> >>>>>>> FAILED: SemanticException Failed to breakup Windowing invocations >>>>>>> into Groups. At least 1 group must only depend on input columns. Also >>>>>>> check >>>>>>> for circular dependencies. >>>>>>> Underlying error: >>>>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more >>>>>>> arguments are expected. >>>>>>> >>>>>>> >>>>>>> Similar issue and fix here: >>>>>>> >>>>>>> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html >>>>>>> >>>>>>> Even if it didn't require an arg it still doesn't explain my >>>>>>> anomalous output. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> That isn't true. If you try to run the above HIVE without an >>>>>>>> argument, it will throw an exception. I have seen other users replicate >>>>>>>> this problem as well. >>>>>>>> >>>>>>>> I can file a JIRA if someone can confirm that my query should work. >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jul 24, 2013 at 11:02 PM, [email protected] < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> Analytical function doesn't expect any argument. Rank() itself >>>>>>>>> enough to sequence based on the window you have defined in partition >>>>>>>>> by. So >>>>>>>>> >>>>>>>>> Rank() over (partition by cmscustid order by orderdate) >>>>>>>>> >>>>>>>>> Should work as long as I have wrote right syntax for hive. >>>>>>>>> >>>>>>>>> Sent via Rocket from my HTC >>>>>>>>> >>>>>>>>> ----- Reply message ----- >>>>>>>>> From: "j.barrett Strausser" <[email protected]> >>>>>>>>> To: <[email protected]> >>>>>>>>> Subject: Semantics of Rank. >>>>>>>>> Date: Thu, Jul 25, 2013 1:08 AM >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks for the reply. Perhaps my misunderstanding of the relation >>>>>>>>> between >>>>>>>>> rank and the windowing function is wrong. >>>>>>>>> >>>>>>>>> What I want to achieve for the following is : For a given customer >>>>>>>>> id, >>>>>>>>> sort his orders. I thought the below would work. >>>>>>>>> >>>>>>>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM >>>>>>>>> order_data eh >>>>>>>>> window w as (partition by cmscustid order by orderdate); >>>>>>>>> >>>>>>>>> The rank function instead returns the rank of the order date over >>>>>>>>> all all >>>>>>>>> order dates. >>>>>>>>> >>>>>>>>> Example snippet from above >>>>>>>>> >>>>>>>>> Actual : >>>>>>>>> >>>>>>>>> 6758783 27APR2012 94 >>>>>>>>> 6758783 23JUN2012 95 >>>>>>>>> 6758785 14DEC2012 96 >>>>>>>>> 6758795 18DEC2011 97 >>>>>>>>> 6758796 06MAY2012 98 >>>>>>>>> 6758798 24MAR2013 99 >>>>>>>>> 6758799 23NOV2012 100 >>>>>>>>> >>>>>>>>> >>>>>>>>> Expected : >>>>>>>>> >>>>>>>>> 6758783 27APR2012 1 >>>>>>>>> 6758783 23JUN2012 2 >>>>>>>>> 6758785 14DEC2012 1 >>>>>>>>> 6758795 18DEC2011 1 >>>>>>>>> 6758796 06MAY2012 1 >>>>>>>>> 6758798 24MAR2013 1 >>>>>>>>> 6758799 23NOV2012 1 >>>>>>>>> >>>>>>>>> >>>>>>>>> -b >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman < >>>>>>>>> [email protected]>wrote: >>>>>>>>> >>>>>>>>> > the argument to rank is simply some value, whereas the rank >>>>>>>>> function >>>>>>>>> > compare this value >>>>>>>>> > to the previous value received, if value is same, rank returns >>>>>>>>> ++index, >>>>>>>>> > otherwise, rank return 1. >>>>>>>>> > pseudo code: >>>>>>>>> > >>>>>>>>> > class Rank { >>>>>>>>> > int index; >>>>>>>>> > Object previousValue = null; >>>>>>>>> > int evaluate(Object value) { >>>>>>>>> > if (value == previousValue) { >>>>>>>>> > return ++index; >>>>>>>>> > } >>>>>>>>> > previousValue = value; >>>>>>>>> > index = 1; >>>>>>>>> > return 1; >>>>>>>>> > } >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser < >>>>>>>>> > [email protected]> wrote: >>>>>>>>> > >>>>>>>>> >> It seems as though I am required to pass in an argument to >>>>>>>>> RANK(). >>>>>>>>> >> >>>>>>>>> >> What is the effect of passing this argument in ? >>>>>>>>> >> >>>>>>>>> >> In a RANK function the output for RANK should be the number of >>>>>>>>> rows >>>>>>>>> >> preceding >>>>>>>>> >> a row in a given window. As windows are specified by the >>>>>>>>> partition and >>>>>>>>> >> order by fields I don't understand the effect of passing in an >>>>>>>>> arg to RANK >>>>>>>>> >> >>>>>>>>> >> Are the other non-aggregate function similar? Meaning : >>>>>>>>> Cume_dist, Ntile >>>>>>>>> >> -- >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> https://github.com/bearrito >>>>>>>>> >> @deepbearrito >>>>>>>>> >> >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > The above terms reflect a potential business arrangement, are >>>>>>>>> provided solely >>>>>>>>> > as a basis for further discussion, and are not intended to be >>>>>>>>> and do not >>>>>>>>> > constitute a legally binding obligation. No legally binding >>>>>>>>> obligations will >>>>>>>>> > be created, implied, or inferred until an agreement in final >>>>>>>>> form is executed >>>>>>>>> > in writing by all parties involved. >>>>>>>>> > >>>>>>>>> > This email and any attachments hereto may be confidential or >>>>>>>>> privileged. >>>>>>>>> > If you received this communication by mistake, please don't >>>>>>>>> forward it >>>>>>>>> > to anyone else, please erase all copies and attachments, and >>>>>>>>> please let >>>>>>>>> > me know that it has gone to the wrong person. Thanks. >>>>>>>>> > >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> >>>>>>>>> >>>>>>>>> https://github.com/bearrito >>>>>>>>> @deepbearrito >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> >>>>>>>> https://github.com/bearrito >>>>>>>> @deepbearrito >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> >>>>>>> https://github.com/bearrito >>>>>>> @deepbearrito >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> >>>>>> https://github.com/bearrito >>>>>> @deepbearrito >>>>>> >>>>> >>>>> >>>> >>> >> >> >> -- >> >> >> https://github.com/bearrito >> @deepbearrito >> > >
