yeah. good one. its a bug alright. where the trouble starts is when a '?' follows a wildcard char. '%' in this case. Looks like hive slavishly mutates a '%' to a '.*' and since you got a '?' after that you get another valid regex looking like '.*?' ('?" being the minimally match directive) so it eats your '?' as meta-data and then things just go weird from there.
in your case #2: '%?' (aka '.*?') matches '?&' in your string 'icid=main' matches 'icid=main' '%dl' matches 'moiz%dl' '%' matches nothing but thats ok so you get a true. * * i don't know of a workaround using the like operator. i've tried various number of blackslashes in front of the question mark to no avail to make it a literal question mark and not a meta question mark. maybe someone else on this list more knowledgeable on the internals can prescribe a solution. in the meantime use rlike and a real regexp. use double backslashes to escape meta characters. consider this regexp in hive: 'your_string' rlike '.*\\?icid=main.*dl.*' as equivalent to: 'your_string' like '%?icid=main%dl%' let us know how it turns out. Thanks again for finding and reporting this. thanks, Stephen. PS Note that these are only _my_ findings and as such i can certainly be wrong so please feel free to verify or disprove. I'm using hive 0.80 which is kinda old. On Thu, Jun 13, 2013 at 12:30 PM, Sunderlin, Mark < mark.sunder...@teamaol.com> wrote: > Hive users: It took me a while to create a standalone use case, but here > we go: > > In Oracle, this SQL returns false for both cases: > > SELECT CASE > WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%') THEN 'T' > ELSE 'F' > END > AS case1, > CASE > WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%dl%') THEN 'T' > ELSE 'F' > END > AS case2 > FROM DUAL > case1 case2 > F F > > But in Hive 0.80, I get one False and one True: > > SELECT CASE > WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%') THEN 'T' > ELSE 'F' > END > AS case1, > CASE > WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%dl%') THEN 'T' > ELSE 'F' > END > AS case2 > FROM DUAL - A dummy single row, single column table we installed > case1 case2 > F T > > > > --- > Mark E. Sunderlin > Solutions Architect |AOL NETWORKS BDM > P: 703-265-6935 |C: 540-327-6222 | AIM: MESunderlin > 22000 AOL Way | Dulles, VA | 20166 > > From: Stephen Sprague [mailto:sprag...@gmail.com] > Sent: Wednesday, June 12, 2013 2:22 PM > To: user@hive.apache.org > Subject: Re: Issues with Hive 'Like" and multiple '%' wildcard in a pattern > > Hi Mark, > i'm running v0.80 too and multiple '%'s work as expected for me. so. we're > gonna need a see a definitive test case from you. > show your full string and show where the like clause fails to match. > thanks, > Stephen. > PS here's my test: > > hisql>select city from junk; > +------------------+ > | city | > +------------------+ > | West Haven | > | West Haven | > | West Haven | > | West Haven | > | West Jordan | > | West Chester | > | Westminster | > | West Chester | > | West Chester | > | Westlake | > | West Chester | > | West Chester | > | West Chester | > | Weston | > | West Covina | > | West Chester | > | West Chester | > | West Hollywood | > | West Des Moines | > | West Covina | > | Westland | > | West Valley City | > | West Valley City | > | West Valley City | > | West Valley City | > | Westminster | > | West Memphis | > | West Memphis | > +------------------+ > 28 affected > > hisql>select city from junk where city like '%West%v%'; > +-------------+ > | city | > +-------------+ > | West Haven | > | West Haven | > | West Haven | > | West Haven | > | West Covina | > | West Covina | > +-------------+ > 6 affected > > hisql>select city from junk where city like '%West%v%a%'; > +-------------+ > | city | > +-------------+ > | West Covina | > | West Covina | > +-------------+ > 2 affected > > > On Wed, Jun 12, 2013 at 10:26 AM, Sunderlin, Mark < > mark.sunder...@teamaol.com> wrote: > We are using Hive 0.80. > > --- > Mark E. Sunderlin > Solutions Architect |AOL NETWORKS BDM > P: 703-265-6935 |C: 540-327-6222 | AIM: MESunderlin > 22000 AOL Way | Dulles, VA | 20166 > > From: Stephen Sprague [mailto:sprag...@gmail.com] > Sent: Wednesday, June 12, 2013 1:00 PM > To: user@hive.apache.org > Subject: Re: Issues with Hive 'Like" and multiple '%' wildcard in a pattern > > that seems pretty hard to believe. what version of hive are you using? > > On Wed, Jun 12, 2013 at 6:27 AM, Sunderlin, Mark < > mark.sunder...@teamaol.com> wrote: > This seems to work just fine in other SQLs, but doesn't seem work in hive. > > I need to have several wild card characters in my 'like' clause as follows. > > In other SQLs, I want: .... where page_url_query like '%?icid=main%dl%' > > But in Hive that doesn't match. I have several work arounds. I can write > it as: > > where (page_url_query like '%?icid=main%') and (page_url_query like > '%dl%') > > or, I can just use something like where > length(regexp_extract(page__url_query, '.*\?cdid-main.*dl.*',0) <> 0 > > But more important is two or more '%' wild cards in a Hive like expression > a known issue? > > --- > Mark E. Sunderlin > Solutions Architect |AOL NETWORKS BDM > P: 703-265-6935 |C: 540-327-6222 | AIM: MESunderlin > 22000 AOL Way, Dulles, VA 20166 > > >