yeah. good one. its a bug alright. where the trouble starts is when a
'?' follows a wildcard char. '%' in this case. Looks like hive slavishly
mutates a '%' to a '.*' and since you got a '?' after that you get another
valid regex looking like '.*?' ('?" being the minimally match directive)
so it eats your '?' as meta-data and then things just go weird from there.
in your case #2:
'%?' (aka '.*?') matches '?&' in your string
'icid=main' matches 'icid=main'
'%dl' matches 'moiz%dl'
'%' matches nothing but thats ok
so you get a true.
*
*
i don't know of a workaround using the like operator. i've tried various
number of blackslashes in front of the question mark to no avail to make it
a literal question mark and not a meta question mark. maybe someone else on
this list more knowledgeable on the internals can prescribe a solution.
in the meantime use rlike and a real regexp. use double backslashes to
escape meta characters.
consider this regexp in hive: 'your_string' rlike '.*\\?icid=main.*dl.*'
as equivalent to: 'your_string' like '%?icid=main%dl%'
let us know how it turns out.
Thanks again for finding and reporting this.
thanks,
Stephen.
PS Note that these are only _my_ findings and as such i can certainly be
wrong so please feel free to verify or disprove. I'm using hive 0.80 which
is kinda old.
On Thu, Jun 13, 2013 at 12:30 PM, Sunderlin, Mark <
[email protected]> wrote:
> Hive users: It took me a while to create a standalone use case, but here
> we go:
>
> In Oracle, this SQL returns false for both cases:
>
> SELECT CASE
> WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%') THEN 'T'
> ELSE 'F'
> END
> AS case1,
> CASE
> WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%dl%') THEN 'T'
> ELSE 'F'
> END
> AS case2
> FROM DUAL
> case1 case2
> F F
>
> But in Hive 0.80, I get one False and one True:
>
> SELECT CASE
> WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%') THEN 'T'
> ELSE 'F'
> END
> AS case1,
> CASE
> WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%dl%') THEN 'T'
> ELSE 'F'
> END
> AS case2
> FROM DUAL - A dummy single row, single column table we installed
> case1 case2
> F T
>
>
>
> ---
> Mark E. Sunderlin
> Solutions Architect |AOL NETWORKS BDM
> P: 703-265-6935 |C: 540-327-6222 | AIM: MESunderlin
> 22000 AOL Way | Dulles, VA | 20166
>
> From: Stephen Sprague [mailto:[email protected]]
> Sent: Wednesday, June 12, 2013 2:22 PM
> To: [email protected]
> Subject: Re: Issues with Hive 'Like" and multiple '%' wildcard in a pattern
>
> Hi Mark,
> i'm running v0.80 too and multiple '%'s work as expected for me. so. we're
> gonna need a see a definitive test case from you.
> show your full string and show where the like clause fails to match.
> thanks,
> Stephen.
> PS here's my test:
>
> hisql>select city from junk;
> +------------------+
> | city |
> +------------------+
> | West Haven |
> | West Haven |
> | West Haven |
> | West Haven |
> | West Jordan |
> | West Chester |
> | Westminster |
> | West Chester |
> | West Chester |
> | Westlake |
> | West Chester |
> | West Chester |
> | West Chester |
> | Weston |
> | West Covina |
> | West Chester |
> | West Chester |
> | West Hollywood |
> | West Des Moines |
> | West Covina |
> | Westland |
> | West Valley City |
> | West Valley City |
> | West Valley City |
> | West Valley City |
> | Westminster |
> | West Memphis |
> | West Memphis |
> +------------------+
> 28 affected
>
> hisql>select city from junk where city like '%West%v%';
> +-------------+
> | city |
> +-------------+
> | West Haven |
> | West Haven |
> | West Haven |
> | West Haven |
> | West Covina |
> | West Covina |
> +-------------+
> 6 affected
>
> hisql>select city from junk where city like '%West%v%a%';
> +-------------+
> | city |
> +-------------+
> | West Covina |
> | West Covina |
> +-------------+
> 2 affected
>
>
> On Wed, Jun 12, 2013 at 10:26 AM, Sunderlin, Mark <
> [email protected]> wrote:
> We are using Hive 0.80.
>
> ---
> Mark E. Sunderlin
> Solutions Architect |AOL NETWORKS BDM
> P: 703-265-6935 |C: 540-327-6222 | AIM: MESunderlin
> 22000 AOL Way | Dulles, VA | 20166
>
> From: Stephen Sprague [mailto:[email protected]]
> Sent: Wednesday, June 12, 2013 1:00 PM
> To: [email protected]
> Subject: Re: Issues with Hive 'Like" and multiple '%' wildcard in a pattern
>
> that seems pretty hard to believe. what version of hive are you using?
>
> On Wed, Jun 12, 2013 at 6:27 AM, Sunderlin, Mark <
> [email protected]> wrote:
> This seems to work just fine in other SQLs, but doesn't seem work in hive.
>
> I need to have several wild card characters in my 'like' clause as follows.
>
> In other SQLs, I want: .... where page_url_query like '%?icid=main%dl%'
>
> But in Hive that doesn't match. I have several work arounds. I can write
> it as:
>
> where (page_url_query like '%?icid=main%') and (page_url_query like
> '%dl%')
>
> or, I can just use something like where
> length(regexp_extract(page__url_query, '.*\?cdid-main.*dl.*',0) <> 0
>
> But more important is two or more '%' wild cards in a Hive like expression
> a known issue?
>
> ---
> Mark E. Sunderlin
> Solutions Architect |AOL NETWORKS BDM
> P: 703-265-6935 |C: 540-327-6222 | AIM: MESunderlin
> 22000 AOL Way, Dulles, VA 20166
>
>
>