Reminder, right now, the % character is only interpreted as wildcard IF AND ONLY IF it is the first/last character of the searched term
LIKE '%escape' --> ENDS WITH 'escape' If we use % to escape %, LIKE '%%escape' --> EQUALS TO '%escape' LIKE '%%%escape' --> EQUALS TO '%%escape' ??? On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy < mikhail.krupits...@jetbrains.com> wrote: > Hi, Jim, > > What pattern should be used to search “ends with ‘%escape’ “ with your > conception? > > Thanks, > Mikhail > > On 22 Sep 2016, at 18:51, Jim Ancona <j...@anconafamily.com> wrote: > > To answer DuyHai's question without introducing new syntax, I'd suggest: > > LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' > > So the first two %'s are translated to a literal, non-wildcard % and the > third % is a wildcard because it's not doubled. > > Jim > > On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy < > mikhail.krupits...@jetbrains.com> wrote: > >> I guess that it should be similar to how it is done in SQL for LIKE >> patterns. >> >> You can introduce an escape character, e.g. ‘\’. >> Examples: >> ‘%’ - any string >> ‘\%’ - equal to ‘%’ character >> ‘\%foo%’ - starts from ‘%foo’ >> ‘%%%escape’ - ends with ’escape’ >> ‘\%%’ - starts from ‘%’ >> ‘\\\%%’ - starts from ‘\%’ . >> >> What do you think? >> >> Thanks, >> Mikhail >> >> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduy...@gmail.com> wrote: >> >> Hello Mikhail >> >> It's more complicated that it seems >> >> LIKE '%%escape' means EQUAL TO '%escape' >> >> LIKE '%escape' means ENDS WITH 'escape' >> >> What's about LIKE '%%%escape' ???? >> >> How should we treat this case ? Replace %% by % at the beginning of the >> searched term ?? >> >> >> >> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy < >> mikhail.krupits...@jetbrains.com> wrote: >> >>> Hi! >>> >>> We’ve talked about two items: >>> 1) ‘%’ as a wildcard in the middle of LIKE pattern. >>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with >>> help of LIKE. >>> >>> Item #1was resolved as CASSANDRA-12573. >>> >>> Regarding to item #2: you said the following: >>> >>> A possible fix would be: >>> >>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on >>> the column data type) >>> 2) remove the escape character e.g. before parsing OR use some advanced >>> regex to exclude the %% from parsing e.g >>> >>> Step 2) is dead easy but step 1) is harder because I don't know if >>> converting the bytebuffer into String at this stage of the CQL parser is >>> expensive or not (in term of computation) >>> >>> Let me try a patch >>> >>> So is there any update on this? >>> >>> Thanks, >>> Mikhail >>> >>> >>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy < >>> mikhail.krupits...@jetbrains.com> wrote: >>> >>> Hi! >>> >>> Have you had a chance to try your patch or solve the issue in an other >>> way? >>> >>> Thanks, >>> Mikhail >>> >>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com> wrote: >>> >>> Ok so I've found the source of the issue, it's pretty well hidden >>> because it is NOT in the SASI source code directly. >>> >>> Here is the method where C* determines what kind of LIKE expression >>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES) >>> >>> https://github.com/apache/cassandra/blob/trunk/src/java/org/ >>> apache/cassandra/cql3/restrictions/SingleColumnRestriction.j >>> ava#L733-L778 >>> >>> As you can see, it's pretty simple, maybe too simple. Indeed, they >>> forget to remove escape character BEFORE doing the matching so if your >>> search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS. >>> >>> A possible fix would be: >>> >>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on >>> the column data type) >>> 2) remove the escape character e.g. before parsing OR use some advanced >>> regex to exclude the %% from parsing e.g >>> >>> Step 2) is dead easy but step 1) is harder because I don't know if >>> converting the bytebuffer into String at this stage of the CQL parser is >>> expensive or not (in term of computation) >>> >>> Let me try a patch >>> >>> >>> >>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com> >>> wrote: >>> >>>> Ok you're right, I get your point >>>> >>>> LIKE '%%esc%' --> startWith('%esc') >>>> >>>> LIKE 'escape%%' --> = 'escape%' >>>> >>>> What I strongly suspect is that in the source code of SASI, we parse >>>> the % xxx % expression BEFORE applying escape. That will explain the >>>> observed behavior. E.g: >>>> >>>> LIKE '%%esc%' parsed as %xxx% where xxx = %esc >>>> >>>> LIKE 'escape%%' parsed as xxx% where xxx =escape% >>>> >>>> Let me check in the source code and try to reproduce the issue >>>> >>>> >>>> >>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy < >>>> mikhail.krupits...@jetbrains.com> wrote: >>>> >>>>> Looks like we have different understanding of what results are >>>>> expected. >>>>> I based my understanding on http://docs.datastax.com/en >>>>> /cql/3.3/cql/cql_using/useSASIIndex.html >>>>> According to the doc ‘esc’ is a pattern for exact match and I guess >>>>> that there is no semantical difference between two LIKE patterns (both of >>>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’. >>>>> >>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results >>>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape >>>>> *%esc* >>>>> >>>>> Why ‘containing’? I expect that it should be ’starting’.. >>>>> >>>>> >>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results >>>>> *starting* with 'escape%' so *escape%*me is a valid result and also >>>>> *escape%*esc >>>>> >>>>> Why ’starting’? I expect that it should be ‘exact matching’. >>>>> >>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it >>>>> returns nothing (CASSANDRA-12573). >>>>> >>>>> What I’m missing? >>>>> >>>>> Thanks, >>>>> Mikhail >>>>> >>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com> wrote: >>>>> >>>>> CREATE CUSTOM INDEX ON test.escape(val) USING ' >>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': >>>>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa >>>>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'}; >>>>> >>>>> I don't see any problem in the results you got >>>>> >>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results >>>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape >>>>> *%esc* >>>>> >>>>> Why ‘containing’? I expect that it should be ’starting’.. >>>>> >>>>> >>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results >>>>> *starting* with 'escape%' so *escape%*me is a valid result and also >>>>> *escape%*esc >>>>> >>>>> Why ’starting’? I expect that it should be ‘exact matching’. >>>>> >>>>> >>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy < >>>>> mikhail.krupits...@jetbrains.com> wrote: >>>>> >>>>>> Thanks for the reply. >>>>>> Could you please provide what index definition did you use? >>>>>> With the index from my script I get the following results: >>>>>> >>>>>> cqlsh:test> select * from escape; >>>>>> >>>>>> id | val >>>>>> ----+----------- >>>>>> 1 | %escapeme >>>>>> 2 | escape%me >>>>>> * 3 | escape%esc* >>>>>> >>>>>> Contains search >>>>>> >>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%'; >>>>>> >>>>>> id | val >>>>>> ----+----------- >>>>>> 1 | %escapeme >>>>>> 3 >>>>>> * | escape%esc*(2 rows) >>>>>> >>>>>> >>>>>> Prefix search >>>>>> >>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%'; >>>>>> >>>>>> id | val >>>>>> ----+----------- >>>>>> 2 | escape%me >>>>>> 3 >>>>>> * | escape%esc* >>>>>> >>>>>> Thanks, >>>>>> Mikhail >>>>>> >>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduy...@gmail.com> wrote: >>>>>> >>>>>> Use % to escape % >>>>>> >>>>>> cqlsh:test> select * from escape; >>>>>> >>>>>> id | val >>>>>> ----+----------- >>>>>> 1 | %escapeme >>>>>> 2 | escape%me >>>>>> >>>>>> >>>>>> Contains search >>>>>> >>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%'; >>>>>> >>>>>> id | val >>>>>> ----+----------- >>>>>> 1 | %escapeme >>>>>> >>>>>> (1 rows) >>>>>> >>>>>> >>>>>> Prefix search >>>>>> >>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%'; >>>>>> >>>>>> id | val >>>>>> ----+----------- >>>>>> 2 | escape%me >>>>>> >>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy < >>>>>> mikhail.krupits...@jetbrains.com> wrote: >>>>>> >>>>>>> Hi Cassandra guys, >>>>>>> >>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in >>>>>>> a search pattern. >>>>>>> Here is my test script: >>>>>>> >>>>>>> DROP keyspace if exists kmv; >>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : >>>>>>> 'SimpleStrategy', 'replication_factor':'1'} ; >>>>>>> USE kmv; >>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY >>>>>>> KEY(id, c1)); >>>>>>> CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING ' >>>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { >>>>>>> 'analyzed' : 'true', >>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa >>>>>>> si.analyzer.NonTokenizingAnalyzer', >>>>>>> 'case_sensitive' : 'false', >>>>>>> 'mode' : 'CONTAINS' >>>>>>> }; >>>>>>> >>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd'); >>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd'); >>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%'); >>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1'); >>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd'); >>>>>>> >>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_'; >>>>>>> >>>>>>> _pattern_ '%%%' finds all columns that contain %. >>>>>>> How to find columns that start form ‘%’ or ‘%a’? >>>>>>> How to find columns that end with ‘%’? >>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate >>>>>>> ‘%’ char form % as a command symbol? (Also there is a related issue >>>>>>> CASSANDRA-12573). >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Mikhail >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >>> >> >> > >