> LIKE '%%%escape' --> EQUALS TO '%%escape' ??? In the current implementation (‘%’ could be a wildcard only at the start/end of a term) I guess it should be ’ENDS with ‘%escape’ ‘. Moreover all terms that contains single ‘%’ somewhere in the middle should cause an exception. BUT may be it’s better to make escaping more universal to support a future possible case where a wildcard could be placed in the middle of a term too?
Thanks, Mikhail > On 24 Sep 2016, at 21:09, DuyHai Doan <doanduy...@gmail.com> wrote: > > Reminder, right now, the % character is only interpreted as wildcard IF AND > ONLY IF it is the first/last character of the searched term > > > LIKE '%escape' --> ENDS WITH 'escape' > > If we use % to escape %, > LIKE '%%escape' --> EQUALS TO '%escape' > > LIKE '%%%escape' --> EQUALS TO '%%escape' ??? > > > > > On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy > <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> > wrote: > Hi, Jim, > > What pattern should be used to search “ends with ‘%escape’ “ with your > conception? > > Thanks, > Mikhail > >> On 22 Sep 2016, at 18:51, Jim Ancona <j...@anconafamily.com >> <mailto:j...@anconafamily.com>> wrote: >> >> To answer DuyHai's question without introducing new syntax, I'd suggest: >>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' >> So the first two %'s are translated to a literal, non-wildcard % and the >> third % is a wildcard because it's not doubled. >> >> Jim >> >> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy >> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> >> wrote: >> I guess that it should be similar to how it is done in SQL for LIKE patterns. >> >> You can introduce an escape character, e.g. ‘\’. >> Examples: >> ‘%’ - any string >> ‘\%’ - equal to ‘%’ character >> ‘\%foo%’ - starts from ‘%foo’ >> ‘%%%escape’ - ends with ’escape’ >> ‘\%%’ - starts from ‘%’ >> ‘\\\%%’ - starts from ‘\%’ . >> >> What do you think? >> >> Thanks, >> Mikhail >>> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduy...@gmail.com >>> <mailto:doanduy...@gmail.com>> wrote: >>> >>> Hello Mikhail >>> >>> It's more complicated that it seems >>> >>> LIKE '%%escape' means EQUAL TO '%escape' >>> >>> LIKE '%escape' means ENDS WITH 'escape' >>> >>> What's about LIKE '%%%escape' ???? >>> >>> How should we treat this case ? Replace %% by % at the beginning of the >>> searched term ?? >>> >>> >>> >>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy >>> <mikhail.krupits...@jetbrains.com >>> <mailto:mikhail.krupits...@jetbrains.com>> wrote: >>> Hi! >>> >>> We’ve talked about two items: >>> 1) ‘%’ as a wildcard in the middle of LIKE pattern. >>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help >>> of LIKE. >>> >>> Item #1was resolved as CASSANDRA-12573. >>> >>> Regarding to item #2: you said the following: >>>> A possible fix would be: >>>> >>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on >>>> the column data type) >>>> 2) remove the escape character e.g. before parsing OR use some advanced >>>> regex to exclude the %% from parsing e.g >>>> >>>> Step 2) is dead easy but step 1) is harder because I don't know if >>>> converting the bytebuffer into String at this stage of the CQL parser is >>>> expensive or not (in term of computation) >>>> >>>> Let me try a patch >>> >>> So is there any update on this? >>> >>> Thanks, >>> Mikhail >>> >>> >>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy >>>> <mikhail.krupits...@jetbrains.com >>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote: >>>> >>>> Hi! >>>> >>>> Have you had a chance to try your patch or solve the issue in an other >>>> way? >>>> >>>> Thanks, >>>> Mikhail >>>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com >>>>> <mailto:doanduy...@gmail.com>> wrote: >>>>> >>>>> Ok so I've found the source of the issue, it's pretty well hidden because >>>>> it is NOT in the SASI source code directly. >>>>> >>>>> Here is the method where C* determines what kind of LIKE expression >>>>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES) >>>>> >>>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778 >>>>> >>>>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778> >>>>> >>>>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget >>>>> to remove escape character BEFORE doing the matching so if your search is >>>>> LIKE '%%esc%', the detected expression is LIKE_CONTAINS. >>>>> >>>>> A possible fix would be: >>>>> >>>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on >>>>> the column data type) >>>>> 2) remove the escape character e.g. before parsing OR use some advanced >>>>> regex to exclude the %% from parsing e.g >>>>> >>>>> Step 2) is dead easy but step 1) is harder because I don't know if >>>>> converting the bytebuffer into String at this stage of the CQL parser is >>>>> expensive or not (in term of computation) >>>>> >>>>> Let me try a patch >>>>> >>>>> >>>>> >>>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com >>>>> <mailto:doanduy...@gmail.com>> wrote: >>>>> Ok you're right, I get your point >>>>> >>>>> LIKE '%%esc%' --> startWith('%esc') >>>>> >>>>> LIKE 'escape%%' --> = 'escape%' >>>>> >>>>> What I strongly suspect is that in the source code of SASI, we parse the >>>>> % xxx % expression BEFORE applying escape. That will explain the observed >>>>> behavior. E.g: >>>>> >>>>> LIKE '%%esc%' parsed as %xxx% where xxx = %esc >>>>> >>>>> LIKE 'escape%%' parsed as xxx% where xxx =escape% >>>>> >>>>> Let me check in the source code and try to reproduce the issue >>>>> >>>>> >>>>> >>>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy >>>>> <mikhail.krupits...@jetbrains.com >>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote: >>>>> Looks like we have different understanding of what results are expected. >>>>> I based my understanding on >>>>> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html >>>>> <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html> >>>>> According to the doc ‘esc’ is a pattern for exact match and I guess that >>>>> there is no semantical difference between two LIKE patterns (both of >>>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’. >>>>> >>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results >>>>>> containing '%esc' so %escapeme is a possible match and also escape%esc >>>>> Why ‘containing’? I expect that it should be ’starting’.. >>>>>> >>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results >>>>>> starting with 'escape%' so escape%me is a valid result and also >>>>>> escape%esc >>>>> Why ’starting’? I expect that it should be ‘exact matching’. >>>>> >>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it >>>>> returns nothing (CASSANDRA-12573). >>>>> >>>>> What I’m missing? >>>>> >>>>> Thanks, >>>>> Mikhail >>>>> >>>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com >>>>>> <mailto:doanduy...@gmail.com>> wrote: >>>>>> >>>>>> CREATE CUSTOM INDEX ON test.escape(val) USING >>>>>> 'org.apache.cassandra.index.sa >>>>>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = >>>>>> {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa >>>>>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', >>>>>> 'case_sensitive': 'false'}; >>>>>> >>>>>> I don't see any problem in the results you got >>>>>> >>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results >>>>>> containing '%esc' so %escapeme is a possible match and also escape%esc >>>>> Why ‘containing’? I expect that it should be ’starting’.. >>>>>> >>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results >>>>>> starting with 'escape%' so escape%me is a valid result and also >>>>>> escape%esc >>>>> Why ’starting’? I expect that it should be ‘exact matching’. >>>>> >>>>>> >>>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy >>>>>> <mikhail.krupits...@jetbrains.com >>>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote: >>>>>> Thanks for the reply. >>>>>> Could you please provide what index definition did you use? >>>>>> With the index from my script I get the following results: >>>>>> >>>>>> cqlsh:test> select * from escape; >>>>>> >>>>>> id | val >>>>>> ----+----------- >>>>>> 1 | %escapeme >>>>>> 2 | escape%me >>>>>> 3 | escape%esc >>>>>> >>>>>> Contains search >>>>>> >>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%'; >>>>>> >>>>>> id | val >>>>>> ----+----------- >>>>>> 1 | %escapeme >>>>>> 3 | escape%esc >>>>>> (2 rows) >>>>>> >>>>>> >>>>>> Prefix search >>>>>> >>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%'; >>>>>> >>>>>> id | val >>>>>> ----+----------- >>>>>> 2 | escape%me >>>>>> 3 | escape%esc >>>>>> >>>>>> Thanks, >>>>>> Mikhail >>>>>> >>>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduy...@gmail.com >>>>>>> <mailto:doanduy...@gmail.com>> wrote: >>>>>>> >>>>>>> Use % to escape % >>>>>>> >>>>>>> cqlsh:test> select * from escape; >>>>>>> >>>>>>> id | val >>>>>>> ----+----------- >>>>>>> 1 | %escapeme >>>>>>> 2 | escape%me >>>>>>> >>>>>>> >>>>>>> Contains search >>>>>>> >>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%'; >>>>>>> >>>>>>> id | val >>>>>>> ----+----------- >>>>>>> 1 | %escapeme >>>>>>> >>>>>>> (1 rows) >>>>>>> >>>>>>> >>>>>>> Prefix search >>>>>>> >>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%'; >>>>>>> >>>>>>> id | val >>>>>>> ----+----------- >>>>>>> 2 | escape%me >>>>>>> >>>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy >>>>>>> <mikhail.krupits...@jetbrains.com >>>>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote: >>>>>>> Hi Cassandra guys, >>>>>>> >>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a >>>>>>> search pattern. >>>>>>> Here is my test script: >>>>>>> >>>>>>> DROP keyspace if exists kmv; >>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : >>>>>>> 'SimpleStrategy', 'replication_factor':'1'} ; >>>>>>> USE kmv; >>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY >>>>>>> KEY(id, c1)); >>>>>>> CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING >>>>>>> 'org.apache.cassandra.index.sa >>>>>>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = { >>>>>>> 'analyzed' : 'true', >>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa >>>>>>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', >>>>>>> 'case_sensitive' : 'false', >>>>>>> 'mode' : 'CONTAINS' >>>>>>> }; >>>>>>> >>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd'); >>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd'); >>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%'); >>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1'); >>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd'); >>>>>>> >>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_'; >>>>>>> >>>>>>> _pattern_ '%%%' finds all columns that contain %. >>>>>>> How to find columns that start form ‘%’ or ‘%a’? >>>>>>> How to find columns that end with ‘%’? >>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate >>>>>>> ‘%’ char form % as a command symbol? (Also there is a related issue >>>>>>> CASSANDRA-12573). >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Mikhail >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >> >> > >