Hi, Jim, What pattern should be used to search “ends with ‘%escape’ “ with your conception?
Thanks, Mikhail > On 22 Sep 2016, at 18:51, Jim Ancona <j...@anconafamily.com> wrote: > > To answer DuyHai's question without introducing new syntax, I'd suggest: >> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' > So the first two %'s are translated to a literal, non-wildcard % and the > third % is a wildcard because it's not doubled. > > Jim > > On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy > <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> > wrote: > I guess that it should be similar to how it is done in SQL for LIKE patterns. > > You can introduce an escape character, e.g. ‘\’. > Examples: > ‘%’ - any string > ‘\%’ - equal to ‘%’ character > ‘\%foo%’ - starts from ‘%foo’ > ‘%%%escape’ - ends with ’escape’ > ‘\%%’ - starts from ‘%’ > ‘\\\%%’ - starts from ‘\%’ . > > What do you think? > > Thanks, > Mikhail >> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduy...@gmail.com >> <mailto:doanduy...@gmail.com>> wrote: >> >> Hello Mikhail >> >> It's more complicated that it seems >> >> LIKE '%%escape' means EQUAL TO '%escape' >> >> LIKE '%escape' means ENDS WITH 'escape' >> >> What's about LIKE '%%%escape' ???? >> >> How should we treat this case ? Replace %% by % at the beginning of the >> searched term ?? >> >> >> >> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy >> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> >> wrote: >> Hi! >> >> We’ve talked about two items: >> 1) ‘%’ as a wildcard in the middle of LIKE pattern. >> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help >> of LIKE. >> >> Item #1was resolved as CASSANDRA-12573. >> >> Regarding to item #2: you said the following: >>> A possible fix would be: >>> >>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on >>> the column data type) >>> 2) remove the escape character e.g. before parsing OR use some advanced >>> regex to exclude the %% from parsing e.g >>> >>> Step 2) is dead easy but step 1) is harder because I don't know if >>> converting the bytebuffer into String at this stage of the CQL parser is >>> expensive or not (in term of computation) >>> >>> Let me try a patch >> >> So is there any update on this? >> >> Thanks, >> Mikhail >> >> >>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy >>> <mikhail.krupits...@jetbrains.com >>> <mailto:mikhail.krupits...@jetbrains.com>> wrote: >>> >>> Hi! >>> >>> Have you had a chance to try your patch or solve the issue in an other way? >>> >>> Thanks, >>> Mikhail >>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com >>>> <mailto:doanduy...@gmail.com>> wrote: >>>> >>>> Ok so I've found the source of the issue, it's pretty well hidden because >>>> it is NOT in the SASI source code directly. >>>> >>>> Here is the method where C* determines what kind of LIKE expression you're >>>> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES) >>>> >>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778 >>>> >>>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778> >>>> >>>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget >>>> to remove escape character BEFORE doing the matching so if your search is >>>> LIKE '%%esc%', the detected expression is LIKE_CONTAINS. >>>> >>>> A possible fix would be: >>>> >>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on >>>> the column data type) >>>> 2) remove the escape character e.g. before parsing OR use some advanced >>>> regex to exclude the %% from parsing e.g >>>> >>>> Step 2) is dead easy but step 1) is harder because I don't know if >>>> converting the bytebuffer into String at this stage of the CQL parser is >>>> expensive or not (in term of computation) >>>> >>>> Let me try a patch >>>> >>>> >>>> >>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com >>>> <mailto:doanduy...@gmail.com>> wrote: >>>> Ok you're right, I get your point >>>> >>>> LIKE '%%esc%' --> startWith('%esc') >>>> >>>> LIKE 'escape%%' --> = 'escape%' >>>> >>>> What I strongly suspect is that in the source code of SASI, we parse the % >>>> xxx % expression BEFORE applying escape. That will explain the observed >>>> behavior. E.g: >>>> >>>> LIKE '%%esc%' parsed as %xxx% where xxx = %esc >>>> >>>> LIKE 'escape%%' parsed as xxx% where xxx =escape% >>>> >>>> Let me check in the source code and try to reproduce the issue >>>> >>>> >>>> >>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy >>>> <mikhail.krupits...@jetbrains.com >>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote: >>>> Looks like we have different understanding of what results are expected. >>>> I based my understanding on >>>> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html >>>> <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html> >>>> According to the doc ‘esc’ is a pattern for exact match and I guess that >>>> there is no semantical difference between two LIKE patterns (both of >>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’. >>>> >>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results >>>>> containing '%esc' so %escapeme is a possible match and also escape%esc >>>> Why ‘containing’? I expect that it should be ’starting’.. >>>>> >>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results >>>>> starting with 'escape%' so escape%me is a valid result and also escape%esc >>>> Why ’starting’? I expect that it should be ‘exact matching’. >>>> >>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it >>>> returns nothing (CASSANDRA-12573). >>>> >>>> What I’m missing? >>>> >>>> Thanks, >>>> Mikhail >>>> >>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com >>>>> <mailto:doanduy...@gmail.com>> wrote: >>>>> >>>>> CREATE CUSTOM INDEX ON test.escape(val) USING >>>>> 'org.apache.cassandra.index.sa >>>>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = >>>>> {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa >>>>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', >>>>> 'case_sensitive': 'false'}; >>>>> >>>>> I don't see any problem in the results you got >>>>> >>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results >>>>> containing '%esc' so %escapeme is a possible match and also escape%esc >>>> Why ‘containing’? I expect that it should be ’starting’.. >>>>> >>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results >>>>> starting with 'escape%' so escape%me is a valid result and also escape%esc >>>> Why ’starting’? I expect that it should be ‘exact matching’. >>>> >>>>> >>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy >>>>> <mikhail.krupits...@jetbrains.com >>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote: >>>>> Thanks for the reply. >>>>> Could you please provide what index definition did you use? >>>>> With the index from my script I get the following results: >>>>> >>>>> cqlsh:test> select * from escape; >>>>> >>>>> id | val >>>>> ----+----------- >>>>> 1 | %escapeme >>>>> 2 | escape%me >>>>> 3 | escape%esc >>>>> >>>>> Contains search >>>>> >>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%'; >>>>> >>>>> id | val >>>>> ----+----------- >>>>> 1 | %escapeme >>>>> 3 | escape%esc >>>>> (2 rows) >>>>> >>>>> >>>>> Prefix search >>>>> >>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%'; >>>>> >>>>> id | val >>>>> ----+----------- >>>>> 2 | escape%me >>>>> 3 | escape%esc >>>>> >>>>> Thanks, >>>>> Mikhail >>>>> >>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduy...@gmail.com >>>>>> <mailto:doanduy...@gmail.com>> wrote: >>>>>> >>>>>> Use % to escape % >>>>>> >>>>>> cqlsh:test> select * from escape; >>>>>> >>>>>> id | val >>>>>> ----+----------- >>>>>> 1 | %escapeme >>>>>> 2 | escape%me >>>>>> >>>>>> >>>>>> Contains search >>>>>> >>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%'; >>>>>> >>>>>> id | val >>>>>> ----+----------- >>>>>> 1 | %escapeme >>>>>> >>>>>> (1 rows) >>>>>> >>>>>> >>>>>> Prefix search >>>>>> >>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%'; >>>>>> >>>>>> id | val >>>>>> ----+----------- >>>>>> 2 | escape%me >>>>>> >>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy >>>>>> <mikhail.krupits...@jetbrains.com >>>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote: >>>>>> Hi Cassandra guys, >>>>>> >>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a >>>>>> search pattern. >>>>>> Here is my test script: >>>>>> >>>>>> DROP keyspace if exists kmv; >>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : >>>>>> 'SimpleStrategy', 'replication_factor':'1'} ; >>>>>> USE kmv; >>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY >>>>>> KEY(id, c1)); >>>>>> CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING >>>>>> 'org.apache.cassandra.index.sa >>>>>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = { >>>>>> 'analyzed' : 'true', >>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa >>>>>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', >>>>>> 'case_sensitive' : 'false', >>>>>> 'mode' : 'CONTAINS' >>>>>> }; >>>>>> >>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd'); >>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd'); >>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%'); >>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1'); >>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd'); >>>>>> >>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_'; >>>>>> >>>>>> _pattern_ '%%%' finds all columns that contain %. >>>>>> How to find columns that start form ‘%’ or ‘%a’? >>>>>> How to find columns that end with ‘%’? >>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ >>>>>> char form % as a command symbol? (Also there is a related issue >>>>>> CASSANDRA-12573). >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Mikhail >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>> >> >> > >