Currently SASI can only understand the % for the beginning (suffix) or ending (prefix) position.
Any expression containing the % in the middle like %w%a% will not be interpreter by SASI as wildcard. %w%a% will translate into "Give me all results containing w%a On Thu, Sep 15, 2016 at 3:58 PM, Mikhail Krupitskiy < mikhail.krupits...@jetbrains.com> wrote: > Thank you for the investigation. Will wait for a fix and news. > > Probably it’s not a directly related question but what do you think about > CASSANDRA-12573? Let me know if it’s better to create a separate thread for > it. > > Thanks, > Mikhail > > > On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com> wrote: > > Ok so I've found the source of the issue, it's pretty well hidden because > it is NOT in the SASI source code directly. > > Here is the method where C* determines what kind of LIKE expression you're > using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES) > > https://github.com/apache/cassandra/blob/trunk/src/java/ > org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java# > L733-L778 > > As you can see, it's pretty simple, maybe too simple. Indeed, they forget > to remove escape character BEFORE doing the matching so if your search is LIKE > '%%esc%', the detected expression is LIKE_CONTAINS. > > A possible fix would be: > > 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on > the column data type) > 2) remove the escape character e.g. before parsing OR use some advanced > regex to exclude the %% from parsing e.g > > Step 2) is dead easy but step 1) is harder because I don't know if > converting the bytebuffer into String at this stage of the CQL parser is > expensive or not (in term of computation) > > Let me try a patch > > > > On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com> wrote: > >> Ok you're right, I get your point >> >> LIKE '%%esc%' --> startWith('%esc') >> >> LIKE 'escape%%' --> = 'escape%' >> >> What I strongly suspect is that in the source code of SASI, we parse the >> % xxx % expression BEFORE applying escape. That will explain the observed >> behavior. E.g: >> >> LIKE '%%esc%' parsed as %xxx% where xxx = %esc >> >> LIKE 'escape%%' parsed as xxx% where xxx =escape% >> >> Let me check in the source code and try to reproduce the issue >> >> >> >> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy < >> mikhail.krupits...@jetbrains.com> wrote: >> >>> Looks like we have different understanding of what results are expected. >>> I based my understanding on http://docs.datastax.com/en >>> /cql/3.3/cql/cql_using/useSASIIndex.html >>> According to the doc ‘esc’ is a pattern for exact match and I guess that >>> there is no semantical difference between two LIKE patterns (both of >>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’. >>> >>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results >>> *containing* '%esc' so *%esc*apeme is a possible match and also escape >>> *%esc* >>> >>> Why ‘containing’? I expect that it should be ’starting’.. >>> >>> >>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results >>> *starting* with 'escape%' so *escape%*me is a valid result and also >>> *escape%*esc >>> >>> Why ’starting’? I expect that it should be ‘exact matching’. >>> >>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it >>> returns nothing (CASSANDRA-12573). >>> >>> What I’m missing? >>> >>> Thanks, >>> Mikhail >>> >>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com> wrote: >>> >>> CREATE CUSTOM INDEX ON test.escape(val) USING ' >>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': >>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa >>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'}; >>> >>> I don't see any problem in the results you got >>> >>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results >>> *containing* '%esc' so *%esc*apeme is a possible match and also escape >>> *%esc* >>> >>> Why ‘containing’? I expect that it should be ’starting’.. >>> >>> >>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results >>> *starting* with 'escape%' so *escape%*me is a valid result and also >>> *escape%*esc >>> >>> Why ’starting’? I expect that it should be ‘exact matching’. >>> >>> >>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy < >>> mikhail.krupits...@jetbrains.com> wrote: >>> >>>> Thanks for the reply. >>>> Could you please provide what index definition did you use? >>>> With the index from my script I get the following results: >>>> >>>> cqlsh:test> select * from escape; >>>> >>>> id | val >>>> ----+----------- >>>> 1 | %escapeme >>>> 2 | escape%me >>>> * 3 | escape%esc* >>>> >>>> Contains search >>>> >>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%'; >>>> >>>> id | val >>>> ----+----------- >>>> 1 | %escapeme >>>> 3 >>>> * | escape%esc*(2 rows) >>>> >>>> >>>> Prefix search >>>> >>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%'; >>>> >>>> id | val >>>> ----+----------- >>>> 2 | escape%me >>>> 3 >>>> * | escape%esc* >>>> >>>> Thanks, >>>> Mikhail >>>> >>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduy...@gmail.com> wrote: >>>> >>>> Use % to escape % >>>> >>>> cqlsh:test> select * from escape; >>>> >>>> id | val >>>> ----+----------- >>>> 1 | %escapeme >>>> 2 | escape%me >>>> >>>> >>>> Contains search >>>> >>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%'; >>>> >>>> id | val >>>> ----+----------- >>>> 1 | %escapeme >>>> >>>> (1 rows) >>>> >>>> >>>> Prefix search >>>> >>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%'; >>>> >>>> id | val >>>> ----+----------- >>>> 2 | escape%me >>>> >>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy < >>>> mikhail.krupits...@jetbrains.com> wrote: >>>> >>>>> Hi Cassandra guys, >>>>> >>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a >>>>> search pattern. >>>>> Here is my test script: >>>>> >>>>> DROP keyspace if exists kmv; >>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : >>>>> 'SimpleStrategy', 'replication_factor':'1'} ; >>>>> USE kmv; >>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY >>>>> KEY(id, c1)); >>>>> CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING ' >>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { >>>>> 'analyzed' : 'true', >>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa >>>>> si.analyzer.NonTokenizingAnalyzer', >>>>> 'case_sensitive' : 'false', >>>>> 'mode' : 'CONTAINS' >>>>> }; >>>>> >>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd'); >>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd'); >>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%'); >>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1'); >>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd'); >>>>> >>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_'; >>>>> >>>>> _pattern_ '%%%' finds all columns that contain %. >>>>> How to find columns that start form ‘%’ or ‘%a’? >>>>> How to find columns that end with ‘%’? >>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate >>>>> ‘%’ char form % as a command symbol? (Also there is a related issue >>>>> CASSANDRA-12573). >>>>> >>>>> >>>>> Thanks, >>>>> Mikhail >>>> >>>> >>>> >>>> >>> >>> >> > >