I guess that it should be similar to how it is done in SQL for LIKE patterns.

You can introduce an escape character, e.g. ‘\’.
Examples:
‘%’ - any string
‘\%’ - equal to ‘%’ character
‘\%foo%’ - starts from ‘%foo’
‘%%%escape’ - ends with ’escape’
‘\%%’ - starts from ‘%’
‘\\\%%’ - starts from ‘\%’ .

What do you think?

Thanks,
Mikhail
> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduy...@gmail.com> wrote:
> 
> Hello Mikhail
> 
> It's more complicated that it seems
> 
> LIKE '%%escape' means  EQUAL TO '%escape'
> 
> LIKE '%escape' means ENDS WITH 'escape'
> 
> What's about LIKE '%%%escape' ????
> 
> How should we treat this case ? Replace %% by % at the beginning of the 
> searched term ??
> 
> 
> 
> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy 
> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
> wrote:
> Hi!
> 
> We’ve talked about two items:
> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help 
> of LIKE.
> 
> Item #1was resolved as CASSANDRA-12573.
> 
> Regarding to item #2: you said the following:
>> A possible fix would be:
>> 
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
>> column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced 
>> regex to exclude the %% from parsing e.g
>> 
>> Step 2) is dead easy but step 1) is harder because I don't know if 
>> converting the bytebuffer into String at this stage of the CQL parser is 
>> expensive or not (in term of computation)
>> 
>> Let me try a patch 
> 
> So is there any update on this?
> 
> Thanks,
> Mikhail
> 
> 
>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy 
>> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
>> wrote:
>> 
>> Hi!
>> 
>> Have you had a chance to try your patch or solve the issue in an other way? 
>> 
>> Thanks,
>> Mikhail
>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com 
>>> <mailto:doanduy...@gmail.com>> wrote:
>>> 
>>> Ok so I've found the source of the issue, it's pretty well hidden because 
>>> it is NOT in the SASI source code directly.
>>> 
>>> Here is the method where C* determines what kind of LIKE expression you're 
>>> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>> 
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>>  
>>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
>>> 
>>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget 
>>> to remove escape character BEFORE doing the matching so if your search is 
>>> LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>> 
>>> A possible fix would be:
>>> 
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on 
>>> the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced 
>>> regex to exclude the %% from parsing e.g
>>> 
>>> Step 2) is dead easy but step 1) is harder because I don't know if 
>>> converting the bytebuffer into String at this stage of the CQL parser is 
>>> expensive or not (in term of computation)
>>> 
>>> Let me try a patch  
>>> 
>>> 
>>> 
>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com 
>>> <mailto:doanduy...@gmail.com>> wrote:
>>> Ok you're right, I get your point
>>> 
>>> LIKE '%%esc%' --> startWith('%esc')
>>> 
>>> LIKE 'escape%%' -->  = 'escape%'
>>> 
>>> What I strongly suspect is that in the source code of SASI, we parse the % 
>>> xxx % expression BEFORE applying escape. That will explain the observed 
>>> behavior. E.g:
>>> 
>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>> 
>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>> 
>>> Let me check in the source code and try to reproduce the issue
>>> 
>>> 
>>> 
>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
>>> <mikhail.krupits...@jetbrains.com 
>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>> Looks like we have different understanding of what results are expected.
>>> I based my understanding on 
>>> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 
>>> <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
>>> According to the doc ‘esc’ is a pattern for exact match and I guess that 
>>> there is no semantical difference between two LIKE patterns (both of 
>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>> 
>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>>>> containing '%esc' so %escapeme is a possible match and also escape%esc
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>> 
>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results 
>>>> starting with 'escape%' so escape%me is a valid result and also escape%esc
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>> 
>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it 
>>> returns nothing (CASSANDRA-12573).
>>> 
>>> What I’m missing?
>>> 
>>> Thanks,
>>> Mikhail
>>> 
>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com 
>>>> <mailto:doanduy...@gmail.com>> wrote:
>>>> 
>>>> CREATE CUSTOM INDEX ON test.escape(val) USING 
>>>> 'org.apache.cassandra.index.sa 
>>>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = 
>>>> {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa 
>>>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', 
>>>> 'case_sensitive': 'false'};
>>>> 
>>>> I don't see any problem in the results you got
>>>> 
>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>>>> containing '%esc' so %escapeme is a possible match and also escape%esc
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>> 
>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results 
>>>> starting with 'escape%' so escape%me is a valid result and also escape%esc
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>> 
>>>> 
>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy 
>>>> <mikhail.krupits...@jetbrains.com 
>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>>> Thanks for the reply.
>>>> Could you please provide what index definition did you use?
>>>> With the index from my script I get the following results:
>>>> 
>>>> cqlsh:test> select * from escape;
>>>> 
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>   2 | escape%me
>>>>   3 | escape%esc
>>>> 
>>>> Contains search
>>>> 
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>> 
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>   3 | escape%esc
>>>> (2 rows)
>>>> 
>>>> 
>>>> Prefix search
>>>> 
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>> 
>>>>  id | val
>>>> ----+-----------
>>>>   2 | escape%me
>>>>   3 | escape%esc
>>>> 
>>>> Thanks,
>>>> Mikhail 
>>>> 
>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduy...@gmail.com 
>>>>> <mailto:doanduy...@gmail.com>> wrote:
>>>>> 
>>>>> Use % to escape %
>>>>> 
>>>>> cqlsh:test> select * from escape;
>>>>> 
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   2 | escape%me
>>>>> 
>>>>> 
>>>>> Contains search
>>>>> 
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>> 
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>> 
>>>>> (1 rows)
>>>>> 
>>>>> 
>>>>> Prefix search
>>>>> 
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>> 
>>>>>  id | val
>>>>> ----+-----------
>>>>>   2 | escape%me
>>>>> 
>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy 
>>>>> <mikhail.krupits...@jetbrains.com 
>>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>>>> Hi Cassandra guys,
>>>>> 
>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a 
>>>>> search pattern.
>>>>> Here is my test script:
>>>>> 
>>>>> DROP keyspace if exists kmv;
>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 
>>>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>> USE kmv;
>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
>>>>> c1));
>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
>>>>> 'org.apache.cassandra.index.sa 
>>>>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {
>>>>> 'analyzed' : 'true',
>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa 
>>>>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>>>>> 'case_sensitive' : 'false',
>>>>> 'mode' : 'CONTAINS'
>>>>> };
>>>>> 
>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>> 
>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>> 
>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>> How to find columns that end with ‘%’?
>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ 
>>>>> char form % as a command symbol? (Also there is a related issue 
>>>>> CASSANDRA-12573).
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Mikhail
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
> 
> 

Reply via email to