> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
In the current implementation (‘%’ could be a wildcard only at the start/end of 
a term) I guess it should be ’ENDS with ‘%escape’ ‘.
Moreover all terms that contains single ‘%’ somewhere in the middle should 
cause an exception.
BUT may be it’s better to make escaping more universal to support a future 
possible case where a wildcard could be placed in the middle of a term too?

Thanks,
Mikhail 
> On 24 Sep 2016, at 21:09, DuyHai Doan <doanduy...@gmail.com> wrote:
> 
> Reminder, right now, the % character is only interpreted as wildcard IF AND 
> ONLY IF it is the first/last character of the searched term
> 
> 
> LIKE '%escape' --> ENDS WITH 'escape' 
> 
> If we use % to escape %,
> LIKE '%%escape' -->  EQUALS TO '%escape'
> 
> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
> 
> 
> 
> 
> On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy 
> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
> wrote:
> Hi, Jim,
> 
> What pattern should be used to search “ends with  ‘%escape’ “ with your 
> conception?
> 
> Thanks,
> Mikhail
> 
>> On 22 Sep 2016, at 18:51, Jim Ancona <j...@anconafamily.com 
>> <mailto:j...@anconafamily.com>> wrote:
>> 
>> To answer DuyHai's question without introducing new syntax, I'd suggest:
>>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' 
>> So the first two %'s are translated to a literal, non-wildcard % and the 
>> third % is a wildcard because it's not doubled.
>> 
>> Jim
>> 
>> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy 
>> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
>> wrote:
>> I guess that it should be similar to how it is done in SQL for LIKE patterns.
>> 
>> You can introduce an escape character, e.g. ‘\’.
>> Examples:
>> ‘%’ - any string
>> ‘\%’ - equal to ‘%’ character
>> ‘\%foo%’ - starts from ‘%foo’
>> ‘%%%escape’ - ends with ’escape’
>> ‘\%%’ - starts from ‘%’
>> ‘\\\%%’ - starts from ‘\%’ .
>> 
>> What do you think?
>> 
>> Thanks,
>> Mikhail
>>> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduy...@gmail.com 
>>> <mailto:doanduy...@gmail.com>> wrote:
>>> 
>>> Hello Mikhail
>>> 
>>> It's more complicated that it seems
>>> 
>>> LIKE '%%escape' means  EQUAL TO '%escape'
>>> 
>>> LIKE '%escape' means ENDS WITH 'escape'
>>> 
>>> What's about LIKE '%%%escape' ????
>>> 
>>> How should we treat this case ? Replace %% by % at the beginning of the 
>>> searched term ??
>>> 
>>> 
>>> 
>>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy 
>>> <mikhail.krupits...@jetbrains.com 
>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>> Hi!
>>> 
>>> We’ve talked about two items:
>>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help 
>>> of LIKE.
>>> 
>>> Item #1was resolved as CASSANDRA-12573.
>>> 
>>> Regarding to item #2: you said the following:
>>>> A possible fix would be:
>>>> 
>>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on 
>>>> the column data type)
>>>> 2) remove the escape character e.g. before parsing OR use some advanced 
>>>> regex to exclude the %% from parsing e.g
>>>> 
>>>> Step 2) is dead easy but step 1) is harder because I don't know if 
>>>> converting the bytebuffer into String at this stage of the CQL parser is 
>>>> expensive or not (in term of computation)
>>>> 
>>>> Let me try a patch 
>>> 
>>> So is there any update on this?
>>> 
>>> Thanks,
>>> Mikhail
>>> 
>>> 
>>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy 
>>>> <mikhail.krupits...@jetbrains.com 
>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>>> 
>>>> Hi!
>>>> 
>>>> Have you had a chance to try your patch or solve the issue in an other 
>>>> way? 
>>>> 
>>>> Thanks,
>>>> Mikhail
>>>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com 
>>>>> <mailto:doanduy...@gmail.com>> wrote:
>>>>> 
>>>>> Ok so I've found the source of the issue, it's pretty well hidden because 
>>>>> it is NOT in the SASI source code directly.
>>>>> 
>>>>> Here is the method where C* determines what kind of LIKE expression 
>>>>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>>>> 
>>>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>>>>  
>>>>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
>>>>> 
>>>>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget 
>>>>> to remove escape character BEFORE doing the matching so if your search is 
>>>>> LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>>>> 
>>>>> A possible fix would be:
>>>>> 
>>>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on 
>>>>> the column data type)
>>>>> 2) remove the escape character e.g. before parsing OR use some advanced 
>>>>> regex to exclude the %% from parsing e.g
>>>>> 
>>>>> Step 2) is dead easy but step 1) is harder because I don't know if 
>>>>> converting the bytebuffer into String at this stage of the CQL parser is 
>>>>> expensive or not (in term of computation)
>>>>> 
>>>>> Let me try a patch  
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com 
>>>>> <mailto:doanduy...@gmail.com>> wrote:
>>>>> Ok you're right, I get your point
>>>>> 
>>>>> LIKE '%%esc%' --> startWith('%esc')
>>>>> 
>>>>> LIKE 'escape%%' -->  = 'escape%'
>>>>> 
>>>>> What I strongly suspect is that in the source code of SASI, we parse the 
>>>>> % xxx % expression BEFORE applying escape. That will explain the observed 
>>>>> behavior. E.g:
>>>>> 
>>>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>>> 
>>>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>>> 
>>>>> Let me check in the source code and try to reproduce the issue
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
>>>>> <mikhail.krupits...@jetbrains.com 
>>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>>>> Looks like we have different understanding of what results are expected.
>>>>> I based my understanding on 
>>>>> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 
>>>>> <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
>>>>> According to the doc ‘esc’ is a pattern for exact match and I guess that 
>>>>> there is no semantical difference between two LIKE patterns (both of 
>>>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>>> 
>>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>>>>>> containing '%esc' so %escapeme is a possible match and also escape%esc
>>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>>> 
>>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results 
>>>>>> starting with 'escape%' so escape%me is a valid result and also 
>>>>>> escape%esc
>>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>> 
>>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it 
>>>>> returns nothing (CASSANDRA-12573).
>>>>> 
>>>>> What I’m missing?
>>>>> 
>>>>> Thanks,
>>>>> Mikhail
>>>>> 
>>>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com 
>>>>>> <mailto:doanduy...@gmail.com>> wrote:
>>>>>> 
>>>>>> CREATE CUSTOM INDEX ON test.escape(val) USING 
>>>>>> 'org.apache.cassandra.index.sa 
>>>>>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = 
>>>>>> {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa 
>>>>>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>>>>>>  'case_sensitive': 'false'};
>>>>>> 
>>>>>> I don't see any problem in the results you got
>>>>>> 
>>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>>>>>> containing '%esc' so %escapeme is a possible match and also escape%esc
>>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>>> 
>>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results 
>>>>>> starting with 'escape%' so escape%me is a valid result and also 
>>>>>> escape%esc
>>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>> 
>>>>>> 
>>>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy 
>>>>>> <mikhail.krupits...@jetbrains.com 
>>>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>>>>> Thanks for the reply.
>>>>>> Could you please provide what index definition did you use?
>>>>>> With the index from my script I get the following results:
>>>>>> 
>>>>>> cqlsh:test> select * from escape;
>>>>>> 
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>   2 | escape%me
>>>>>>   3 | escape%esc
>>>>>> 
>>>>>> Contains search
>>>>>> 
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>> 
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>   3 | escape%esc
>>>>>> (2 rows)
>>>>>> 
>>>>>> 
>>>>>> Prefix search
>>>>>> 
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>> 
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   2 | escape%me
>>>>>>   3 | escape%esc
>>>>>> 
>>>>>> Thanks,
>>>>>> Mikhail 
>>>>>> 
>>>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduy...@gmail.com 
>>>>>>> <mailto:doanduy...@gmail.com>> wrote:
>>>>>>> 
>>>>>>> Use % to escape %
>>>>>>> 
>>>>>>> cqlsh:test> select * from escape;
>>>>>>> 
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   1 | %escapeme
>>>>>>>   2 | escape%me
>>>>>>> 
>>>>>>> 
>>>>>>> Contains search
>>>>>>> 
>>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>>> 
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   1 | %escapeme
>>>>>>> 
>>>>>>> (1 rows)
>>>>>>> 
>>>>>>> 
>>>>>>> Prefix search
>>>>>>> 
>>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>>> 
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   2 | escape%me
>>>>>>> 
>>>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy 
>>>>>>> <mikhail.krupits...@jetbrains.com 
>>>>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>>>>>> Hi Cassandra guys,
>>>>>>> 
>>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a 
>>>>>>> search pattern.
>>>>>>> Here is my test script:
>>>>>>> 
>>>>>>> DROP keyspace if exists kmv;
>>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 
>>>>>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>>>> USE kmv;
>>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY 
>>>>>>> KEY(id, c1));
>>>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
>>>>>>> 'org.apache.cassandra.index.sa 
>>>>>>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {
>>>>>>> 'analyzed' : 'true',
>>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa 
>>>>>>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>>>>>>> 'case_sensitive' : 'false',
>>>>>>> 'mode' : 'CONTAINS'
>>>>>>> };
>>>>>>> 
>>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>>>> 
>>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>>>> 
>>>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>>>> How to find columns that end with ‘%’?
>>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate 
>>>>>>> ‘%’ char form % as a command symbol? (Also there is a related issue 
>>>>>>> CASSANDRA-12573).
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikhail
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 

Reply via email to