Thanks Andy. REUtil.quoteMeta() seems to do the trick.
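
For the archives, here is a minimal sketch of the two escaping steps for a query sent to a remote endpoint. It first backslash-escapes the regex metacharacters (same character set as the Xerces quoteMeta quoted below), then escapes the result as a SPARQL string literal. The class and method names (RegexEscapeSketch, quoteMeta, sparqlString) are hypothetical, not Jena API, and the literal escaping is an assumption that only covers backslash and double quote.

```java
class RegexEscapeSketch {
    // Same metacharacter set as the Xerces REUtil.quoteMeta quoted below.
    private static final String META = ".*+?{[()|\\^$";

    /** Backslash-escape every regex metacharacter in the text. */
    static String quoteMeta(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (META.indexOf(ch) >= 0) {
                sb.append('\\');
            }
            sb.append(ch);
        }
        return sb.toString();
    }

    /** Wrap a value as a double-quoted SPARQL string literal (handles only \ and "). */
    static String sparqlString(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // Escape the literal part for the regex engine, then append the anchor.
        String pattern = quoteMeta("@work.example") + "$";
        // Escape the whole pattern again for the SPARQL string syntax.
        String filter = "FILTER regex(str(?mbox), " + sparqlString(pattern) + ")";
        // prints: FILTER regex(str(?mbox), "@work\\.example$")
        System.out.println(filter);
    }
}
```

Keeping the two steps separate means the Java source never needs hand-counted backslashes; each layer adds its own.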

On Tue, Jan 21, 2014 at 6:22 PM, Andy Seaborne <a...@apache.org> wrote:
> On 21/01/14 16:58, Martynas Jusevičius wrote:
>>
>> Andy,
>>
>> what if I'm sending the query to a remote endpoint that does not
>> support Java style regex syntax? Do I need to use FmtUtils then?
>
>
> FmtUtils does not have code to escape regex metacharacters.
> You'll need to escape all metacharacters by string manipulation.  It's a
> shame that the standard Java RT uses \Q\E
>
> Perl has quotemeta which implements its \Q\E
>
>
>>
>> Looking at the example from SPARQL 1.1 STR() [1]:
>>
>> This query selects the set of people who use their work.example
>> address in their foaf profile:
>>
>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>> SELECT ?name ?mbox
>>   WHERE { ?x foaf:name  ?name ;
>>              foaf:mbox  ?mbox .
>>           FILTER regex(str(?mbox), "@work\\.example$") }
>>
>> How does "work.example" become "work\\.example"? Is one backslash
>> escaping the regex, and the second one escaping the literal? How do I
>> achieve this with Jena?
>
>
> Yes. \\ puts a single real \ into the SPARQL string.
>
> That's how you do it in Jena.
> "." is a metacharacter so it needs escaping.
> ==> \.
> but it's in a SPARQL string so syntax needs to be \\
> ==> \\.
>
>
> The Xerces implementation in REUtil.quoteMeta is:
>
> public static String quoteMeta(String literal) {
>         int len = literal.length();
>         StringBuffer buffer = null;
>         for (int i = 0;  i < len;  i ++) {
>             int ch = literal.charAt(i);
>             if (".*+?{[()|\\^$".indexOf(ch) >= 0) {
>                 if (buffer == null) {
>                     buffer = new StringBuffer(i+(len-i)*2);
>                     if (i > 0)  buffer.append(literal.substring(0, i));
>                 }
>                 buffer.append((char)'\\');
>                 buffer.append((char)ch);
>             } else if (buffer != null)
>                 buffer.append((char)ch);
>         }
>         return buffer != null ? buffer.toString() : literal;
>     }
>
> and there are others via Google.
>
>         Andy
>
>
>>
>> [1] http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#func-str
>>
>> Martynas
>>
>> On Tue, Jan 21, 2014 at 10:10 AM, Andy Seaborne <a...@apache.org> wrote:
>>>
>>> Works for me:
>>>
>>> SELECT * {
>>>    VALUES ?o { "+35" "abc+35def" }
>>>    FILTER regex(?o , "\\Q+35\\E", "i")
>>> }
>>>
>>> and in Java you need \\\\ due to both the levels of escaping (Java text,
>>> SPARQL).
>>>
>>> [[
>>> Regex: Pattern exception: java.util.regex.PatternSyntaxException:
>>> Dangling meta character '+' near index 0
>>> ]]
>>> which gives the game away: it's using Java regexes :-)
>>>
>>> The regex engine is Java's and the string is used untouched.
>>>
>>> Strictly, it should be XSD v1 regular expressions, which are slightly different:
>>>
>>> http://www.w3.org/TR/xmlschema-2/#regexs
>>>
>>> and there is a strict Xerces provided alternative if you want exact XSD
>>> regular expressions.
>>>
>>> XSD and Java differ in only very small ways (e.g. XSD has one extra
>>> modifier flag, "m", XSD has no \Q\E, and XSD has "Is" for unicode code
>>> blocks inside \p e.g. \p{IsMongolian})
>>>
>>> And now
>>> http://www.w3.org/TR/xmlschema11-2/#regexs
>>>
>>>          Andy
>>>
>>>
>>> On 21/01/14 02:32, Joshua TAYLOR wrote:
>>>>
>>>>
>>>> My apologies.  I replied too quickly.  I just wrote this test with
>>>> Jena's command line tools. To match the string "+35", I had to use the
>>>> "\\+35" in the query:
>>>>
>>>> select ?label where {
>>>>     values ?label { "+35" "-35" }
>>>>     filter(regex(str(?label),"\\+35"))
>>>> }
>>>>
>>>> ---------
>>>> | label |
>>>> =========
>>>> | "+35" |
>>>> ---------
>>>>
>>>> That's _two_ backslashes in the query string, which means that in Java
>>>> you'd end up writing
>>>>
>>>> String query = ... + "filter(regex(...,\"\\\\+35\")" + ...;
>>>>
>>>> Sorry for the hasty and inaccurate reply.
>>>>
>>>> On Mon, Jan 20, 2014 at 9:27 PM, Joshua TAYLOR <joshuaaa...@gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On Mon, Jan 20, 2014 at 8:48 PM, Martynas Jusevičius
>>>>> <marty...@graphity.org> wrote:
>>>>>>
>>>>>>
>>>>>> OK maybe "+35" was a bad example. But isn't "+" a special char in
>>>>>> SPARQL regex? And there are more like "*", "?" etc.
>>>>>> http://www.w3.org/TR/xpath-functions/#regex-syntax
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Oh, good point.  But if it needs to be escaped with a backslash, then
>>>>> wouldn't
>>>>>
>>>>>      filter regex(str(?label), "\+35")
>>>>>
>>>>> be fine?  Note that if you're constructing this programmatically, you
>>>>> might end up writing code like
>>>>>
>>>>>       String queryString = ... + "filter regex(str(?label), \"\\+35\")" + ...;
>>>>>
>>>>> --
>>>>> Joshua Taylor, http://www.cs.rpi.edu/~tayloj/
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>
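
The escaping levels discussed in the thread can be checked directly against java.util.regex, which is the engine Andy says ARQ uses. This is only a sketch: the class and helper names (EscapeLevelsSketch, found, isInvalid) are hypothetical, and it exercises the Java regex layer alone, without the SPARQL string layer.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeLevelsSketch {
    /** True if the regex is found anywhere in the input (unanchored, like SPARQL regex()). */
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    /** True if the regex fails to compile. */
    static boolean isInvalid(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The Java source text "\\+35" is the regex \+35: a literal plus sign.
        System.out.println(found("\\+35", "abc+35def"));  // true
        System.out.println(found("\\+35", "-35"));        // false
        // \Q...\E quotes everything in between (Java-specific syntax).
        System.out.println(found("\\Q+35\\E", "+35"));    // true
        // An unescaped leading + is a dangling metacharacter.
        System.out.println(isInvalid("+35"));             // true
    }
}
```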
