Thanks Andy. REUtil.quoteMeta() seems to do the trick.
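
For the archive, here is a minimal sketch of the two escaping steps discussed below: first backslash-escape the regex metacharacters, then escape the result for a SPARQL string literal. The class and method names (`RegexEscaping`, `quoteMeta`, `toSparqlString`) are hypothetical and not part of Jena. The metacharacter set is copied from the Xerces code quoted in this thread, and a remote endpoint's regex dialect may need a different set.

```java
final class RegexEscaping {
    // Assumption: same metacharacter set as the Xerces quoteMeta quoted in this thread.
    private static final String META = ".*+?{[()|\\^$";

    // Step 1: put a backslash in front of every regex metacharacter.
    static String quoteMeta(String literal) {
        StringBuilder sb = new StringBuilder(literal.length() * 2);
        for (int i = 0; i < literal.length(); i++) {
            char ch = literal.charAt(i);
            if (META.indexOf(ch) >= 0) {
                sb.append('\\');
            }
            sb.append(ch);
        }
        return sb.toString();
    }

    // Step 2: wrap the text in a double-quoted SPARQL string literal,
    // escaping backslash, double quote, newline and carriage return.
    static String toSparqlString(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 2).append('"');
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(ch);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String pattern = quoteMeta("+35");   // regex text: \+35
        // Prints: FILTER regex(str(?label), "\\+35")
        System.out.println("FILTER regex(str(?label), " + toSparqlString(pattern) + ")");
    }
}
```

Keeping the two steps separate means no Java string literal ever has to contain the four-backslash form by hand; each layer adds only its own escapes.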
On Tue, Jan 21, 2014 at 6:22 PM, Andy Seaborne <a...@apache.org> wrote:
> On 21/01/14 16:58, Martynas Jusevičius wrote:
>>
>> Andy,
>>
>> what if I'm sending the query to a remote endpoint that does not
>> support Java style regex syntax? Do I need to use FmtUtils then?
>
> FmtUtils does not have code to escape regex metacharacters.
> You'll need to escape all metacharacters by string manipulation. It's a
> shame that the standard Java RT uses \Q\E
>
> Perl has quotemeta which implements its \Q\E
>
>>
>> Looking at the example from SPARQL 1.1 STR() [1]:
>>
>> This query selects the set of people who use their work.example
>> address in their foaf profile:
>>
>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>> SELECT ?name ?mbox
>> WHERE { ?x foaf:name ?name ;
>>            foaf:mbox ?mbox .
>>         FILTER regex(str(?mbox), "@work\\.example$") }
>>
>> How does "work.example" become "work\\.example"? Is one backslash
>> escaping the regex, and the second one escaping the literal? How do I
>> achieve this with Jena?
>
> Yes. \\ puts a single real \ into the SPARQL string.
>
> That's how you do it in Jena.
> "." is a metacharacter so it needs escaping.
> ==> \.
> but it's in a SPARQL string so syntax needs to be \\
> ==> \\.
>
> The Xerces implementation in REUtil.quoteMeta is:
>
> public static String quoteMeta(String literal) {
>     int len = literal.length();
>     StringBuffer buffer = null;
>     for (int i = 0; i < len; i ++) {
>         int ch = literal.charAt(i);
>         if (".*+?{[()|\\^$".indexOf(ch) >= 0) {
>             if (buffer == null) {
>                 buffer = new StringBuffer(i+(len-i)*2);
>                 if (i > 0) buffer.append(literal.substring(0, i));
>             }
>             buffer.append((char)'\\');
>             buffer.append((char)ch);
>         } else if (buffer != null)
>             buffer.append((char)ch);
>     }
>     return buffer != null ? buffer.toString() : literal;
> }
>
> and there are others via Google.
>
> Andy
>
>>
>> [1] http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#func-str
>>
>> Martynas
>>
>> On Tue, Jan 21, 2014 at 10:10 AM, Andy Seaborne <a...@apache.org> wrote:
>>>
>>> Works for me:
>>>
>>> SELECT * {
>>>   VALUES ?o { "+35" "abc+35def" }
>>>   FILTER regex(?o , "\\Q+35\\E", "i")
>>> }
>>>
>>> and in Java you need \\\\ due to both the levels of escaping (Java text,
>>> SPARQL).
>>>
>>> [[
>>> Regex: Pattern exception: java.util.regex.PatternSyntaxException:
>>> Dangling meta character '+' near index 0
>>> ]]
>>> which gives the game away that it's using Java regexes :-)
>>>
>>> The regex engine is Java's and the string is used untouched.
>>>
>>> Strictly, it should be XSD v1 regexes, which are slightly different:
>>>
>>> http://www.w3.org/TR/xmlschema-2/#regexs
>>>
>>> and there is a strict Xerces-provided alternative if you want exact XSD
>>> regular expressions.
>>>
>>> XSD and Java differ in only very small ways (e.g. XSD has one extra
>>> modifier flag, "m", XSD has no \Q\E, and XSD has "Is" for unicode code
>>> blocks inside \p e.g. \p{IsMongolian})
>>>
>>> And now
>>> http://www.w3.org/TR/xmlschema11-2/#regexs
>>>
>>> Andy
>>>
>>>
>>> On 21/01/14 02:32, Joshua TAYLOR wrote:
>>>>
>>>> My apologies. I replied too quickly. I just wrote this test with
>>>> Jena's command line tools. To match the string "+35", I had to use
>>>> "\\+35" in the query:
>>>>
>>>> select ?label where {
>>>>   values ?label { "+35" "-35" }
>>>>   filter(regex(str(?label),"\\+35"))
>>>> }
>>>>
>>>> ---------
>>>> | label |
>>>> =========
>>>> | "+35" |
>>>> ---------
>>>>
>>>> That's _two_ slashes in the query string, which means that in Java
>>>> you'd end up writing
>>>>
>>>> String query = ... + "filter(regex(...,\"\\\\+35\")" + ...;
>>>>
>>>> Sorry for the hasty and inaccurate reply.
>>>>
>>>> On Mon, Jan 20, 2014 at 9:27 PM, Joshua TAYLOR <joshuaaa...@gmail.com>
>>>> wrote:
>>>>>
>>>>> On Mon, Jan 20, 2014 at 8:48 PM, Martynas Jusevičius
>>>>> <marty...@graphity.org> wrote:
>>>>>>
>>>>>> OK maybe "+35" was a bad example. But isn't "+" a special char in
>>>>>> SPARQL regex? And there are more like "*", "?" etc.
>>>>>> http://www.w3.org/TR/xpath-functions/#regex-syntax
>>>>>
>>>>> Oh, good point. But if it needs to be escaped with a slash, then
>>>>> wouldn't
>>>>>
>>>>> filter regex(str(?label), "\+35")
>>>>>
>>>>> be fine? Note that if you're constructing this programmatically, you
>>>>> might end up writing code like
>>>>>
>>>>> String queryString = ... + "filter regex(str(?label), \"\\+35\")"
>>>>>     + ...;
>>>>>
>>>>> --
>>>>> Joshua Taylor, http://www.cs.rpi.edu/~tayloj/
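
The escaping levels discussed in this thread can be checked directly against `java.util.regex`, which is the engine the thread says Jena uses. The following is only a sketch: the class `EscapingLevels` and its helper methods are hypothetical names for illustration, and it exercises the Java regex engine alone, without going through a SPARQL parser.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class EscapingLevels {
    // True if the text is accepted as a Java regular expression.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // True if the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // A bare "+" is a dangling metacharacter, as in the exception quoted above.
        System.out.println(compiles("+35"));

        // Java literal "\\+35" is the regex text \+35, which matches a literal "+35".
        System.out.println(found("\\+35", "+35"));

        // Java-style quoting with \Q...\E, which the thread notes XSD regexes lack.
        System.out.println(found("\\Q+35\\E", "abc+35def"));

        // Inside a SPARQL query built in Java, each regex backslash is doubled
        // once for SPARQL and once more for Java, giving four in the source.
        String sparqlFragment = "filter(regex(str(?label), \"\\\\+35\"))";
        System.out.println(sparqlFragment);
    }
}
```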