Or maybe even simpler:

BIND(REPLACE(STR(?url),".*/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/.*","$1") AS ?email)

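The difference between the `$0` attempt and the whole-string pattern with `$1` can be checked outside a SPARQL engine. This is a minimal sketch that assumes Python's `re.sub` behaves like REPLACE (XPath `fn:replace`) for these particular patterns. Both functions replace every non-overlapping match and return the input unchanged when nothing matches, but their regex dialects are not identical in general. The helper `sparql_replace` and the URL with `jane.doe@example.org` are hypothetical, because the address in the thread was obfuscated.

```python
import re


def sparql_replace(text: str, pattern: str, replacement: str) -> str:
    """Approximate SPARQL REPLACE (XPath fn:replace) with Python's re.sub.

    Hypothetical helper. Assumes the pattern only uses syntax that Python
    and XPath regexes read the same way, and that the replacement uses
    single-digit $N group references and no literal dollar signs or
    backslashes.
    """
    # XPath writes group references as $N; Python templates use \g<N>.
    py_replacement = re.sub(r"\$(\d)", r"\\g<\1>", replacement)
    # Like fn:replace, re.sub rewrites every non-overlapping match and
    # returns the input unchanged when nothing matches.
    return re.sub(pattern, py_replacement, text)


EMAIL = "[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+"
EXTRACT = ".*/(" + EMAIL + ")/.*"

# Hypothetical URL: the real address in the thread was obfuscated.
url = ("http://www.imagesnippets.com/imgtag/rdf/jane.doe@example.org/"
       "1598550_10204479279247862_1280347905880818932_o")

# "$0" writes each match back in place, so the string does not change.
same = sparql_replace(url, EMAIL, "$0")
# Matching the whole string and keeping only group 1 extracts the address.
email = sparql_replace(url, EXTRACT, "$1")
# Without a match, REPLACE hands back the original input.
no_match = sparql_replace("http://example.org/no-email-here", EXTRACT, "$1")

print(same == url)  # True
print(email)        # jane.doe@example.org
print(no_match)     # http://example.org/no-email-here
```

A non-matching ?url comes back unchanged, so one option is to pair the BIND with a guard such as FILTER(REGEX(STR(?url), "@")). That keeps IRIs without an address out of ?email.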
>> BIND (REPLACE(STR(?s),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
> replaces the matching email address by the email address itself, so it's
> the same as before.
>
> You need to replace everything else by the email address; replace is not
> an "extract" function. You can try
>
> BIND
> (REPLACE(STR(?url),"[a-zA-Z0-9/:._-]+/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/[a-zA-Z0-9/._-]+","$1")
> AS ?email)
>
> Note: I assume that email addresses are wrapped inside / characters.
>
>
>> very good Richard, thank you. I was working along these lines with the
>> following
>>
>> BIND (REPLACE(STR(?url),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
>>
>> where ?url contains the match but binds the entire string again to ?email
>>
>> e.g. data:
>>
>> url =
>> http://www.imagesnippets.com/imgtag/rdf/[email protected]/1598550_10204479279247862_1280347905880818932_o
>>
>> query
>>
>> SELECT ?email
>> WHERE {
>> ?s ?p ?o
>> BIND (REPLACE(STR(?s),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
>> }
>>
>> On Tue, Apr 23, 2019 at 6:00 PM Richard Cyganiak wrote:
>>> Hi Marco,
>>>
>>>> On 23 Apr 2019, at 15:53, Marco Neumann wrote:
>>>>
>>>> I think I'm familiar with functions on strings in SPARQL but as far as
>>>> I can see there is nothing similar to grep-like pattern matching and
>>>> extraction on strings for SPARQL. Or is there one?
>>> The replace function does pattern matching and allows extraction of matched
>>> sub-patterns:
>>> https://www.w3.org/TR/sparql11-query/#func-replace
>>> https://www.w3.org/TR/xpath-functions/#func-replace
>>>
>>> replace(input, pattern, replacement)
>>>
>>> The special “variables” $1, $2, $3, and so on can be used in the
>>> replacement string. They refer to parts of the input that were matched by
>>> the first, second, third, and so on pair of parentheses in the regex
>>> pattern.
>>> For example:
>>>
>>> replace("23 April 2019", "^([0-9][0-9]).*", "$1")
>>>
>>> would return "23": the trailing .* makes the pattern match the whole
>>> input, which is then replaced by the part matched by the first (and only)
>>> pair of parentheses.
>>>
>>> Also useful might be Jena’s own apf:strSplit property function:
>>> https://jena.apache.org/documentation/query/library-propfunc.html
>>>
>>> It can split a literal into multiple literals based on a regular expression.
>>>
>>> Taken together, these two functions can do a wide range of pattern matching
>>> and extraction tasks.
>>>
>>> Hope that helps,
>>> Richard
>>
--
Lorenz Bühmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center
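Richard's date example can be checked outside a SPARQL engine. This minimal sketch assumes Python's `re.sub` reads these patterns the same way REPLACE (XPath `fn:replace`) does, and wraps it in a hypothetical `sparql_replace` helper that translates `$N` references. An anchored pattern without a trailing `.*` replaces only the matched prefix and leaves the rest of the input in place. A pattern that consumes the whole input returns just the captured group. The last step uses `re.split` only as a rough analogue of apf:strSplit; Jena's exact handling of empty pieces is not reproduced here.

```python
import re


def sparql_replace(text: str, pattern: str, replacement: str) -> str:
    """Hypothetical stand-in for SPARQL REPLACE built on re.sub.

    Assumes single-digit $N references and no literal dollar signs or
    backslashes in the replacement.
    """
    # Translate XPath-style $N into Python's \g<N> template syntax.
    py_replacement = re.sub(r"\$(\d)", r"\\g<\1>", replacement)
    return re.sub(pattern, py_replacement, text)


date = "23 April 2019"

# Only the matched prefix "23" is replaced (by itself); the rest stays.
unchanged = sparql_replace(date, "^([0-9][0-9])", "$1")
# Consuming the rest of the input with .* leaves only the captured day.
day = sparql_replace(date, "^([0-9][0-9]).*", "$1")

# Rough analogue of apf:strSplit: one value per piece between regex matches.
parts = re.split(r"\s+", date)

print(unchanged)  # 23 April 2019
print(day)        # 23
print(parts)      # ['23', 'April', '2019']
```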
