Or maybe even simpler:

BIND(REPLACE(STR(?url),".*/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/.*","$1")
AS ?email)
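
To make the difference between the two approaches concrete, here is a minimal sketch in Python rather than SPARQL, purely to illustrate the regex behaviour. The sample URL and the address in it are made up, and Python's re.sub writes \1 and \g<0> where SPARQL/XPath writes $1 and $0, so this is an analogy under those assumptions, not what the SPARQL engine executes.

```python
import re

EMAIL = r"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+"

# Hypothetical sample URL; the address is a placeholder, not taken from the thread.
url = "http://www.imagesnippets.com/imgtag/rdf/jane.doe@example.org/1598550_o"

# Replacing the match with itself (SPARQL "$0", Python \g<0>)
# leaves the input string unchanged.
unchanged = re.sub(EMAIL, r"\g<0>", url)

# Matching the whole string and keeping only group 1 (SPARQL "$1", Python \1)
# leaves just the address.
extracted = re.sub(r".*/(" + EMAIL + r")/.*", r"\1", url)

# When the pattern does not match, the input comes back unchanged.
no_match = re.sub(r".*/(" + EMAIL + r")/.*", r"\1",
                  "http://example.org/no/address/here")

print(unchanged == url)
print(extracted)
print(no_match)
```

The last case matters for the query as well: with a pattern like this, a ?url without an address between slashes is returned as-is, so a FILTER with REGEX on the same pattern may be needed if only real matches should be bound.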

>> BIND (REPLACE(STR(?s),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
> replaces the matched email address with the email address itself, so the
> result is the same as the input.
>
> You need to replace everything else with the email address. REPLACE is not
> an "extract" function. You can try:
>
> BIND
> (REPLACE(STR(?url),"[a-zA-Z0-9/:._-]+/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/[a-zA-Z0-9/._-]+","$1")
> AS ?email)
>
> Note: I assume that the email address is enclosed in / characters.
>
>
>> Very good, Richard, thank you. I was working along these lines with the
>> following:
>>
>> BIND (REPLACE(STR(?url),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
>>
>> where ?url contains the match, but this binds the entire string to ?email
>> again.
>>
>> e.g. data:
>>
>> url = 
>> http://www.imagesnippets.com/imgtag/rdf/[email protected]/1598550_10204479279247862_1280347905880818932_o
>>
>> query
>>
>> SELECT ?email
>> WHERE  {
>> ?s ?p ?o
>> BIND (REPLACE(STR(?s),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
>> }
>>
>> On Tue, Apr 23, 2019 at 6:00 PM Richard Cyganiak <[email protected]> wrote:
>>> Hi Marco,
>>>
>>>> On 23 Apr 2019, at 15:53, Marco Neumann <[email protected]> wrote:
>>>>
>>>> I think I'm familiar with the string functions in SPARQL, but as far as
>>>> I can see there is nothing similar to grep-like pattern matching and
>>>> extraction on strings in SPARQL. Or is there?
>>> The replace function does pattern matching and allows extraction of matched 
>>> sub-patterns:
>>> https://www.w3.org/TR/sparql11-query/#func-replace
>>> https://www.w3.org/TR/xpath-functions/#func-replace
>>>
>>>     replace(input, pattern, replacement)
>>>
>>> The special “variables” $1, $2, $3, and so on can be used in the 
>>> replacement string. They refer to parts of the input that were matched by 
>>> the first, second, third, and so on pair of parentheses in the regex 
>>> pattern. For example:
>>>
>>>     replace("23 April 2019", "^([0-9][0-9]).*$", "$1")
>>>
>>> would return "23" because that is the part of the input matched by the 
>>> first (and only) pair of parentheses.
>>>
>>> Also useful might be Jena’s own apf:strSplit property function:
>>> https://jena.apache.org/documentation/query/library-propfunc.html
>>>
>>> It can split a literal into multiple literals based on a regular expression.
>>>
>>> Taken together, these two functions can do a wide range of pattern matching 
>>> and extraction tasks.
>>>
>>> Hope that helps,
>>> Richard
>>
-- 
Lorenz Bühmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center
