Re: [Virtuoso-users] DISTINCT issue

Ivan Mikhailov Wed, 22 Oct 2008 04:06:34 +0000

Hello Sergio,

I'd add a newline so the query reads the way the parser sees it:

SELECT DISTINCT
(?name) ?person ?mail
WHERE {
  ?person rdf:type foaf:Person .
  ?person foaf:name ?name .
  ?person foaf:mbox_sha1sum ?mail
}

SELECT DISTINCT works as in plain SPARQL: the returned values are the expression
(?name) and the two variables ?person and ?mail, and DISTINCT applies to the
whole row. Unfortunately, there is no DISTINCT that applies to only part of a
result set row.
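A small Python model may make this concrete. It assumes each solution row is a plain (name, person, mail) tuple with made-up values such as `ex:alice1`. It only mimics the row-level DISTINCT semantics described above, not how Virtuoso evaluates the query.

```python
# Toy model (an assumption, not Virtuoso internals): each solution row
# is a (name, person, mail) tuple of plain strings.
rows = [
    ("Alice", "ex:alice1", "sha1-a1"),
    ("Alice", "ex:alice2", "sha1-a2"),  # same name, different node and mail
    ("Alice", "ex:alice1", "sha1-a1"),  # exact copy of the first row
    ("Bob", "ex:bob", "sha1-b"),
]


def select_distinct(rows):
    """Keep one copy of each whole row, in first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(rows))


result = select_distinct(rows)
for row in result:
    print(row)
```

Only the exact copy disappears; "Alice" still shows up twice because her two rows differ in ?person and ?mail.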

If there is a regular need for such a feature and duplicates are rare, we may
introduce an aggregate called, say, SPECIMEN, that returns the very first of
the aggregated values, unchanged. Using that aggregate, one could write

SELECT ?name (specimen(?person)) (specimen(?mail))
WHERE {
  ?person rdf:type foaf:Person .
  ?person foaf:name ?name .
  ?person foaf:mbox_sha1sum ?mail
}
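SPECIMEN is only a proposal here, so the following Python sketch is a hypothetical model of its intended semantics. Rows are grouped by ?name and, assuming both specimen() calls see a group's rows in the same order, they take their values from the group's first row, so ?person and ?mail stay connected. The function name `group_specimen` and the data are illustrative.

```python
# Hypothetical model of the proposed SPECIMEN aggregate, grouped by ?name.
rows = [
    ("Alice", "ex:alice2", "sha1-a2"),
    ("Alice", "ex:alice1", "sha1-a1"),
    ("Bob", "ex:bob", "sha1-b"),
]


def group_specimen(rows):
    """For each name, keep the first (person, mail) pair seen, unchanged."""
    groups = {}
    for name, person, mail in rows:
        # setdefault stores the pair only the first time a name is seen
        groups.setdefault(name, (person, mail))
    return [(name, person, mail) for name, (person, mail) in groups.items()]


result = group_specimen(rows)
for row in result:
    print(row)
```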

If duplicates are rare and there is no requirement that ?person correspond to
?mail (that is, each result row should contain some person node and some mail,
but they need not be connected by foaf:mbox_sha1sum), then the MIN aggregate
will work:

SELECT ?name (min(?person)) (min(?mail))
WHERE {
  ?person rdf:type foaf:Person .
  ?person foaf:name ?name .
  ?person foaf:mbox_sha1sum ?mail
}
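To see why MIN may break the link between ?person and ?mail, here is a Python sketch of a per-name MIN applied to each column independently. Plain string comparison stands in for SPARQL's ordering of IRIs and literals, and the data is invented.

```python
# Toy model: MIN computed separately for ?person and ?mail within each ?name.
rows = [
    ("Alice", "ex:alice1", "sha1-zz"),
    ("Alice", "ex:alice2", "sha1-aa"),
    ("Bob", "ex:bob", "sha1-b"),
]


def group_min(rows):
    """Return (name, min person, min mail) per name; columns are independent."""
    persons, mails = {}, {}
    for name, person, mail in rows:
        persons[name] = min(persons.get(name, person), person)
        mails[name] = min(mails.get(name, mail), mail)
    return [(name, persons[name], mails[name]) for name in persons]


result = group_min(rows)
for row in result:
    print(row)
```

For "Alice" this yields ex:alice1 together with sha1-aa, a pair that never occurs in the data: each value is real, but they come from different rows.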

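Otherwise, a more complicated query is needed. It picks the minimal mail for
each name, then the minimal person that has both that name and that mail: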

SELECT
?name
((select (min (?person3))
    where {
        ?person3 rdf:type foaf:Person .
        ?person3 foaf:name ?name .
        ?person3 foaf:mbox_sha1sum ?mail } )) as ?person
?mail
WHERE {
    { select distinct ?name
      where {
          ?person1 rdf:type foaf:Person .
          ?person1 foaf:name ?name .
          ?person1 foaf:mbox_sha1sum ?mail1 } }
    { select ?name (min(?mail2)) as ?mail
      where {
          ?person2 rdf:type foaf:Person .
          ?person2 foaf:name ?name .
          ?person2 foaf:mbox_sha1sum ?mail2 } }
}
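The logic of this query can be traced with a Python sketch over invented data. This is a toy model, not Virtuoso's evaluation: it first takes the minimal mail for each name, as the second subquery does, then the minimal person having both that name and that mail, as the scalar subquery in the SELECT list does. The `select distinct ?name` subquery only supplies the set of names, which the dictionary keys already provide here.

```python
# Toy model of the nested-subquery approach.
rows = [
    ("Alice", "ex:alice1", "sha1-zz"),
    ("Alice", "ex:alice2", "sha1-aa"),
    ("Bob", "ex:bob", "sha1-b"),
]


def pick_connected(rows):
    """One (name, person, mail) per name, with person and mail from one row."""
    # Step 1: minimal mail per name (the "min(?mail2)" subquery).
    min_mail = {}
    for name, _person, mail in rows:
        min_mail[name] = min(min_mail.get(name, mail), mail)
    # Step 2: minimal person with that name AND that mail ("min(?person3)").
    result = []
    for name, mail in min_mail.items():
        person = min(p for n, p, m in rows if n == name and m == mail)
        result.append((name, person, mail))
    return result


result = pick_connected(rows)
for row in result:
    print(row)
```

Unlike the MIN-only version, every returned triple here is an actual row of the data.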

Best Regards,

Ivan Mikhailov,
OpenLink Software
http://virtuoso.openlinksw.com

On Tue, 2008-10-21 at 09:20 +0200, Sergio Fernández wrote:
> Hi,
> 
> in an experiment I have a dataset with many people (some of them
> duplicated). I just want to list them; I'm doing something like:
> 
> SELECT DISTINCT(?name) ?person ?mail
> WHERE {
>   ?person rdf:type foaf:Person .
>   ?person foaf:name ?name .
>   ?person foaf:mbox_sha1sum ?mail
> }
> 
> But DISTINCT doesn't work as I expected: if there are duplicated names,
> it returns all of them. I know that DISTINCT() is not standard syntax
> according to [1]; I've read some examples that use it with Virtuoso, but
> I'm not sure about its behaviour.
> 
> Thank you in advance.
> 
> [1] http://www.w3.org/TR/rdf-sparql-query/#modDistinct
>
