Re: About SPARQL predicates as variables...

Mark Feblowitz Mon, 24 Aug 2015 18:45:07 -0700

It occurred to me that I had previously tested a related (sub)query and it 
seems very simple and quick, looking for predicates for a given entity:


PREFIX : <http://dbpedia.org/resource/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT  (COUNT(DISTINCT ?predicate) as ?PC)
  WHERE {
    VALUES ?Entity { :LeBron_James }
    {SERVICE SILENT <http://dbpedia-live.openlinksw.com/sparql?timeout=4000> 
        {SELECT DISTINCT * where {
            ?Entity ?predicate ?Object;
            }
        }
    }}

The idea is to pull out the predicates first and apply them after.

The “VALUES” clause is a shorthand for a local graph clause. I will later use 
something similar to retrieve N ?Entity bindings. The point here is that I will 
have N entities, each with different predicates, and I want to explore the 
relationships for each of the entities.

Note that the binding occurs outside of the service.
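
For comparison, here is a minimal sketch of the same count with the binding moved inside the SERVICE block, so that the remote endpoint receives the constraint itself. It assumes the remote endpoint accepts a SPARQL 1.1 VALUES clause, which was not checked here:

```sparql
PREFIX : <http://dbpedia.org/resource/>

# Sketch only: same predicate count as above, but the VALUES binding
# is sent to the remote endpoint as part of the SERVICE pattern.
SELECT (COUNT(DISTINCT ?predicate) AS ?PC)
WHERE {
  SERVICE SILENT <http://dbpedia-live.openlinksw.com/sparql?timeout=4000> {
    VALUES ?Entity { :LeBron_James }
    ?Entity ?predicate ?Object .
  }
}
```

Keeping the binding outside, as in the query above, leaves it to the local engine to decide how the value reaches the remote pattern.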

A little closer to what I need, another query uses a second source, from 
plain-old dbpedia:

SELECT DISTINCT * WHERE {
    VALUES ?Entity { :LeBron_James }
    {SELECT DISTINCT ?Entity ?predicate
      WHERE {
        {SERVICE SILENT <http://dbpedia-live.openlinksw.com/sparql?timeout=4000>
            {SELECT DISTINCT ?Entity ?predicate where {
                ?Entity ?predicate ?Object;
                }
            }
        }
        UNION
        {SERVICE SILENT <http://dbpedia.openlinksw.com/sparql?timeout=4000> 
            {SELECT DISTINCT  ?Entity ?predicate where {
                ?Entity ?predicate ?Object;
                }
            }
        }
        
}}}

That works pretty rapidly, too.

Finally, and given Andy’s comment about bottom-up processing, I was able to 
write the single endpoint case that works pretty well. It puts the query 
refinements at the top and the predicate generators in a nested query:

    SERVICE <http://dbpedia-live.openlinksw.com/sparql?timeout=4000>  {
        {SELECT DISTINCT ?Entity ?predicate ?A ?Person2 where {
            ?Entity ?predicate ?A.
            FILTER ( isURI(?A) )
            FILTER(!STRSTARTS(STR(?A), "http://dbpedia.org/resource/Template:"))
            FILTER(!STRSTARTS(STR(?A), "http://dbpedia.org/ontology/wiki"))
            FILTER (?A != <http://dbpedia.org/resource/Category:Living_people>)
            FILTER (?A != <http://dbpedia.org/property/wordnet_type>)
            FILTER (?A != <http://www.w3.org/2002/07/owl#Thing>)
            ?Person2 ?predicate ?A.
            FILTER ( isURI(?Person2))
            ?Person2 a do:Person.
            {SELECT  ?Entity ?predicate WHERE {
                VALUES ?Entity { :LeBron_James }
                    {SELECT DISTINCT * where {
                        ?Entity ?predicate ?Object.
                        FILTER (?predicate != rdfs99:type)
                    }
                }
            }}
        }}
    }

It needs a bit of tuning, but it’s more responsive than I was expecting. 

Note that I’m filtering out some things that I don’t think are helpful. (I would 
have liked to have used something like a VALUES statement to collapse down the 
?A != filters into a “blacklist”, but VALUES with negated filters seems only to 
work as a whitelist.)
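
One possible way to express a blacklist in a single filter is the SPARQL 1.1 NOT IN operator. The following is a minimal sketch, not tested against the DBpedia endpoints; it reuses the three excluded IRIs from the query above and assumes the : prefix is bound as in the first query:

```sparql
PREFIX : <http://dbpedia.org/resource/>

# Sketch only: the three "?A !=" filters collapsed into one NOT IN list.
SELECT DISTINCT ?predicate ?A
WHERE {
  VALUES ?Entity { :LeBron_James }
  ?Entity ?predicate ?A .
  FILTER ( isURI(?A) )
  FILTER ( ?A NOT IN (
    <http://dbpedia.org/resource/Category:Living_people>,
    <http://dbpedia.org/property/wordnet_type>,
    <http://www.w3.org/2002/07/owl#Thing>
  ) )
}
```

Adding another excluded IRI then means extending the list rather than writing a new FILTER.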

Now, on to finishing off the query: pulling out the VALUES clause and replacing 
it with a local GRAPH query outside of the SERVICE, and then on to replicating 
the query above and UNIONing the two patterns… without breaking the whole lot. 
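
A rough sketch of that first step, under stated assumptions: the graph name <http://example.org/graphs/entities> and the do:Person typing of the local entities are hypothetical placeholders, and whether the local ?Entity bindings are pushed into the SERVICE call (rather than the remote pattern being evaluated unconstrained) depends on the query engine:

```sparql
PREFIX do: <http://dbpedia.org/ontology/>

# Sketch only: entities come from a local named graph (hypothetical name)
# instead of a VALUES clause; predicates are then looked up remotely.
SELECT DISTINCT ?Entity ?predicate
WHERE {
  GRAPH <http://example.org/graphs/entities> {
    ?Entity a do:Person .
  }
  SERVICE SILENT <http://dbpedia-live.openlinksw.com/sparql?timeout=4000> {
    ?Entity ?predicate ?Object .
  }
}
```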

All in all, the trickiest query I’ve ever crafted (so far).

Thanks to all for your suggestions.

Mark


> On Aug 22, 2015, at 1:27 PM, Andy Seaborne <a...@apache.org> wrote:
> 
> On 22/08/15 15:51, Mark Feblowitz wrote:
>> Andy -
>> 
>> I did try that in isolation, and also directly (not within a SERVICE block) 
>> and also directly at the dbpedia sites. Neither worked.
>> 
>> I do see that this form is expensive and have tried it with a number of 
>> filters. I sent the very simplest to focus on the main question.
>> 
> 
> If it's the retrieval costs of the query, filters don't help much. Only the 
> simple filters like FILTER (?x = <y>) can be used to make index scanning 
> more focused.
> 
> As an alternative to BIND, you may find
> 
> SELECT DISTINCT * where {
>     ?Player a do:BasketballPlayer.
>     ?Player ?r ?A.
>     ?Player2 ?r ?A
>     FILTER(?Player = <someURI>)
> }
> 
> helps.  This is optimizable (ARQ does it!) to a BIND-like form
> 
> 
> SELECT DISTINCT * where {
>     <someURI> a do:BasketballPlayer.
>     <someURI> ?r ?A.
>     ?Player2 ?r ?A
>     BIND (<someURI> AS ?Player)
> }
> 
> now, the optimizer has a chance, not guaranteed though.  An index join to 
> handle "?Player2 ?r ?A" means that it's a few probes (the number of 
> properties for subject <someURI>).  A hash join without conditions however is 
> still very costly for that step.
> 
> It's all down to the details of the version of Virtuoso at DBpedia. There is 
> an argument that this style of query is "unusual" - optimization is about 
> doing things for the likely cases.
> 
>       Andy
> 
>> Thanks,
>> 
>> Mark
>> 
>>> On Aug 22, 2015, at 5:18 AM, Andy Seaborne <a...@apache.org> wrote:
>>> 
>>> On 22/08/15 06:37, Nauman Ramzan wrote:
>>>> Hi Mark
>>>> Did you connect virtuoso with fuseki or you just import data into fuseki ?
>>>> 
>>>> 
>>>>> On Aug 22, 2015, at 2:56 AM, Mark Feblowitz <markfeblow...@comcast.net> 
>>>>> wrote:
>>>>> 
>>>>> This seems like it should be a FAQ, but I’m not finding anything useful.
>>>>> 
>>>>> I’m using SPARQL to explore linked data, which includes discovering 
>>>>> predicates. My understanding was that I could bind a subject and use 
>>>>> variables for predicate and object.
>>>>> 
>>>>> It’s important to note that what I’m experiencing is when using dbpedia, 
>>>>> which is built on Virtuoso - perhaps my question should be posted 
>>>>> elsewhere? But I am doing this via a Fuseki endpoint, so there’s at least 
>>>>> some relevance :-?
>>> 
>>> Yes - this is possible see below.
>>> 
>>> Fuseki is just passing on the query and is not responsible for the results. 
>>>  "common carrier" :-)
>>> 
>>>>> 
>>>>> Here’s what I’m trying to do:
>>>>> 
>>>>> I have some queries to a Fuseki endpoint that call out to dbpedia using 
>>>>> SERVICE blocks. Before entering the service I bind a value for ?Player, 
>>>>> and then with the service I pose the query;
>>>>> 
>>>>>> PREFIX do: <http://dbpedia.org/ontology/>
>>>>> ...
>>>>>> {SELECT DISTINCT * where {
>>>>>>      ?Player a do:BasketballPlayer.
>>>>>>      ?Player ?r ?A.
>>>>>>      ?Player2 ?r ?A }
>>> 
>>> Mark - did you test just
>>> 
>>> SELECT DISTINCT * where {
>>>  ?Player a do:BasketballPlayer.
>>>  ?Player ?r ?A.
>>>  ?Player2 ?r ?A }
>>> 
>>> in isolation (SERVICE call?) or as a nested SELECT in a larger query?
>>> 
>>> It is possible, especially in the sub-query form, that the query plan is 
>>> expensive because property variables are uncommon (but correct) so less 
>>> work goes on into optimizing such plans.
>>> 
>>> Also an internal timeout executing that part of the query would explain 
>>> what you are seeing.
>>> 
>>> There are going to be a  lot of answers - 14,818,205,957? - counting 
>>> non-DISTINCT.  The DISTINCT version may be significantly more expensive
>>> but
>>> 
>>> SELECT (count(DISTINCT *) AS ?c) {
>>> 
>>> is a syntax error for DBpedia though it is legal SPARQL.
>>> 
>>>     Andy
>>> 
>>>>> 
>>>>> I expected to find a binding for ?r as do:team. And when I pre-bind ?r to 
>>>>> that, I see plenty of bindings for ?A and ?Player2. The one binding that 
>>>>> I do see when leaving ?r unbound is
>>>>> 
>>>>>> http://www.w3.org/1999/02/22-rdf-syntax-ns#type
>>>>> 
>>>>> (I notice that dbpedia virtuoso is limited - it cannot handle a BIND 
>>>>> statement.)
>>>>> 
>>>>> Is what I am seeing universal behavior or behavior specific to 
>>>>> virtuoso? Either way, are there clever workarounds?
>>>>> 
>>>>> I do know that I can “white-list in” some bindings for ?r, using a VALUES 
>>>>> statement. But that pretty much defeats the discovery purpose. Also, as 
>>>>> this is varied and often schema-less content, I can’t rely on an ontology 
>>>>> as a guide to defined predicates.
>>>>> 
>>>>> Any suggestions?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Mark
>>>>> 
>>> 
>> 
>
