On Fri, 2010-07-23 at 11:44 +0100, Vanessa Lopez wrote:
> Hello,
> 
> I am trying to optimize time performance as much as possible on my
> full text index queries. To query for classes in DBpedia that contain
> "person" in the label, I send the query
> 
> SELECT DISTINCT ?s ?o FROM <http://dbpedia.org> WHERE {{?s rdfs:label
>   ?o.[] a ?s .FILTER( bif:contains(?o, "person" ) )}}LIMIT 15
> 
> or also (if I want to check dbpprop:name):
> 
> SELECT DISTINCT ?s ?o FROM <http://dbpedia.org> WHERE {{?s rdfs:label ?o.
>   [] a ?s .FILTER( bif:contains(?o, "person" ) ) } UNION { ?s dbpprop:name
>   ?o.[] a ?s .FILTER( bif:contains(?o, "person" ) )}}LIMIT 10

The performance of bif:contains is "self-protected", I'd say: when the
optimizer is unable to find a good join with an appropriate variable, it
reports an error. Both of these queries are OK.

> Can I optimize this query in any way? Does it make any difference if I
> put the bif:contains outside the FILTER, e.g.:
> 
> SELECT DISTINCT ?s ?o FROM <http://dbpedia.org> WHERE {{?s rdfs:label
>   ?o.[] a ?s. ?o bif:contains "person"}}LIMIT 15

No difference; bif:contains as a "magic predicate" is no more than
syntactic sugar. Both the FILTER version and the "magic predicate"
version boil down to a join between the table, with the variable in the
object position, and a free-text index for the object column.

Moreover, the constant graph http://dbpedia.org adds its own
optimization effect to the text search. For each graph, a special "graph
keyword" is created, and every object used in the graph is indexed in
such a way that the graph keyword appears to be part of the object. So
the actual search is for "graph keyword for http://dbpedia.org" AND
"person". This does not matter if almost all data are in one graph, but
it helps in other cases.
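
To make the "graph keyword" idea concrete, here is a toy sketch in
Python. It is only an illustration under stated assumptions, not
Virtuoso's actual implementation: the names index, graph_keyword,
add_object and search are hypothetical, and so is the keyword encoding
and the http://example.org/other graph. It shows how indexing each
object together with a per-graph keyword turns a graph-restricted text
search into a plain AND of two keywords.

```python
from collections import defaultdict

# Toy inverted index: token -> set of object ids.
# Hypothetical structure for illustration, not Virtuoso code.
index = defaultdict(set)

def graph_keyword(graph_iri):
    # Made-up encoding of the per-graph keyword (an assumption);
    # the real format is internal to the server.
    return "__graph__:" + graph_iri

def add_object(obj_id, graph_iri, text):
    # Index the words of the literal...
    for token in text.lower().split():
        index[token].add(obj_id)
    # ...and index the graph keyword as if it were one more word of it.
    index[graph_keyword(graph_iri)].add(obj_id)

def search(word, graph_iri=None):
    # With a constant graph, the lookup becomes "graph keyword AND word".
    hits = set(index.get(word.lower(), set()))
    if graph_iri is not None:
        hits &= index.get(graph_keyword(graph_iri), set())
    return hits

add_object(1, "http://dbpedia.org", "Person")
add_object(2, "http://dbpedia.org", "Place")
add_object(3, "http://example.org/other", "Missing person report")

print(sorted(search("person")))                        # [1, 3]
print(sorted(search("person", "http://dbpedia.org")))  # [1]
```

In this toy version the graph restriction is applied inside the text
lookup itself, so hits from other graphs never reach the join.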

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com


