Re: [Virtuoso-users] bif:contains - using a string variable as search term

Hugh Williams Mon, 13 Jun 2011 10:02:19 +0000

Hi Robert,

Our developers suggest that the following query


sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/> 
select distinct ?title ?u
from <http://dbpedia.org>
WHERE
{
?prog dc:title ?title .
?u rdfs:label ?label .
FILTER (bif:isnotnull (bif:strstr (?label, ?title)))
}
ORDER BY DESC ((select ?created where { ?prog dc:created ?created. } ))
LIMIT 1

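should be the fastest. Note the FROM <graph> clause, and that moving the
dc:created lookup into a scalar subquery in the ORDER BY clause acts as an
implicit hint to the optimizer that ?created can be calculated later and does
not affect filtering (i.e. a ?prog does not need a dc:created value to match).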
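
To make the evaluation-order point concrete, here is a toy Python model of the two query shapes. It is a sketch under stated assumptions, not Virtuoso's executor: small hypothetical dicts (`titles`, `created`, `labels`) stand in for the quad store, and `title in label` stands in for `bif:isnotnull (bif:strstr (?label, ?title))`. `eager` mirrors Robert's query, where dc:created is a required pattern; `lazy` mirrors the suggested query, where the sort key is fetched only for rows that survive the filter. It shows the semantic difference, not the timing.

```python
# Toy model of the evaluation order behind the two queries in this thread.
# Assumption: in-memory dicts stand in for the quad store and `title in label`
# stands in for bif:isnotnull (bif:strstr (?label, ?title)). This sketches the
# query semantics only; it is not how Virtuoso executes SPARQL.
from typing import Dict, List, Optional, Tuple

Row = Tuple[Optional[str], str, str]  # (created, title, resource)


def order_project_limit(rows: List[Row], limit: int) -> List[Tuple[str, str]]:
    # ORDER BY DESC (created): rows without a value ("" here) sort last.
    rows = sorted(rows, key=lambda r: r[0] or "", reverse=True)
    seen, out = set(), []
    for _, title, res in rows:  # SELECT DISTINCT ?title ?u
        if (title, res) not in seen:
            seen.add((title, res))
            out.append((title, res))
    return out[:limit]  # LIMIT


def eager(titles: Dict[str, str], created: Dict[str, str],
          labels: Dict[str, str], limit: int = 1) -> List[Tuple[str, str]]:
    # Robert's query: dc:created is a required triple pattern, so a programme
    # without it is discarded before its title is ever compared to a label.
    rows: List[Row] = []
    for prog, title in titles.items():
        if prog not in created:
            continue
        for res, label in labels.items():
            if title in label:
                rows.append((created[prog], title, res))
    return order_project_limit(rows, limit)


def lazy(titles: Dict[str, str], created: Dict[str, str],
         labels: Dict[str, str], limit: int = 1) -> List[Tuple[str, str]]:
    # Suggested query: filter first, then look up the sort key only for the
    # rows that matched; a missing dc:created leaves the key unbound.
    rows: List[Row] = []
    for prog, title in titles.items():
        for res, label in labels.items():
            if title in label:
                rows.append((created.get(prog), title, res))
    return order_project_limit(rows, limit)


if __name__ == "__main__":
    # Hypothetical sample data: p2 has a title but no dc:created value.
    titles = {"p1": "Berlin", "p2": "Paris"}
    created = {"p1": "2011-06-01"}
    labels = {"r1": "Berlin Wall", "r2": "Paris Hilton"}
    print(eager(titles, created, labels, limit=5))  # [('Berlin', 'r1')]
    print(lazy(titles, created, labels, limit=5))   # [('Berlin', 'r1'), ('Paris', 'r2')]
```

With the real LIMIT 1 both functions return the same row for this sample; the difference is that `lazy` never drops a programme for lacking dc:created. Any speed difference in Virtuoso would come from how the optimizer orders the joins, which this model does not attempt to capture.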

Best Regards
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink

On 12 Jun 2011, at 15:28, Robert Globisch wrote:

> Hi Hugh,
> 
> That's me, yes. Hello :)
> 
> When I remove the dc:created property (bound to my ?prog variable), the
> query gets a lot faster on both my ThinkPad (TP) and my quad-core system.
> I need the dc:created property to order the results by the creation date
> (the time you tuned in to a channel) of the files I loaded into the store.
> As you can see, removing it improves execution time massively.
> 
> I ran the explain function on the following query, using the virtuoso.db
> loaded with the whole en.dbpedia dataset.
> I hope that's what you wanted.
> 
> 
> ************************************************************************************
> ************************************************************************************
> PREFIX po: <http://purl.org/ontology/po/>
> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> 
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
> PREFIX dc: <http://purl.org/dc/elements/1.1/> 
> PREFIX dbpprop: <http://dbpedia.org/property/>
> PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
> PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> PREFIX dcterms: <http://purl.org/dc/terms/>
> 
> select distinct ?title ?u
> WHERE
> {
> ?prog dc:title ?title;
> dc:created ?created.
> 
> 
> ?u rdfs:label ?label.
> 
> FILTER (bif:isnotnull (bif:strstr (?label, ?title)))
> 
> 
> }
> ORDER BY DESC (?created)
> LIMIT 1
> 
> 
> 
> ************************************************************************************
> ************************************************************************************
> 
> 
> Result:
> 
> REPORT
> VARCHAR
> _______________________________________________________________________________
> 
> {
> Subquery 21
> {
> Fork 50
> {
> END Node
> 
> After test:
>       0: if ( 0  1(=)  1 ) then 10 else 3 unkn 10
>       3: if ( 0  1(=)  1 ) then 10 else 6 unkn 10
>       6: if ( 0  1(=)  1 ) then 10 else 9 unkn 10
>       9: BReturn 1
>       10: BReturn 0
> from DB.DBA.RDF_QUAD by RDF_QUAD_POGS   1.1e+002 rows
> Key RDF_QUAD_POGS  ASC ($27 "s-13-1-t0.O", $26 "s-13-1-t0.S")
>  inlined <col=554 P =  #dc/elements/1.1/title >
>  Local Test
>       0: if ( 0  1(=)  1 ) then 4 else 3 unkn 4
>       3: BReturn 1
>       4: BReturn 0
> 
> 
> Precode:
>       0: $30 "__ro2sq" := Call __ro2sq ($27 "s-13-1-t0.O")
>       5: BReturn 0
> from DB.DBA.RDF_QUAD by RDF_QUAD       0.23 rows
> Key RDF_QUAD  ASC ($32 "s-13-1-t1.O")
>  inlined <col=554 P =  #dc/elements/1.1/created > , <col=553 S = $26 "s-13-1-t0.S">
> 
> from DB.DBA.RDF_QUAD by RDF_QUAD   9.6e+006 rows
> Key RDF_QUAD  ASC ($37 "s-13-1-t2.O", $36 "s-13-1-t2.S")
>  inlined <col=554 P =  #label >
> 
> 
> After test:
>       0: $40 "__ro2sq" := Call __ro2sq ($37 "s-13-1-t2.O")
>       5: $41 "strstr" := Call strstr ($40 "__ro2sq", $30 "__ro2sq")
>       10: $42 "isnotnull" := Call isnotnull ($41 "strstr")
>       15: if ( 0  1(=) $42 "isnotnull") then 19 else 18 unkn 19
>       18: BReturn 1
>       19: BReturn 0
> 
> After code:
>       0: $43 "__id2i" := Call __id2i ($36 "s-13-1-t2.S")
>       5: BReturn 0
> Distinct (HASH) ($27 "s-13-1-t0.O", $36 "s-13-1-t2.S")
> 
> Precode:
>       0: $49 "__ro2sq" := Call __ro2sq ($32 "s-13-1-t1.O")
>       5: BReturn 0
> Sort (HASH) (TOP  1  ) ($49 "__ro2sq") -> ($30 "__ro2sq", $43 "__id2i")
> 
> }
> top order by node
> 
> After code:
>       0: $22 "title" :=  := artm $30 "__ro2sq"
>       4: $23 "u" :=  := artm $43 "__id2i"
>       8: BReturn 0
> Subquery Select($22 "title", $23 "u", <$39 "<DB.DBA.RDF_QUAD s-13-1-t2>" spec 5>, <$34 "<DB.DBA.RDF_QUAD s-13-1-t1>" spec 5>, <$29 "<DB.DBA.RDF_QUAD s-13-1-t0>" spec 5>)
> }
> 
> 
> After code:
>       0: $70 "title" := Call __ro2sq ($22 "title")
>       5: $71 "u" := Call __ro2sq ($23 "u")
>       10: BReturn 0
> Select ($70 "title", $71 "u")
> }
> 
> 69 Rows. -- 328 msec.
> 
> ************************************************************************************
> ************************************************************************************
> 
> 
> Best regards,
> 
> Robert 
> 
> 
> 
> On 12.06.2011 15:39, Hugh Williams wrote:
>> 
>> Hi Robert,
>> 
>> I presume you are also "Robbet <rob...@gmx.de>" who posted similar questions
>> on the vos mailing list?
>> 
>> Can you use the Virtuoso explain function to generate the compiled query
>> execution plan, so we can see how the query is being constructed, as detailed at:
>> 
>>  http://docs.openlinksw.com/virtuoso/fn_explain.html
>> 
>> It is also not clear to me what the figures you state in the following
>> mean:
>> 
>>>> As soon as I remove the dc:created property, the query gets about 10-100x
>>>> faster (TP: from 3.5 min to 30 s; quad core: from 7 min to 5.5 min).
>> 
>> What is TP, and what are the timing differences with and without the
>> dc:created property?
>> 
>> 
>>         
>> Best Regards
>> Hugh Williams
>> 
>> On 12 Jun 2011, at 12:43, Kingsley Idehen wrote:
>> 
>>> On 6/12/11 1:22 AM, Robert Globisch wrote:
>>>> 
>>>> Hello Kingsley,
>>>> 
>>>> I need your help once again. Actually, I'm a bit frustrated :/
>>>> 
>>>> Over the last few hours I ran some test examples to find out how my
>>>> query performs:
>>>> 
>>>> First I created a new virtuoso.db containing only the labels_en.nt DBpedia
>>>> dataset (virtuoso.db size about 2.6 GB).
>>>> I added some of my own triples, only a few, with some dc: and po:
>>>> properties (see the attached example file).
>>>> 
>>>> Afterwards I ran the following query, with the free-text search index
>>>> disabled and then enabled, to find matching title strings within DBpedia:
>>>> 
>>>> SELECT distinct ?title ?label
>>>> 
>>>> WHERE 
>>>> {
>>>> 
>>>> ?prog dc:title ?title;
>>>> dc:created ?created.
>>>> 
>>>> ?dbpedia rdfs:label ?label
>>>> 
>>>> FILTER (bif:isnotnull (bif:strstr (?label, ?title)))
>>>> 
>>>> }
>>>> LIMIT 1
>>>> 
>>>> 
>>>> Execution time on an Intel quad-core system with 4 GB of RAM (as already
>>>> discussed) was about 7 minutes, with free text enabled or disabled.
>>>> I ran the same query against the whole de.dbpedia dataset (a separate
>>>> virtuoso.db of about 8.5 GB) on a small ThinkPad (AMD dual-core with
>>>> 4 GB of RAM), and it took about 3.5 minutes to execute. One interesting
>>>> thing I noticed: as soon as I remove the dc:created property, the query
>>>> gets about 10-100x faster (TP: from 3.5 min to 30 s; quad core: from
>>>> 7 min to 5.5 min).
>>>> 
>>>> 
>>>> Is there anything left I could do to increase performance, besides hosting
>>>> it on a more powerful system?
>
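
For reference, the figures quoted in this thread can be checked with a few lines of arithmetic. Taken at face value, the stated timings work out to about 7x on the ThinkPad and about 1.3x on the quad-core machine. The second part multiplies the three row estimates from the explain output above (dc:title scan, dc:created lookup, rdfs:label scan); that only makes sense under our own assumption, not confirmed in the thread, that each figure is an estimated fan-out per row arriving from the previous step. Under that reading the plan implies roughly 2.4 x 10^8 strstr evaluations.

```python
# Back-of-envelope figures from this thread.
# Assumption (not confirmed in the thread): each "rows" figure in the explain
# output is an estimated fan-out per row arriving from the previous step.


def speedup(before_s: float, after_s: float) -> float:
    # Ratio of runtimes; 2.0 would mean "twice as fast".
    return before_s / after_s


# Reported timings with and without dc:created, converted to seconds.
thinkpad = speedup(3.5 * 60, 30)       # 7.0
quad_core = speedup(7 * 60, 5.5 * 60)  # roughly 1.27

# Plan estimates: dc:title scan, dc:created lookup, rdfs:label scan.
title_rows, created_fanout, label_rows = 1.1e2, 0.23, 9.6e6
strstr_calls = title_rows * created_fanout * label_rows  # roughly 2.4e8

if __name__ == "__main__":
    print(f"ThinkPad speedup:  {thinkpad:.2f}x")
    print(f"quad-core speedup: {quad_core:.2f}x")
    print(f"estimated strstr calls: {strstr_calls:.2e}")
```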
