Thank you, Andy, for the reply.
1. Performance: I was able to solve this by ordering the triple patterns
correctly. I read the chapter on optimization in the "Learning SPARQL" book.
The problem with my query was that it started from a large set: for example,
"give me all things A, then their Bs, and filter on B". The better option is
"give me all things B that pass the filter, then their As". After this tuning,
all queries now return in under 1 second, which is great.
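To illustrate the reordering (with made-up predicates, not my actual schema), the change was roughly:

```sparql
# Before: start from the large set of As, filter their Bs afterwards
SELECT ?a WHERE {
  ?a a ex:A ;
     ex:hasB ?b .
  ?b ex:label ?lbl .
  FILTER (regex(?lbl, "^word", "i"))
}

# After: start from the Bs that pass the filter, then walk back to their As
SELECT ?a WHERE {
  ?b ex:label ?lbl .
  FILTER (regex(?lbl, "^word", "i"))
  ?a a ex:A ;
     ex:hasB ?b .
}
```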
2. I am trying to understand your feedback on the Lucene index. Apologies for
not giving the actual code earlier; here is a better representation.
<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ; text:predicate no:name ]
        [ text:field "text" ; text:predicate no:address ]
        [ text:field "text" ; text:predicate no:bio ]
        [ text:field "text" ; text:predicate no:qualification ]
        [ text:field "text" ; text:predicate no:hobbies ]
    ) .
I want the ability to do a free-text search over all the properties (name,
address, bio, qualification, hobbies) in a single query. Given that, is there
anything wrong with my configuration?
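In case it clarifies the question: the alternative I considered was one Lucene field per predicate, but then (as far as I understand) each text:query call targets a single predicate/field, so searching all five properties would need a UNION. A sketch of that alternative (the field names are my own guesses, not from any working setup):

```turtle
# Hypothetical per-field mapping - one Lucene field per predicate
<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "name" ;
    text:map (
        [ text:field "name" ;          text:predicate no:name ]
        [ text:field "address" ;       text:predicate no:address ]
        [ text:field "bio" ;           text:predicate no:bio ]
        [ text:field "qualification" ; text:predicate no:qualification ]
        [ text:field "hobbies" ;       text:predicate no:hobbies ]
    ) .
```

With that mapping, a search over addresses alone would look like `?s text:query (no:address "smith")`, and a search over everything would union five such patterns, which is why I went with a single shared field instead.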
-Ajay
> On Nov 11, 2015, at 4:54 PM, Andy Seaborne <[email protected]> wrote:
>
> On 11/11/15 04:40, Kamble, Ajay, Crest wrote:
>> Thank you Andy for replying.
>>
>> 1. I have a mix of constrained and free text queries. My constrained queries
>> (or without free text/normal sparql queries) took 3-10 seconds. Free text
>> queries took around 1 second.
>> Do you mean that volume of Lucene index will affect constrained queries
>> as well?
>> At this point I had just included few concepts for Lucene index. Here is
>> my configuration:
>>
>> <#entMap> a text:EntityMap ;
>> text:entityField "uri" ;
>> text:defaultField "text" ;
>> text:map ( [ text:field "text" ; text:predicate no:concept1 ]
>
> concept1 is a class later on, not a property.
>
> If this is an anonymized setup+query, it's not helping in answering the
> question.
>
>> [ text:field "text" ; text:predicate no:concept2 ]
>> [ text:field "text" ; text:predicate no:concept3 ]
>> [ text:field "text" ; text:predicate no:concept4 ]
>> [ text:field "text" ; text:predicate no:concept5 ]
>> [ text:field "text" ; text:predicate no:concept6 ] ) .
>
> That uses the same Lucene field for each predicate - I'm not sure what will
> happen. At best, it puts all the indexed text in one field so Lucene has to
> process all of it for any lookup.
>
>>
>> 2. Here is a sample query which takes 10+ seconds to execute. Is there
>> anything wrong with this query (or possibility of optimization)?
>
> The Lucene index and regex are unconnected.
> The Lucene index is accessed with a property function "text:query"
> http://jena.apache.org/documentation/query/text-query.html
>
>> PREFIX ex:<http://example.com/ns/concepts#>
>> PREFIX d:<http://example.com/ns/data#>
>>
>> SELECT DISTINCT ?a1
>
> DISTINCT can hide a lot of work being done to find many, but few unique,
> results.
>
>> WHERE {
>> ?n1 a ex:concept1 ;
>> ex:concept2 ?c1 ;
>
> concept as type and concept as property - looks odd to me.
>
>> ex:concept3 ?n2 ;
>> ex:concept4 ?f1 ;
>> ex:concept5 ?a1 .
>> ?c1 ex:concept6 ?cn1 .
>> ?f1 ex:concept7 ?fn1 .
>
> Depending on the overall shape of your data, this is huge. It does not start
> anywhere so it might well be a scan of a lot of the database.
>
> What's more, multiple occurrences of properties on the same subject will lead
> to fan-out, causing duplication of ?a1, then hidden by the DISTINCT.
>
>> FILTER (regex(?n2, "^word1", "i"))
>> FILTER (regex(?cn1, "^word2$", "i"))
>> FILTER (regex(?fn1, "^word3$", "i")) }
>
> The way this query will execute is that the pattern part is executed,
> probably generating a lot of matches with a lot of duplication of ?a1, and
> the filters are then used to test the results. Filters are pushed to the
> best place but there is only so much they can do.
>
> Better might be:
> (after sorting out the reuse of one field in the lucene index)
>
> # Look for all ?n2 of interest by concept2 in Lucene:
> ?n2 text:query (ex:concept2 "word1") .
>
> # Then do pattern matching only for those ?n2
> ?n1 ex:concept3 ?n2 ;
> ex:concept2 ?c1 ;
> ex:concept4 ?f1 ;
> ex:concept5 ?a1 .
> ?c1 ex:concept6 ?cn1 .
> ?f1 ex:concept7 ?fn1 .
> # Checks
> FILTER (regex(?cn1, "^word2$", "i"))
> FILTER (regex(?fn1, "^word3$", "i")) }
>
> You can start at word2 or word3 similarly - use the one likely to have the
> fewest matches.
>
> You may need to keep the FILTERs if the way you get Lucene matches is more
> general than the regex version (e.g. stemming matters).
>
> Andy
>
>>
>> 3. About Hardware, right now I am just running this on my MacBook Pro with
>> 2.5 GHz Intel Core i7 and 16 GB of RAM.
>>
>> It would be great if you could give me some suggestions or point me to any
>> resource that explains Fuseki optimization.
>