Hello Andy, I did not completely understand your feedback.
Is there any advantage to using different fields for different predicates, given that my intention is to find matches on any predicate? If I search for “foo”, I want matches from bio, qualification and everything else in a single query.

Here is my sample query:

    PREFIX text: <http://jena.apache.org/text#>
    PREFIX no: <http://nano.springer.com/ns/nanoobjects#>
    PREFIX d: <http://nano.springer.com/ns/data#>

    SELECT ?s ?score ?what {
      (?s ?score) text:query 'gold nanoparticles' .
      ?s a ?what .
    }

-Ajay

On Nov 12, 2015, at 12:26 AM, Andy Seaborne <[email protected]> wrote:

On 11/11/15 16:30, Kamble, Ajay, Crest wrote:

Thank you Andy for the reply.

1. Performance: I was able to solve it by ordering the triples correctly. I read a chapter on optimization in the ‘Learning SPARQL’ book. The problem in my query was that I started with a large set: for example, give me all things A, then their Bs, and use a filter on B. The better option is: give me all things B which pass the filter, then their As. After this tuning all queries now return in under 1 second, which is great.

2. I am trying to understand your feedback on the Lucene index. Apologies for not giving the actual code, but here is a better representation.

    <#entMap> a text:EntityMap ;
        text:entityField "uri" ;
        text:defaultField "text" ;
        text:map (
            [ text:field "text" ; text:predicate no:name ]
            [ text:field "text" ; text:predicate no:address ]
            [ text:field "text" ; text:predicate no:bio ]
            [ text:field "text" ; text:predicate no:qualification ]
            [ text:field "text" ; text:predicate no:hobbies ]
        ) .

I want the ability to do a free text search over all the properties name, address, bio, qualification and hobbies in a single query. Considering this, is there anything wrong with my configuration?

Have you considered having different fields for different predicates?
    <#entMap> a text:EntityMap ;
        text:entityField "uri" ;
        text:defaultField "name" ;
        text:map (
            [ text:field "name" ; text:predicate no:name ]
            [ text:field "address" ; text:predicate no:address ]
            [ text:field "bio" ; text:predicate no:bio ]
            [ text:field "qualification" ; text:predicate no:qualification ]
            [ text:field "hobbies" ; text:predicate no:hobbies ]
        ) .

Then you can search by predicate:

    ?uri text:query (no:address 'Road') .

As you have it, searching by "foo" returns multiple matches if "foo" is in, say, bio and qualification.

Andy

-Ajay

On Nov 11, 2015, at 4:54 PM, Andy Seaborne <[email protected]> wrote:

On 11/11/15 04:40, Kamble, Ajay, Crest wrote:

Thank you Andy for replying.

1. I have a mix of constrained and free text queries. My constrained queries (that is, normal SPARQL queries without free text) took 3-10 seconds. Free text queries took around 1 second. Do you mean that the volume of the Lucene index will affect constrained queries as well? At this point I had just included a few concepts in the Lucene index. Here is my configuration:

    <#entMap> a text:EntityMap ;
        text:entityField "uri" ;
        text:defaultField "text" ;
        text:map (
            [ text:field "text" ; text:predicate no:concept1 ]

concept1 is a class later on, not a property. If this is an anonymized setup+query, it's not helping in answering the question.

            [ text:field "text" ; text:predicate no:concept2 ]
            [ text:field "text" ; text:predicate no:concept3 ]
            [ text:field "text" ; text:predicate no:concept4 ]
            [ text:field "text" ; text:predicate no:concept5 ]
            [ text:field "text" ; text:predicate no:concept6 ]
        ) .

That uses the same Lucene field for each predicate - I'm not sure what will happen. At best, it puts all the index text in one field so Lucene has to process all of them for any lookup.

2. Here is a sample query which takes 10+ seconds to execute. Is there anything wrong with this query (or any possibility of optimization)?

The Lucene index and regex are unconnected.
The Lucene index is accessed with a property function "text:query":
http://jena.apache.org/documentation/query/text-query.html

    PREFIX ex:<http://example.com/ns/concepts#>
    PREFIX d:<http://example.com/ns/data#>
    SELECT DISTINCT ?a1

DISTINCT can hide a lot of work being done to find many, but few unique, results.

    WHERE {
      ?n1 a ex:concept1 ;
          ex:concept2 ?c1 ;

Concept as type and concept as property - looks odd to me.

          ex:concept3 ?n2 ;
          ex:concept4 ?f1 ;
          ex:concept5 ?a1 .
      ?c1 ex:concept6 ?cn1 .
      ?f1 ex:concept7 ?fn1 .

Depending on the overall shape of your data, this is huge. It does not start anywhere, so it might well be a scan of a lot of the database. What's more, multiple occurrences of properties on the same subject will lead to fan-out, causing duplication of ?a1, which is then hidden by the DISTINCT.

      FILTER (regex(?n2, "^word1", "i"))
      FILTER (regex(?cn1, "^word2$", "i"))
      FILTER (regex(?fn1, "^word3$", "i"))
    }

The way this query will execute is that the pattern part is executed, probably generating a lot of matches with a lot of duplication of ?a1, and the filters are then used to test the results. Filters are pushed to the best place but there is only so much they can do.

Better might be (after sorting out the reuse of one field in the Lucene index):

      # Look for all ?n2 of interest by concept2 in Lucene:
      ?n2 text:query (ex:concept2 "word1") .
      # Then do pattern matching only for those ?n2
      ?n1 ex:concept3 ?n2 ;
          ex:concept2 ?c1 ;
          ex:concept4 ?f1 ;
          ex:concept5 ?a1 .
      ?c1 ex:concept6 ?cn1 .
      ?f1 ex:concept7 ?fn1 .
      # Checks
      FILTER (regex(?cn1, "^word2$", "i"))
      FILTER (regex(?fn1, "^word3$", "i"))
    }

You can start at word2 or word3 similarly - use the one with the fewest likely matches.

You may need to keep the FILTERs if the way you get Lucene matches is more general than the regex version (e.g. stemming matters).

Andy

3. About hardware: right now I am just running this on my MacBook Pro with a 2.5 GHz Intel Core i7 and 16 GB of RAM.
It would be great if you could give me some suggestions or point me to any resource that explains Fuseki optimization.
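
One possible way to reconcile the two positions in this thread (one field per predicate, but still a single free-text search over everything) is to query each field separately and combine the results. The query below is only a sketch, not something tested against this dataset. It assumes the per-predicate entity map suggested by Andy and the no: prefix from the sample query; the search term 'foo' and the variable ?sc are placeholders.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX no:   <http://nano.springer.com/ns/nanoobjects#>

# Search each per-predicate field, then keep one row per subject.
# Assumes each predicate is mapped to its own Lucene field.
SELECT ?s (MAX(?sc) AS ?score)
WHERE {
  { (?s ?sc) text:query (no:name 'foo') }
  UNION
  { (?s ?sc) text:query (no:address 'foo') }
  UNION
  { (?s ?sc) text:query (no:bio 'foo') }
  UNION
  { (?s ?sc) text:query (no:qualification 'foo') }
  UNION
  { (?s ?sc) text:query (no:hobbies 'foo') }
}
GROUP BY ?s
ORDER BY DESC(?score)
```

Grouping by ?s collapses the multiple matches that appear when 'foo' occurs in, say, both bio and qualification, at the cost of one Lucene lookup per field. Whether that is faster than the single shared "text" field would need to be measured on the actual data.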
