Hi Laura!

Laura Morales wrote on 23.05.2017 at 10:23:
Thank you for the answer. So let's say I want to search nodes in my graph by rdfs:label. Is this correct...
1) STRSTARTS(): fast by default because predicates are sorted. Only does exact search.
2) STRSTARTS(LCASE(?label)): fast because predicates are sorted, but just a little bit slower than 1) because it must LCASE() some strings
3) REGEX(): slow because it must go through all rdfs:labels (use jena-text instead)
4) CONTAINS(): slow because it must go through all rdfs:labels (use jena-text instead)
Is this correct?
I believe all of these are roughly equivalent in terms of performance. All of them need to scan all the rdfs:label values. Obviously REGEX is a bit more expensive than e.g. STRSTARTS but the difference is not very big. I don't think there's any sorting of predicate values in TDB that would help here.
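To make the comparison concrete, here is a rough sketch of how these filter variants might look in a query. The variable names and the search string "hels" are only illustrative. Whichever filter you pick, it is evaluated against every rdfs:label binding produced by the triple pattern:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?label
WHERE {
  # This pattern yields every rdfs:label in the graph...
  ?s rdfs:label ?label .
  # ...and the filter is then applied to each value in turn.
  # Use one of the following (the others are commented out):
  FILTER(STRSTARTS(LCASE(STR(?label)), "hels"))
  # FILTER(STRSTARTS(STR(?label), "Hels"))
  # FILTER(CONTAINS(LCASE(STR(?label)), "hels"))
  # FILTER(REGEX(STR(?label), "^hels", "i"))
}
LIMIT 20
```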
If my app has an input search box where users can search an item by title (on a large graph), would it be a good idea to go with 2) or should I consider setting up a text-query index?
I recommend setting up a text index if you want to do partial matching of labels from a large graph.
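As a minimal sketch, a jena-text lookup could look something like the query below. It assumes the dataset has already been configured with a text index covering rdfs:label. The search string "hels*" and the result limit of 20 are just examples:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX text: <http://jena.apache.org/text#>

SELECT ?s ?label
WHERE {
  # The query string is passed to the text index (Lucene syntax);
  # the trailing * asks for a prefix match, 20 caps the number of hits.
  ?s text:query (rdfs:label "hels*" 20) .
  # Fetch the labels of the matching resources from the graph.
  ?s rdfs:label ?label .
}
```

Note that the index matches on the terms produced by the configured analyzer, so the exact matching behaviour (case folding, tokenization) depends on your index setup. Also, the second pattern returns all labels of each matching resource, not only the label that matched.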
-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi