Hi Laura!

Laura Morales wrote on 23.05.2017 at 10:23:

Thank you for the answer. So let's say I want to search nodes in my graph by 
rdfs:label. Is this correct...

1) STRSTARTS(): fast by default because predicates are sorted. Only does exact 
search.
2) STRSTARTS(LCASE(?label)): fast because predicates are sorted, but just a 
little bit slower than 1) because it must LCASE() some strings
3) REGEX(): slow because it must go through all rdfs:labels (use jena-text 
instead)
4) CONTAINS(): slow because it must go through all rdfs:labels (use jena-text 
instead)

Is this correct?

I believe all of these are roughly equivalent in terms of performance. All of them need to scan all the rdfs:label values. Obviously REGEX is a bit more expensive than e.g. STRSTARTS but the difference is not very big. I don't think there's any sorting of predicate values in TDB that would help here.
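
To illustrate why the four options end up costing about the same, here is a minimal Python sketch. It is not how TDB evaluates a FILTER internally; it only assumes that a filter is applied to every candidate label in turn. The sample labels and the `scan` helper are hypothetical.

```python
import re

# Hypothetical sample data standing in for rdfs:label values.
LABELS = ["Helsinki", "Helsingborg", "Oslo", "Stockholm", "helsinki region"]


def scan(labels, predicate):
    """Test every label against the predicate, one by one.

    Returns the matching labels and the number of labels examined.
    """
    examined = 0
    matches = []
    for label in labels:
        examined += 1
        if predicate(label):
            matches.append(label)
    return matches, examined


# Rough Python analogues of the four SPARQL filters discussed above.
strstarts = scan(LABELS, lambda s: s.startswith("Hels"))
strstarts_lcase = scan(LABELS, lambda s: s.lower().startswith("hels"))
regex = scan(LABELS, lambda s: re.search("^Hels", s) is not None)
contains = scan(LABELS, lambda s: "sink" in s)

for name, (matches, examined) in [
    ("STRSTARTS", strstarts),
    ("STRSTARTS(LCASE)", strstarts_lcase),
    ("REGEX", regex),
    ("CONTAINS", contains),
]:
    print(name, matches, "labels examined:", examined)
```

Whichever test is used, every label is examined once; only the per-label cost of the test differs.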

If my app has an input search box where users can search an item by title (on a 
large graph), would it be a good idea to go with 2) or should I consider 
setting up a text-query index?

I recommend setting up a text index if you want to do partial matching of labels from a large graph.
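
As a rough illustration of why a text index helps, the Python sketch below builds a tiny word-to-node lookup table. This is only the general idea of an inverted index, not the actual jena-text or Lucene implementation; the node identifiers, labels and function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical nodes and their rdfs:label values.
LABELS = {
    "ex:1": "Helsinki City Library",
    "ex:2": "National Library of Finland",
    "ex:3": "Oslo Opera House",
}


def build_index(labels):
    """Map each lowercased word to the set of nodes whose label contains it."""
    index = defaultdict(set)
    for node, label in labels.items():
        for token in label.lower().split():
            index[token].add(node)
    return index


def search(index, word):
    """Look up one word directly instead of scanning every label."""
    return sorted(index.get(word.lower(), set()))


INDEX = build_index(LABELS)
print(search(INDEX, "library"))
```

The index is built once up front; after that, each search is a single dictionary lookup whose cost does not grow with the number of labels the way a FILTER scan does. This toy version only matches whole words.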

-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi
