Hello Marvin,

> Why does bif:contains return faster than REGEX search,
bif:contains returns faster because it uses a special full-text index to get the IDs of objects that contain the words mentioned in the query; it does not scan the whole table like a REGEX-based query does. The advantage of REGEX is flexibility: one may search for specific fragments of words or for special data like protein coding sequences. Moreover, bif:contains may be used only for variables that are directly bound in the object position of a triple, not for values of expressions of any other sort.

> and why are they returning a different number of counted rows?

Because bif:contains looks for phrases or independent words, it may normalize words that use non-canonical Unicode chars, and it can search in XML/HTML documents. In addition, even if one and the same query string is valid for both REGEX and bif:contains, the meaning may differ. For REGEX, the pattern "Paris Hilton" is precisely two words delimited by a single whitespace byte. For bif:contains, "Paris Hilton" means that the document should contain the word "Paris" and the word "Hilton", in any places and in any order. See http://docs.openlinksw.com/virtuoso/queryingftcols.html for details of the bif:contains query string syntax.

> The search string is not the real string but it does not change the question. Which should we use?

I'd strongly recommend reporting real details whenever the question is about a real problem on a real system; this may result in really useful answers.

> *Question 2*
> Why can't I search a property or subject?
>
> SQL> sparql select count(*) where {?s ?p ?o. ?p bif:contains
> "searchstring"};
>
> *** Error 37000: [Virtuoso Driver][Virtuoso Server]SQ074: Line 1: SP031:
> SPARQL compiler: The group does not contain triple pattern with '$p' object
> before bif:contains() predicate
> at line 1 of Top-Level:
> sparql select count(*) where {?s ?p ?o. ?p bif:contains "searchstring"}

bif:contains uses the free-text index on the table of distinct objects.
Subjects and predicates are not objects; moreover, they are not texts at all. The query failed because there is no triple pattern with ?p in the object (i.e. third) position of a triple.

Best Regards,
Ivan Mikhailov, OpenLink Software
http://virtuoso.openlinksw.com
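
P.S. A minimal sketch of two variants that should get past the SP031 error. Both reuse the placeholder "searchstring" from your query, so treat them as illustrations rather than tuned queries. The first applies bif:contains to ?o, which is bound in object position, so the free-text index can be used. The second matches against the predicate IRI; since str(?p) is an expression, bif:contains is not applicable there, and the REGEX filter is expected to scan and be correspondingly slower. When running from isql, prefix each query with "sparql" and terminate it with ";", as in your session.

```sparql
# Variant 1: free-text search on object values.
# ?o appears in object (third) position of a triple pattern
# before bif:contains, which is what the compiler requires.
select count(*)
where {
  ?s ?p ?o .
  ?o bif:contains "searchstring" .
}

# Variant 2: substring match on predicate IRIs.
# Assumption: a plain regex over the IRI text is acceptable here;
# this does not use the free-text index.
select count(*)
where {
  ?s ?p ?o .
  filter regex(str(?p), "searchstring")
}
```

Note that the two variants answer different questions (text in object values versus text in predicate names), so their counts are not expected to agree.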