Hello Marvin,

> Why does bif:contains return faster than REGEX search,

bif:contains returns faster because it uses a special full-text index to
get the IDs of objects that contain the words mentioned in the query; it
does not scan the whole table as a REGEX-based query does. The advantage
of REGEX is flexibility: one may search for specific fragments of words
or for special data like protein coding sequences. Moreover, bif:contains
may be used only for variables that are directly bound in the object
position of a triple pattern, not for values of expressions of any other
sort.
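
As a rough sketch of the two query forms (the search string is a
placeholder, and the exact shape of your REGEX query is my assumption
since it was not posted):

```sparql
# Full-text search: ?o must be bound in the object position of a triple
# pattern in the same group, so the free-text index can be used.
select count(*) where { ?s ?p ?o . ?o bif:contains "searchstring" }

# Regex search: the filter is evaluated against each candidate value of ?o.
select count(*) where { ?s ?p ?o . filter regex(?o, "searchstring") }
```

From isql, each query is prefixed with the sparql keyword, as in your
example.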

> and why are they returning a different number of counted rows?

Because bif:contains looks for phrases or independent words, it may
normalize words that use non-canonical Unicode chars, and it can search
in XML/HTML documents. In addition, even if one and the same query
string is valid for both REGEX and bif:contains, the meaning may differ.
For REGEX, the pattern "Paris Hilton" is precisely two words delimited
by a single space character. For bif:contains, "Paris Hilton" means that
the document should contain the word "Paris" and the word "Hilton", in
any places and in any order. See
http://docs.openlinksw.com/virtuoso/queryingftcols.html
for details of bif:contains query string syntax.
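
To make the intent explicit rather than rely on the default reading of
the query string, the two cases can be written out as below. This is
only a sketch, assuming the "and" operator and the double-quoted phrase
form described on that page; please check both against the documentation
for your server version.

```sparql
# Both words required, anywhere in the text and in any order.
select count(*) where { ?s ?p ?o . ?o bif:contains "Paris and Hilton" }

# Exact phrase: the phrase is double-quoted inside the query string.
select count(*) where { ?s ?p ?o . ?o bif:contains '"Paris Hilton"' }
```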

> The search string is not the real string but it does not change the
> question. Which should we use?

I'd strongly recommend reporting real details whenever the question is
about a real problem on a real system -- this may result in really
useful answers.

> *Question 2*
> Why can't I search a property or subject?
> 
> SQL>  sparql select count(*) where {?s ?p ?o. ?p bif:contains 
> "searchstring"};   
> 
> *** Error 37000: [Virtuoso Driver][Virtuoso Server]SQ074: Line 1: SP031: 
> SPARQL compiler: The group does not contain triple pattern with '$p' object 
> before bif:contains() predicate
> at line 1 of Top-Level:
>  sparql select count(*) where {?s ?p ?o. ?p bif:contains "searchstring"}

bif:contains uses the free-text index on the table of distinct objects.
Subjects and predicates are not objects; moreover, they are not texts at
all. The query failed because there is no triple pattern with ?p in the
object (i.e. third) position of a triple.
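
For illustration, here are two variants that should compile. The first
applies bif:contains to the variable in the object position. The second
is my own suggestion, not a full-text search: it matches the predicate
IRI as a string with a standard REGEX filter, which scans all candidates
and so will be slow on a large store.

```sparql
# Full-text search on object values: ?o is in the third position.
select count(*) where { ?s ?p ?o . ?o bif:contains "searchstring" }

# Substring match on predicate IRIs via REGEX (no free-text index involved).
select count(*) where { ?s ?p ?o . filter regex(str(?p), "searchstring") }
```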

Best Regards,

Ivan Mikhailov,
OpenLink Software
http://virtuoso.openlinksw.com


