Re: using multiple text searches in a query

Andy Seaborne Tue, 21 Jan 2014 06:48:01 -0800

On 19/01/14 11:56, Pierre-Andre Michel wrote:


On Jan 18, 2014, at 4:29 PM, Andy Seaborne wrote:

On 17/01/14 14:15, Pierre-Andre Michel wrote:


On Jan 16, 2014, at 7:53 PM, Andy Seaborne wrote:

On 16/01/14 08:41, Pierre-Andre Michel wrote:

Hello Andy,

As promised I have run a test to see if text:query allows the use of multiple 
predicates as proposed below:


Thanks for testing that.  That way you can OR as well as AND.

So if you write:

?a text:query(pred:cv-name 'ubl AND field2:ubiquitin' 10) .

where field2 is the name of a text:field, then you may get a single,
conjunctive query.


OK,  I will try what you suggest and tell you if it works or not.


and the answer is: Yes, it works, great !

So I can efficiently run queries with multiple criteria (fields / predicates)
for a single subject variable.
Now if I want to text:query 2 subject variables ?a and ?b, for example:

?a text:query(pred:organ 'liver' 25) .
?b text:query(pred:author 'John Smith') .

the second query will still be called 25 times if 25 solutions are found for ?a
during graph traversal.
Why don't we cache the result of the queries so that after the first call we
don't invoke Solr or Lucene anymore but simply return an iterator on the result
list previously built ?
Does it make sense to you ?
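
To illustrate the caching idea, here is a minimal Java sketch. It is not Jena
code: CachedTextSearch and its methods are hypothetical names, and the backend
(whatever would call Lucene or Solr) is represented by a plain Function. It
only shows that remembering results per query string means 25 bindings of ?a
trigger a single backend call for the ?b pattern. Result limits and cache
invalidation when the index changes are ignored here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: memoize text-index lookups by query string (not Jena API). */
public class CachedTextSearch {
    // Stands in for the real Lucene/Solr call; an assumption for illustration.
    private final Function<String, List<String>> backend;
    private final Map<String, List<String>> cache = new HashMap<>();
    private int backendCalls = 0;

    public CachedTextSearch(Function<String, List<String>> backend) {
        this.backend = backend;
    }

    /** Returns the hits for a query, invoking the backend only on first use. */
    public List<String> search(String query) {
        List<String> hits = cache.get(query);
        if (hits == null) {
            backendCalls++;
            hits = List.copyOf(backend.apply(query)); // immutable snapshot
            cache.put(query, hits);
        }
        return hits;
    }

    public int backendCalls() {
        return backendCalls;
    }

    public static void main(String[] args) {
        CachedTextSearch index = new CachedTextSearch(q -> List.of("pub1", "pub2"));
        // Simulate 25 solutions for ?a, each re-evaluating the pattern for ?b.
        for (int i = 0; i < 25; i++) {
            index.search("author:\"John Smith\"");
        }
        System.out.println(index.backendCalls()); // prints 1
    }
}
```

Callers get the stored list back and can iterate over it as often as needed,
which is the "return an iterator on the result list previously built" part.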


The optimizer does nothing here so that's what happens.  It needs a 
cross-product spotter to do that; it doesn't have one.
The optimizer/evaluator has no concept of "text:query" being special so it
blindly executes it.


OK, so is there a way to provide the optimizer with a cross-product spotter
concerning text:query ?


Currently, you would need to change the default optimizations applied, or
provide your own, to add a new one.

There may be a way to write a query that stops the optimization applying at
this point but it will be a visible change to the query.

The specific optimization that is causing this is controlled by a switch 
"ARQ.optIndexJoinStrategy".  It's not specific to property functions like 
text:query and is one of the more important optimizations done. You could try running 
with it off but it may have consequences elsewhere in the overall query.

But aren't you going to connect ?a and ?b in some way?


Yes, ?a and ?b would be connected in some way, but the problem remains.


Could you describe the use case here?  If I understand the situation better it 
will at least guide future work.  The general external index usage is more 
centred around one access per pattern.  Your case looks like it is a bit more 
complicated.


Hi Andy,

Thanks for your explanations about the switch ARQ.optIndexJoinStrategy.

Our use case is roughly the following: we have a dataset with Np Proteins described
with Na Annotations supported by Ne Evidences based on Npub Publications (Np=20'000,
Na>5'000'000, Ne>10'000'000, Npub=400'000).
A query involving 2 text:queries on distinct entities/variables could be like:

?isoform :has-function ?annotation .
?annotation text:query(pred:description 'glucose'  1000) .
?annotation :supported-by ?evidence .
?evidence :based-on ?publication .
?publication text:query(pred:abstract '(+crystallographic +SPR studies)' 10000) .

Hope it helps you to better understand our needs,
Cheers
Pierre-Andre

Thanks - certainly an interesting challenge for the optimizer. I added the
general form to the list of possible Google Summer of Code projects I could
think of.


http://mail-archives.apache.org/mod_mbox/jena-dev/201401.mbox/%3C52DD3590.3060807%40apache.org%3E

        Andy

