On 19/01/14 11:56, Pierre-Andre Michel wrote:
On Jan 18, 2014, at 4:29 PM, Andy Seaborne wrote:
On 17/01/14 14:15, Pierre-Andre Michel wrote:
On Jan 16, 2014, at 7:53 PM, Andy Seaborne wrote:
On 16/01/14 08:41, Pierre-Andre Michel wrote:
Hello Andy,
As promised, I have run a test to see whether text:query allows the use of multiple
predicates as proposed below:
thanks for testing that. That way you can OR as well as AND.
So if you write:
?a text:query(pred:cv-name 'ubl AND field2:ubiquitin' 10) .
where field2 is the Lucene field name declared with text:field, then it may work as a
single, conjunctive query.
OK, I will try what you suggest and tell you if it works or not.
and the answer is: yes, it works, great!
So I can efficiently run queries with multiple criteria (fields/predicates)
on a single subject variable.
Now, if I want to run text:query on two subject variables ?a and ?b, for example:
?a text:query(pred:organ 'liver' 25) .
?b text:query(pred:author 'John Smith') .
the second query will still be called 25 times if 25 solutions are found for ?a
during graph traversal.
Why don't we cache the result of the queries, so that after the first call we
don't invoke Solr or Lucene again but simply return an iterator over the
previously built result list?
Does it make sense to you?
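A minimal sketch of the caching idea, written in Python purely for illustration. `CachedTextIndex`, `fake_search` and the `(predicate, text, limit)` cache key are hypothetical; jena-text does not provide them. Caching like this only helps when the text:query arguments are constants, as in the pred:author example; a query string built from bound variables would produce a new key each time. Scoping the cache to one query execution is an assumption meant to avoid serving stale hits after the index changes.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical signature for the underlying Lucene/Solr lookup:
# (predicate, query string, limit) -> list of matching subject URIs.
SearchFn = Callable[[str, str, Optional[int]], List[str]]


class CachedTextIndex:
    """Memoizes text-index results for the duration of one query execution.

    Illustrative only: this is not a jena-text class.
    """

    def __init__(self, search: SearchFn) -> None:
        self._search = search
        self._cache: Dict[Tuple[str, str, Optional[int]], List[str]] = {}
        self.backend_calls = 0  # how often the real index was actually hit

    def query(self, predicate: str, text: str, limit: Optional[int] = None) -> Iterator[str]:
        key = (predicate, text, limit)
        if key not in self._cache:
            # First request with these arguments: ask the index, keep the full list.
            self.backend_calls += 1
            self._cache[key] = list(self._search(predicate, text, limit))
        # Every request gets a fresh iterator over the stored list.
        return iter(self._cache[key])


# Usage: a fake index, and the ?b pattern re-evaluated once per solution of ?a.
def fake_search(predicate: str, text: str, limit: Optional[int]) -> List[str]:
    data = {("pred:author", "John Smith"): ["ex:b1", "ex:b2"]}
    return data.get((predicate, text), [])[:limit]


index = CachedTextIndex(fake_search)
for _ in range(25):
    matches = list(index.query("pred:author", "John Smith"))
print(index.backend_calls, matches)  # 1 ['ex:b1', 'ex:b2']
```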
The optimizer does nothing here so that's what happens. It needs a
cross-product spotter to do that; it doesn't have one.
The optimizer/evaluator has no concept of "text:query" being special so it
blindly executes it.
OK, so is there a way to provide the optimizer with a cross-product spotter
for text:query?
Currently, you would need to change the default optimizations applied, or
provide your own, to add a new one.
There may be a way to write a query that stops the optimization applying at
this point, but it will be a visible change to the query.
The specific optimization that is causing this is controlled by a switch
"ARQ.optIndexJoinStrategy". It's not specific to property functions like
text:query and is one of the more important optimizations done. You could try running
with it off but it may have consequences elsewhere in the overall query.
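A simplified model, in Python, of why the second text:query runs 25 times. It contrasts substitution-style evaluation (roughly what an index join does: each left solution is fed into the right-hand pattern) with evaluating an independent pattern once and joining in memory, which is what a cross-product spotter would allow. The function names and the `Binding`/`Pattern` types are hypothetical, not ARQ classes, and ARQ's real execution is more involved. The evaluate-once form is only safe when the right-hand pattern does not depend on the left bindings, which is exactly what an optimizer would have to check.

```python
from typing import Callable, Dict, List

Binding = Dict[str, str]
# A pattern is evaluated with the bindings already known (possibly none).
Pattern = Callable[[Binding], List[Binding]]


def substitution_join(left: List[Binding], right: Pattern) -> List[Binding]:
    """Index-join style: re-evaluate the right pattern for every left solution."""
    out: List[Binding] = []
    for lrow in left:
        for extra in right(lrow):
            out.append({**lrow, **extra})
    return out


def evaluate_once_join(left: List[Binding], right: Pattern) -> List[Binding]:
    """Evaluate the right pattern once, then join in memory on shared variables.

    Only valid if the right pattern does not depend on the left bindings.
    With no shared variables this is a plain cross product.
    """
    right_rows = right({})
    out: List[Binding] = []
    for lrow in left:
        for rrow in right_rows:
            shared = lrow.keys() & rrow.keys()
            if all(lrow[v] == rrow[v] for v in shared):
                out.append({**lrow, **rrow})
    return out


calls = {"author": 0}


def author_pattern(_bindings: Binding) -> List[Binding]:
    # Stands in for ?b text:query(pred:author 'John Smith'); it ignores the
    # incoming bindings because ?b is unrelated to ?a.
    calls["author"] += 1
    return [{"b": "ex:b1"}, {"b": "ex:b2"}]


left = [{"a": f"ex:a{i}"} for i in range(25)]  # 25 solutions for ?a

rows_sub = substitution_join(left, author_pattern)
print(calls["author"])  # 25
calls["author"] = 0
rows_once = evaluate_once_join(left, author_pattern)
print(calls["author"])  # 1
```

Both strategies return the same 50 rows; only the number of calls into the text index differs.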
But aren't you going to connect ?a and ?b in some way?
Yes, ?a and ?b would be connected in some way, but the problem remains.
Could you describe the use case here? If I understand the situation better it
will at least guide future work. The general external index usage is more
centred around one access per pattern. Your case looks like it is a bit more
complicated.
Hi Andy,
Thanks for your explanations about the switch ARQ.optIndexJoinStrategy.
Our use case is roughly the following: we have a dataset with 20'000 proteins,
described by more than 5'000'000 annotations, which are supported by more than
10'000'000 evidences, which are based on 400'000 publications.
A query involving two text:query calls on distinct entities/variables could look like:
?isoform :has-function ?annotation .
?annotation text:query(pred:description 'glucose' 1000) .
?annotation :supported-by ?evidence .
?evidence :based-on ?publication .
?publication text:query(pred:abstract '(+crystallographic +SPR studies)' 10000) .
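For the connected case, one possible plan is to run each text query once, turn its hits into a set, and use both sets to prune the graph walk. This is a sketch under strong assumptions: the triples sit in an in-memory list, the two text:query results are already available as Python sets, and `plan_with_text_sets` and `index_by_subject` are hypothetical helpers, not Jena API. A real optimizer would also need cost estimates to decide when this beats following the graph outward from one text-query side.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def index_by_subject(triples: List[Triple], predicate: str) -> Dict[str, List[str]]:
    """Map subject -> objects for one predicate."""
    idx: Dict[str, List[str]] = defaultdict(list)
    for s, p, o in triples:
        if p == predicate:
            idx[s].append(o)
    return idx


def plan_with_text_sets(
    triples: List[Triple],
    annotations: Set[str],   # hits of the pred:description text query, run once
    publications: Set[str],  # hits of the pred:abstract text query, run once
) -> List[Tuple[str, str, str, str]]:
    """Return (isoform, annotation, evidence, publication) rows."""
    supported_by = index_by_subject(triples, ":supported-by")
    based_on = index_by_subject(triples, ":based-on")
    rows: List[Tuple[str, str, str, str]] = []
    for iso, p, ann in triples:
        if p != ":has-function" or ann not in annotations:
            continue  # prune with the first text-query result
        for ev in supported_by.get(ann, []):
            for pub in based_on.get(ev, []):
                if pub in publications:  # prune with the second text-query result
                    rows.append((iso, ann, ev, pub))
    return rows


# Usage with a tiny made-up graph.
triples = [
    ("ex:iso1", ":has-function", "ex:ann1"),
    ("ex:iso2", ":has-function", "ex:ann2"),
    ("ex:ann1", ":supported-by", "ex:ev1"),
    ("ex:ann2", ":supported-by", "ex:ev2"),
    ("ex:ev1", ":based-on", "ex:pub1"),
    ("ex:ev2", ":based-on", "ex:pub2"),
]
rows = plan_with_text_sets(triples, {"ex:ann1", "ex:ann2"}, {"ex:pub1"})
print(rows)  # [('ex:iso1', 'ex:ann1', 'ex:ev1', 'ex:pub1')]
```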
Hope it helps you to better understand our needs,
Cheers
Pierre-Andre
Thanks - certainly an interesting challenge for the optimizer. I added
the general form to the list of possible Google Summer of Code projects
I could think of.
http://mail-archives.apache.org/mod_mbox/jena-dev/201401.mbox/%3C52DD3590.3060807%40apache.org%3E
Andy