Re: Achieving reasonably performing federated queries

Sarven Capadisli Thu, 25 Jul 2013 05:50:24 -0700

On 07/23/2013 04:57 PM, Diogo FC Patrao wrote:

Hello


I observed the same behaviour as you did and have some thoughts that came
out of it. I haven't checked the Jena sources, so I may be wrong here.

As stated before on this list, ARQ doesn't have any federation-specific
optimization, so it behaves as if accessing local and remote (federated)
data cost the same.

Consider the SQL query below:

SELECT * FROM A JOIN B ON (A.some_column = B.some_column)

There are at least two plans for solving that: (1) one that computes the
cross product of A and B and then filters the results according to the
condition; and (2) one that, for each row in A, looks up the rows in B that
satisfy the condition. To select the better plan, the SQL planner takes
into consideration whether the relevant columns are indexed and the sizes
of the tables involved, since both factors affect the cost of accessing
rows.
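A minimal Java sketch of the two plans, assuming the tables are plain in-memory lists of int[] rows with some_column at index 0. The class JoinPlans, its methods and the access counter are hypothetical names for illustration; the counter stands in for the cost of each separate request to B, which is what dominates once B sits behind a SERVICE.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of two plans for
//   SELECT * FROM A JOIN B ON (A.some_column = B.some_column)
// Rows are int[] with some_column at index 0. "accesses" counts separate
// requests made to B (hypothetical stand-in for remote access cost).
class JoinPlans {
    static final class Result {
        final List<int[][]> rows; // each element is the pair {rowOfA, rowOfB}
        final int accesses;       // number of separate requests sent to B
        Result(List<int[][]> rows, int accesses) {
            this.rows = rows;
            this.accesses = accesses;
        }
    }

    // Plan (1): fetch B once, build the full cross product, then filter it.
    static Result crossProductThenFilter(List<int[]> a, List<int[]> b) {
        List<int[][]> product = new ArrayList<>();
        for (int[] ra : a) {
            for (int[] rb : b) {
                product.add(new int[][] {ra, rb});
            }
        }
        List<int[][]> matches = new ArrayList<>();
        for (int[][] pair : product) {
            if (pair[0][0] == pair[1][0]) { // join condition applied last
                matches.add(pair);
            }
        }
        return new Result(matches, 1);      // B requested once, in bulk
    }

    // Plan (2): for each row of A, look up the matching rows of B.
    // The map plays the role of an index on B.some_column.
    static Result lookupPerRow(List<int[]> a, Map<Integer, List<int[]>> bIndex) {
        List<int[][]> matches = new ArrayList<>();
        int accesses = 0;
        for (int[] ra : a) {
            accesses++;                     // one request to B per row of A
            for (int[] rb : bIndex.getOrDefault(ra[0], List.of())) {
                matches.add(new int[][] {ra, rb});
            }
        }
        return new Result(matches, accesses);
    }

    // Builds the hypothetical index on B.some_column used by plan (2).
    static Map<Integer, List<int[]>> indexOn(List<int[]> b) {
        Map<Integer, List<int[]>> index = new HashMap<>();
        for (int[] rb : b) {
            index.computeIfAbsent(rb[0], k -> new ArrayList<>()).add(rb);
        }
        return index;
    }

    public static void main(String[] args) {
        List<int[]> a = List.of(new int[] {1, 10}, new int[] {2, 20}, new int[] {3, 30});
        List<int[]> b = List.of(new int[] {1, 100}, new int[] {3, 300}, new int[] {3, 301});
        Result p1 = crossProductThenFilter(a, b);
        Result p2 = lookupPerRow(a, indexOn(b));
        System.out.println("plan 1: " + p1.rows.size() + " rows, " + p1.accesses + " access(es) to B");
        System.out.println("plan 2: " + p2.rows.size() + " rows, " + p2.accesses + " access(es) to B");
    }
}
```

Running main prints "plan 1: 3 rows, 1 access(es) to B" and "plan 2: 3 rows, 3 access(es) to B": both plans return the same rows, but (2) issues one request to B per row of A.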

The better plan for the query you posted would be (1), simply because
accessing a remote service is expensive. However, if the first SERVICE
query returned only a few rows, it might be better to run the same query a
few times, as in (2), than to fetch all results.
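The trade-off can be put as rough arithmetic. The sketch below assumes a simple cost model that is not taken from ARQ: plan (1) pays one round trip plus the transfer of every row of B, while plan (2) pays one round trip per row of A plus the transfer of the matching rows. The class PlanChooser and all parameter values are hypothetical.

```java
// Rough, hypothetical cost model for choosing between plan (1) and plan (2)
// when B sits behind a remote SERVICE:
//   latencyMs   fixed cost of one round trip to the endpoint
//   perRowMs    cost of transferring and parsing one result row
//   rowsA       rows produced by the first pattern
//   rowsB       rows the remote pattern returns when fetched in full
//   matchesPerA average remote rows matching one row of A
class PlanChooser {
    // Plan (1): a single bulk request for all of B.
    static double costBulk(double latencyMs, double perRowMs, long rowsB) {
        return latencyMs + perRowMs * rowsB;
    }

    // Plan (2): one request per row of A, each returning only its matches.
    static double costPerRow(double latencyMs, double perRowMs, long rowsA, double matchesPerA) {
        return rowsA * (latencyMs + perRowMs * matchesPerA);
    }

    static String choose(double latencyMs, double perRowMs, long rowsA, long rowsB, double matchesPerA) {
        double bulk = costBulk(latencyMs, perRowMs, rowsB);
        double perRow = costPerRow(latencyMs, perRowMs, rowsA, matchesPerA);
        return perRow < bulk ? "plan 2 (lookup per row)" : "plan 1 (fetch all, then join)";
    }

    public static void main(String[] args) {
        // Few rows from the first pattern: a handful of lookups is cheaper.
        System.out.println(choose(200, 0.05, 3, 1_000_000, 2));
        // Many rows from the first pattern: one bulk fetch wins.
        System.out.println(choose(200, 0.05, 50_000, 1_000_000, 2));
    }
}
```

With a 200 ms round trip and 0.05 ms per row, 3 rows on the A side cost about 600 ms under plan (2) against about 50,200 ms under plan (1); with 50,000 rows on the A side, plan (2) grows to roughly 10,005,000 ms and the single bulk fetch wins.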

I agree. I started out with (2) because that is what ARQ does by default. However, it soon became clear that it wasn't going to work out, so I explored a way to do (1). I'm now doing (1), but I'm trying to get more out of it. I need to take a closer look at Rob Vesse's suggestion: ARQ.getContext().set(ARQ.optIndexJoinStrategy, false);

As for optimizing the query, I would try splitting each query into a
UNION, one part with the OPTIONAL and the other without it. Getting the
subproperties can also be expensive, depending on which triplestore you're
querying. If it's Fuseki+TDB and you have access to the server
configuration, you could turn on RDFS inference. Also, the order of the
triple patterns can have a large influence on overall query performance:
put the patterns that return fewer results before the others.
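The "fewer results first" advice can be approximated by sorting the triple patterns by an estimated result count. The sketch below assumes such estimates are already known (for instance from COUNT queries run beforehand); the patterns, counts and the PatternOrder class are made up. It ignores that variables bound by earlier patterns make later ones more selective, so it is only a first approximation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical "most selective pattern first" heuristic: order triple
// patterns by an assumed standalone result count, smallest first.
class PatternOrder {
    static final class Pattern {
        final String text;        // the triple pattern as written in the query
        final long estimatedRows; // assumed number of solutions on its own
        Pattern(String text, long estimatedRows) {
            this.text = text;
            this.estimatedRows = estimatedRows;
        }
    }

    // Returns a new list; the input list is left untouched.
    static List<Pattern> mostSelectiveFirst(List<Pattern> patterns) {
        List<Pattern> ordered = new ArrayList<>(patterns);
        ordered.sort(Comparator.comparingLong(p -> p.estimatedRows));
        return ordered;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = List.of(
            new Pattern("?s ?p ?o", 5_000_000),
            new Pattern("?s a ex:Thing", 200_000),
            new Pattern("?s ex:code \"X1\"", 1_200));
        for (Pattern p : mostSelectiveFirst(patterns)) {
            System.out.println(p.estimatedRows + "\t" + p.text);
        }
    }
}
```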

Good luck!

I'm not sure I see how UNION can be used as you suggest so that the results contain values for each field. Only one of the variables in the OPTIONAL is used in the final output. Duplicating the earlier pattern plus what was in the OPTIONAL is probably not ideal. Did I misunderstand you?
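One possible reading of the UNION suggestion, sketched on made-up in-memory solutions to compare the result shapes. It assumes P binds ?s and Q is a lookup from ?s to ?label; the class and method names are hypothetical. A plain { P } UNION { P . Q } returns each matched row twice (once without ?label), while { P . Q } UNION { P FILTER NOT EXISTS { Q } } returns the same rows as OPTIONAL, but both UNION forms evaluate P twice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Made-up illustration: a solution is a Map from variable name to value,
// and Q is modelled as a Map from ?s to ?label.
class OptionalVsUnion {
    // The row extended with ?label, or null when Q has no match for it.
    static Map<String, String> extend(Map<String, String> row, Map<String, String> q) {
        String label = q.get(row.get("s"));
        if (label == null) {
            return null;
        }
        Map<String, String> r = new HashMap<>(row);
        r.put("label", label);
        return r;
    }

    // { P OPTIONAL { Q } }: left outer join, every P row kept exactly once.
    static List<Map<String, String>> optional(List<Map<String, String>> p, Map<String, String> q) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : p) {
            Map<String, String> ext = extend(row, q);
            out.add(ext != null ? ext : new HashMap<>(row));
        }
        return out;
    }

    // { { P } UNION { P . Q } }: P evaluated twice, matched rows appear twice.
    static List<Map<String, String>> naiveUnion(List<Map<String, String>> p, Map<String, String> q) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : p) {
            out.add(new HashMap<>(row));
        }
        for (Map<String, String> row : p) {
            Map<String, String> ext = extend(row, q);
            if (ext != null) {
                out.add(ext);
            }
        }
        return out;
    }

    // { { P . Q } UNION { P FILTER NOT EXISTS { Q } } }: same rows as
    // OPTIONAL, but P is still evaluated in both branches.
    static List<Map<String, String>> unionNotExists(List<Map<String, String>> p, Map<String, String> q) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : p) {
            Map<String, String> ext = extend(row, q);
            if (ext != null) {
                out.add(ext);
            }
        }
        for (Map<String, String> row : p) {
            if (extend(row, q) == null) {
                out.add(new HashMap<>(row));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> p = List.of(Map.of("s", "ex:a"), Map.of("s", "ex:b"));
        Map<String, String> q = Map.of("ex:a", "A label"); // only ex:a has a label
        System.out.println("OPTIONAL:         " + optional(p, q).size() + " rows");
        System.out.println("naive UNION:      " + naiveUnion(p, q).size() + " rows");
        System.out.println("UNION/NOT EXISTS: " + unionNotExists(p, q).size() + " rows");
    }
}
```

Running main reports 2, 3 and 2 rows respectively.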


I'll test it with only RDFS inference.

Based on my tests, the order of the statements is as good as it gets.

Thanks for the suggestions.

-Sarven
