On 07/23/2013 04:57 PM, Diogo FC Patrao wrote:
Hello,

I observed the same behaviour as you did and have some thoughts that follow from it. I haven't checked the Jena sources, so I may be wrong here.

As stated before on this list, ARQ doesn't have any federation-specific optimization, so it behaves as if the cost of accessing local and federated data were the same. Consider the SQL query below:

SELECT * FROM A JOIN B ON (A.some_column = B.some_column)

There are at least two plans for solving it: (1) one that builds the cross product of A and B and then filters the results according to the condition; and (2) one that, for each element in A, looks up the elements in B that satisfy the condition. To select the better plan, the SQL planner takes into consideration whether the relevant columns are indexed and the size of the tables involved, since all of these factors affect the cost of accessing rows.

The better plan for the query you posted would be (1), simply because of the cost of accessing a remote service. But if the first SERVICEd query returned just a few rows, it might be better to run the same query a couple of times, as in (2), than to fetch all the results.
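To make the trade-off between the two plans concrete, here is a minimal Java sketch that counts how many remote requests each plan issues. It is not how ARQ or any SQL planner is implemented; the class JoinPlans, the Remote stand-in for a SERVICE endpoint, and the sample rows are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinPlans {
    /** One result row: a join key plus a payload value (hypothetical shape). */
    static class Row {
        final String key;
        final String value;
        Row(String key, String value) { this.key = key; this.value = value; }
    }

    /** In-memory stand-in for a remote endpoint; it only counts requests. */
    static class Remote {
        final List<Row> rows;
        int requests = 0;
        Remote(List<Row> rows) { this.rows = rows; }

        /** One request that returns every row. */
        List<Row> fetchAll() { requests++; return rows; }

        /** One request that returns only the rows matching the given key. */
        List<Row> fetchByKey(String key) {
            requests++;
            List<Row> out = new ArrayList<>();
            for (Row r : rows) if (r.key.equals(key)) out.add(r);
            return out;
        }
    }

    /** Plan (1): fetch both sides once, build the cross product, filter locally. */
    static List<String> fetchAllThenFilter(Remote a, Remote b) {
        List<Row> as = a.fetchAll();
        List<Row> bs = b.fetchAll();
        List<String> out = new ArrayList<>();
        for (Row x : as)
            for (Row y : bs)
                if (x.key.equals(y.key)) out.add(x.value + "|" + y.value);
        return out;
    }

    /** Plan (2): for each row of A, send one keyed request to B. */
    static List<String> lookupPerRow(Remote a, Remote b) {
        List<String> out = new ArrayList<>();
        for (Row x : a.fetchAll())
            for (Row y : b.fetchByKey(x.key))
                out.add(x.value + "|" + y.value);
        return out;
    }

    static List<Row> sampleA() {
        return List.of(new Row("k1", "a1"), new Row("k2", "a2"), new Row("k3", "a3"));
    }

    static List<Row> sampleB() {
        return List.of(new Row("k1", "b1"), new Row("k3", "b3"));
    }

    public static void main(String[] args) {
        Remote a1 = new Remote(sampleA()), b1 = new Remote(sampleB());
        System.out.println(fetchAllThenFilter(a1, b1) + " requests=" + (a1.requests + b1.requests));

        Remote a2 = new Remote(sampleA()), b2 = new Remote(sampleB());
        System.out.println(lookupPerRow(a2, b2) + " requests=" + (a2.requests + b2.requests));
    }
}
```

Both plans return the same rows. With this sample data, plan (1) always issues two requests but transfers every row of both sides, while plan (2) issues one request plus one per row of A, so it only pays off when A is small and B is large.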
I agree. I started out with (2) because ARQ did that by default. However, it soon became clear that it wasn't going to work out, so I explored a way to do (1). I'm now doing (1), but I'm trying to get more out of it. I have to take a closer look at Rob Vesse's suggestion: ARQ.getContext().set(ARQ.optIndexJoinStrategy, false);
As for optimizing the query, I would try splitting each query into a UNION, one part with the OPTIONAL pattern, the other without it. Getting the subproperties can be expensive too, depending on which triplestore you're querying. If it's Fuseki+TDB and you have access to the server configuration, you could turn on RDFS inference. Also, the order of the triple patterns can greatly influence overall query performance: put the patterns that return fewer results before the others. Good luck!
I'm not sure I see how UNION can be used as per your suggestion such that the results contain values for each field. Only one of the variables in OPTIONAL is used towards the final output. Duplicating the earlier pattern plus what was in OPTIONAL is probably not ideal. Did I misunderstand you?
I'll test it with only RDFS inference. Based on my tests, the order of the statements is as good as it gets. Thanks for the suggestions.

-Sarven