On 28/02/13 17:22, Stephen Allen wrote:
The results you are seeing indicate that this is probably 4store
executing the query slowly, and not anything to do with the Jena
client. You could even take Jena out of the mix and test getting the
results directly from the endpoint:
time curl --data-binary "@query1.txt" \
     -H "Content-Type: application/sparql-query" \
     "http://localhost:3030/ds/query" >> /dev/null
Unfortunately, databases are notorious for handling IN clauses poorly
(even many SQL databases). If 4store supports all of SPARQL 1.1, then
you can try changing the IN clause to a VALUES clause [1] and see if
that helps.
-Stephen
[1] http://www.w3.org/TR/sparql11-query/#inline-data
or even writing
FILTER(?x = <uri1> || ?x = <uri2> || ... )
which is logically the same but might (just might) trigger the optimizer
to do something.
But I'm guessing that Stephen's suggestion will show it's down to how
4store executes the query.
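
As a minimal sketch of the two rewrites suggested above, the following builds a VALUES clause and an equality disjunction from the same list of URIs. The class and method names are hypothetical (they are not part of Jena or 4store), the example.org URIs are placeholders, and the code assumes the URIs are already valid IRIs that need no escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Jena or 4store.
class InClauseRewrites {

    // Wraps each URI in angle brackets, as SPARQL IRI syntax requires.
    // Assumes the URIs are already valid IRIs (no escaping is done here).
    private static List<String> iriTerms(List<String> uris) {
        if (uris.isEmpty()) {
            throw new IllegalArgumentException("need at least one URI");
        }
        return uris.stream().map(u -> "<" + u + ">").collect(Collectors.toList());
    }

    // Builds: VALUES ?var { <uri1> <uri2> ... }
    static String valuesClause(String var, List<String> uris) {
        return "VALUES ?" + var + " { " + String.join(" ", iriTerms(uris)) + " }";
    }

    // Builds: FILTER(?var = <uri1> || ?var = <uri2> || ...)
    static String equalityFilter(String var, List<String> uris) {
        String tests = iriTerms(uris).stream()
                .map(t -> "?" + var + " = " + t)
                .collect(Collectors.joining(" || "));
        return "FILTER(" + tests + ")";
    }

    public static void main(String[] args) {
        List<String> uris = Arrays.asList("http://example.org/a", "http://example.org/b");
        // prints: VALUES ?x { <http://example.org/a> <http://example.org/b> }
        System.out.println(valuesClause("x", uris));
        // prints: FILTER(?x = <http://example.org/a> || ?x = <http://example.org/b>)
        System.out.println(equalityFilter("x", uris));
    }
}
```

Either string can be placed inside the WHERE block in place of the IN filter. Whether it runs faster depends on the store, so it is worth timing both.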
On Thu, Feb 28, 2013 at 10:30 AM, Burak Yönyül wrote:
Hi,
When I reduce the FILTER block, the query's execution time gets shorter,
but I receive fewer results than with the original query, so the result
set shrinks too.
Sounds like it's probing to see if the variable has one of the values.
I recorded the elapsed time for each pass through the while loop, and some
iterations take noticeably longer than others. The code that records the times:
int i = 0;
long before = System.currentTimeMillis();
while (resultSet.hasNext()) {
    i++;
    resultSet.next();
    long after = System.currentTimeMillis();
    // Time spent waiting for row i, in milliseconds.
    fileWriter.append("Time of " + i + ". result: " + (after - before) + " ms\n");
    // Restart the clock after the write so file I/O is not counted.
    before = System.currentTimeMillis();
}
The example output:
Time of 1. result: 4 ms
Time of 2. result: 0 ms
Time of 3. result: 1 ms
...
Time of 20. result: 14 ms
Time of 21. result: 0 ms
Time of 22. result: 1 ms
Time of 23. result: 1 ms
...
Time of 27. result: 17 ms
Time of 28. result: 1 ms
...
Time of 34. result: 10 ms
... and so on.
So the server is sending rows back in bursts. That is not Java GC kicking
in every 10-20 rows, nor the cost of sending the query. It's 4store.
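
To make bursts like these easier to spot, here is a minimal sketch that records the gap before each row and lists the rows that were slow to arrive. RowGapTimer and its methods are hypothetical names. It uses System.nanoTime(), which is monotonic, instead of currentTimeMillis(), and it takes any Iterator; a Jena ResultSet should be usable the same way since it is an iterator of query solutions. The 10 ms threshold is an arbitrary choice for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper for illustration only.
class RowGapTimer {

    // Drains the iterator and returns, for each row, the nanoseconds that
    // elapsed between the previous row (or the start) and this row.
    static <T> List<Long> gapsNanos(Iterator<T> rows) {
        List<Long> gaps = new ArrayList<>();
        long before = System.nanoTime();
        while (rows.hasNext()) {
            rows.next();
            long after = System.nanoTime();
            gaps.add(after - before);
            before = after;
        }
        return gaps;
    }

    // Returns the 1-based numbers of the rows whose gap is at least the
    // threshold (same unit as the gaps).
    static List<Integer> slowRows(List<Long> gaps, long threshold) {
        List<Integer> slow = new ArrayList<>();
        for (int i = 0; i < gaps.size(); i++) {
            if (gaps.get(i) >= threshold) {
                slow.add(i + 1);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Stand-in for a result set; replace with the real iterator.
        Iterator<String> rows = Arrays.asList("row1", "row2", "row3").iterator();
        List<Long> gaps = gapsNanos(rows);
        long tenMillis = 10_000_000L; // arbitrary threshold, in nanoseconds
        System.out.println("rows timed: " + gaps.size());
        System.out.println("slow rows: " + slowRows(gaps, tenMillis));
    }
}
```

If the slow rows appear at roughly regular intervals regardless of client settings, that points at the server delivering results in batches.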
But when I execute the LIMIT query, these times are all 0 or 1 ms.
I don't know why, with the FILTER query, some results take longer to
arrive than others. Do you have any idea about that?
It really does look like the cost of the FILTER having to get the
lexical form of the URI to do the comparison on a large number of items.
It may also be probing each time to see whether the variable is one of
the values, rather than getting all the choices once per query and
testing against them.
(ARQ+TDB can go mad on these as well - it's a tricky thing to optimize
in all situations.)
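
To illustrate the difference between probing each time and getting the choices once, here is a minimal sketch of the two strategies on plain strings. It is not how 4store or ARQ is implemented; MembershipSketch and its methods are hypothetical names, and the URIs are placeholders.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not 4store's or ARQ's actual code.
class MembershipSketch {

    // Probing: for every row, compare the row's URI against each allowed
    // value in turn. Cost per row grows with the length of the list.
    static boolean probeEachTime(String uri, List<String> allowed) {
        for (String candidate : allowed) {
            if (candidate.equals(uri)) {
                return true;
            }
        }
        return false;
    }

    // Collecting once: build a hash set of the allowed values a single time
    // per query, then test each row with a constant-time lookup on average.
    static Set<String> collectOnce(List<String> allowed) {
        return new HashSet<>(allowed);
    }

    public static void main(String[] args) {
        List<String> allowed = Arrays.asList("http://example.org/a", "http://example.org/b");
        Set<String> lookup = collectOnce(allowed);
        for (String row : Arrays.asList("http://example.org/a", "http://example.org/z")) {
            // Both strategies give the same answer; only the cost differs.
            System.out.println(row + " -> " + probeEachTime(row, allowed)
                    + " / " + lookup.contains(row));
        }
    }
}
```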
Andy
Best,
Burak Yönyül