Hi Andy!

Andy Seaborne wrote on 2.12.2020 at 13.01:
There is no reason I can see why the special case of exactly one "FROM" can't be handled specially. It masks all named graphs but is a rewrite from triples, that will be fine.
Right. Is this worth opening a JIRA issue?
If you want. It doesn't make it happen though; that takes coding.

Understood. I won't open an issue right now but maybe later. But I doubt I'll be able to implement the change, at least not without some hints and perhaps substantial help.

I replaced the SELECT line with a COUNT to see where the time is going.

OK.

Now there is a clear pattern: starting from Jena 3.10.0, the first timed run is much slower. So something happened there that makes the first full query (after the partial warmup) take much longer than it used to - but apparently not the subsequent ones which I timed yesterday.

Any ideas what this could be about? Should this be investigated more?
> It would be great if you would.

Now this turned into a rather interesting exercise in using git bisect. I was able to track down the change that caused the slowdown. It's this merge commit:

[f93fdbad7aa8d6ddb46693395e3bfb5ea487bf16] JENA-1648: Merge commit 'refs/pull/507/head' of https://github.com/apache/jena

which refers to this pull request:

https://github.com/apache/jena/pull/507
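
For anyone who wants to repeat this kind of search, here is a minimal sketch of a timing check that could be driven by "git bisect run". It is not the script used for the result above. The threshold is a hypothetical value, and the command passed to time_command would be whatever builds the checked-out revision and runs the timed query. The exit codes follow the "git bisect run" convention: 0 for good, 1 for bad, 125 for "cannot test, skip".

```python
import subprocess
import time

# Hypothetical threshold; it does not come from the thread and would need tuning.
THRESHOLD_SECONDS = 10.0


def time_command(cmd):
    """Return wall-clock seconds taken by cmd, or None if it exited non-zero."""
    start = time.perf_counter()
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    return elapsed if result.returncode == 0 else None


def bisect_exit_code(elapsed, threshold=THRESHOLD_SECONDS):
    """Map a timing to the exit codes that git bisect run understands."""
    if elapsed is None:
        return 125  # revision could not be tested: ask git bisect to skip it
    return 0 if elapsed <= threshold else 1  # 0 = good, 1 = bad
```

A wrapper script would end with sys.exit(bisect_exit_code(time_command(cmd))) and be started with "git bisect run python3 <script>" after marking one good and one bad revision.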

I don't have time for very deep analysis right now but it doesn't surprise me that a substantial change to the query result serialization slows down the queries.

Things to check: (mostly as a TODO list for myself)

1. Does this depend on the query result format? For example, is only the text format (default) slower than before?
2. Is there something suspicious in the PR 507 code that would explain why it's so much slower?
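
The first check could be scripted roughly as follows. This is only a sketch under assumptions: base_cmd stands for whatever query command line is being timed, and the "--results=" option name is assumed from the Jena command line tools and should be checked against the tool's --help output. Each run starts a fresh process, so every timing includes startup cost.

```python
import subprocess
import time


def time_runs(cmd, runs=3):
    """Run cmd `runs` times, discarding its output; return elapsed seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return timings


def compare_formats(base_cmd, formats, runs=3):
    """Time base_cmd once per result format; return {format: [seconds, ...]}."""
    timings = {}
    for fmt in formats:
        # Assumption: the tool selects its output format with --results=<format>.
        timings[fmt] = time_runs(base_cmd + ["--results=" + fmt], runs)
    return timings
```

Comparing the per-format lists before and after the merge commit would show whether only one serialization is affected.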

I have no idea right now how significant this performance regression is. Judging from the fact that nobody has complained about it in almost 2 years, it doesn't seem to be very bad.

I have a change lined up to fix "--out none" to be "less optimized"; it currently does nothing (very efficiently), whereas it could silently consume the query results.

It would also be better if the warmup wrote the required format to /dev/null.

I see that you already did this - great work!

-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi
