Re: Performance regressions in Jena and TDB2

Osma Suominen Fri, 04 Dec 2020 08:43:11 -0800

Hi Andy!

Andy Seaborne wrote on 2.12.2020 at 13:01:

>>> There is no reason I can see why the special case of exactly one "FROM" can't be handled specially. It masks all named graphs but is a rewrite from triples, that will be fine.
>> Right. Is this worth opening a JIRA issue?
> If you want. It doesn't make it happen though; that takes coding.

Understood. I won't open an issue right now but maybe later. But I doubt I'll be able to implement the change, at least not without some hints and perhaps substantial help.

>> I replaced the SELECT line with a COUNT to see where the time is going.

> OK.

>> Now there is a clear pattern: starting from Jena 3.10.0, the first timed run is much slower. So something happened there that makes the first full query (after the partial warmup) take much longer than it used to - but apparently not the subsequent ones which I timed yesterday.
>> Any ideas what this could be about? Should this be investigated more?
> It would be great if you would.
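
To make the "first timed run versus later runs" pattern concrete, here is a minimal sketch of recording each run's duration separately instead of averaging them. It is not the script used in this thread; `time_runs` and `fake_query` are hypothetical names, and the fake workload only imitates a one-time setup cost.

```python
import time
from typing import Callable, List


def time_runs(run: Callable[[], object], repeats: int) -> List[float]:
    """Run `run` `repeats` times and return each run's wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


# Hypothetical workload: the first call pays a one-time setup cost,
# later calls reuse the cached value.
_cache = {}


def fake_query() -> int:
    if "plan" not in _cache:
        _cache["plan"] = sum(range(200_000))  # one-time setup work
    return _cache["plan"]


timings = time_runs(fake_query, 4)
print("first run: ", timings[0])
print("later runs:", timings[1:])
```

Keeping the first timing apart from the rest is what exposes a regression that affects only the first full query.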

Now this turned into a rather interesting exercise in using git bisect. I was able to track down the change that caused the slowdown. It's this merge commit:

[f93fdbad7aa8d6ddb46693395e3bfb5ea487bf16] JENA-1648: Merge commit 'refs/pull/507/head' of https://github.com/apache/jena


which refers to this pull request:

https://github.com/apache/jena/pull/507
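
For readers unfamiliar with bisecting, the idea can be sketched as a binary search over an ordered history. This is a simplification under stated assumptions: the history is treated as linear (real git history is a graph with merges), the oldest commit is known good, the newest is known bad, and there is a single good-to-bad transition. `first_bad`, the commit names and the predicate are all hypothetical.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which `is_bad` is True.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and there is exactly one good-to-bad transition.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical history in which the slowdown arrives with commit "c5".
history = [f"c{i}" for i in range(8)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 5)
print(culprit)
```

In practice the predicate would be "build this revision and check whether the first timed query is slow", which is the expensive part; the search itself needs only about log2(n) such checks.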

I don't have time for very deep analysis right now but it doesn't surprise me that a substantial change to the query result serialization slows down the queries.


Things to check: (mostly as a TODO list for myself)

1. Does this depend on the query result format? For example, is only the text format (default) slower than before?
2. Is there something suspicious in the PR 507 code that would explain why it's so much slower?

I have no idea right now how significant this performance regression is. Judging from the fact that nobody has complained about it in almost 2 years, it doesn't seem to be very bad.

> I have a change lined up to fix "--out none" to be "less optimized"; it currently does nothing (very efficiently), it could silently consume the query results.
> It would also be better if the warmup wrote the required format to /dev/null.
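
The difference between "doing nothing" and "consuming the results silently" matters when results are produced lazily. The following sketch illustrates the general point with a Python generator; it is not Jena code, and `run_query`, `consume` and `timed` are hypothetical stand-ins.

```python
import time
from typing import Callable, Iterator


def run_query(n: int) -> Iterator[int]:
    """Hypothetical stand-in for a lazily evaluated query result."""
    for i in range(n):
        yield i * i  # work happens only when a row is requested


def consume(rows: Iterator[int]) -> int:
    """Pull every row and discard it, returning the row count."""
    count = 0
    for _ in rows:
        count += 1
    return count


def timed(action: Callable[[], object]) -> float:
    """Return the wall-clock time taken by `action`."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


skipped = timed(lambda: run_query(200_000))           # only creates the iterator
drained = timed(lambda: consume(run_query(200_000)))  # evaluates every row
print(f"not consumed: {skipped:.6f}s, consumed: {drained:.6f}s")
```

A benchmark or warmup that never pulls rows measures almost nothing, which is why draining the results (or writing them to /dev/null) gives a fairer baseline.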


I see that you already did this - great work!

-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi
