Yes - that's a problem.

One approach that could be done server-side is analysing the queries: when the server sees "SELECT * ...." (or perhaps an additional flag switches on a paging mode), it keeps a special cache of the whole result set and slices it with LIMIT-OFFSET.

This could work but looks fragile as to how the cache items get released. An explicit protocol with an "I'm finished" indication would be more stable.
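A minimal sketch of that explicit-protocol idea (all names invented; a real version would hold Jena result rows and evict stale tokens):

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: the server materialises a result set once, hands back a token,
// serves slices on demand, and frees the entry when the client says it
// is finished - the "I'm finished" indication from above.
public class ResultCache {
    private final Map<String, List<String>> results = new ConcurrentHashMap<>();

    // Store a fully materialised result set; return a token for later slicing.
    public String submit(List<String> materialisedRows) {
        String token = UUID.randomUUID().toString();
        results.put(token, materialisedRows);
        return token;
    }

    // Serve one slice of the cached results, clamped to the result size.
    public List<String> slice(String token, int offset, int limit) {
        List<String> rows = results.get(token);
        int from = Math.min(offset, rows.size());
        int to = Math.min(offset + limit, rows.size());
        return rows.subList(from, to);
    }

    // Explicit release: without this, cache cleanup is the fragile part.
    public void finish(String token) {
        results.remove(token);
    }
}
```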

In Fuseki, have a different query service with a non-standard protocol. (HTTP/2 has some interesting abilities as well for ahead-of-time responses.) Ask the query, get a reference to the results, then pull the results slice by slice. SPARQL isn't hard-wired into Fuseki; there can be additional services (e.g. the SHACL validation service) on a dataset.

Another thought - given the 3/4 pages average use, it could be done client-side: one query, 10 pages' worth at once. Re-issue the query every 10 pages if the unlikely happens and the user does go over 4 pages.
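That client-side idea could look roughly like this (names invented; the query runner is a stand-in for whatever client library is in use):

```java
import java.util.List;
import java.util.function.BiFunction;

// Sketch of "fetch 10 pages at once": cache a block of rows and
// re-run the query only when the user pages past the cached block.
public class PageCache {
    static final int PAGE_SIZE = 50;
    static final int PAGES_PER_FETCH = 10;   // 500 rows per round trip

    // (offset, limit) -> rows; stands in for running the SPARQL query.
    private final BiFunction<Long, Long, List<String>> runQuery;
    private List<String> block = List.of();
    private long blockOffset = -1;

    public PageCache(BiFunction<Long, Long, List<String>> runQuery) {
        this.runQuery = runQuery;
    }

    public List<String> page(int pageNumber) {
        long rowOffset = (long) pageNumber * PAGE_SIZE;
        long blockSize = (long) PAGE_SIZE * PAGES_PER_FETCH;
        long wantedBlockOffset = (rowOffset / blockSize) * blockSize;
        if (wantedBlockOffset != blockOffset) {
            // One query returns 10 pages' worth; OFFSET moves every 10 pages.
            block = runQuery.apply(wantedBlockOffset, blockSize);
            blockOffset = wantedBlockOffset;
        }
        int from = (int) (rowOffset - blockOffset);
        int to = Math.min(from + PAGE_SIZE, block.size());
        return from >= block.size() ? List.of() : block.subList(from, to);
    }
}
```

Pages 0-9 are served from one query; asking for page 10 triggers exactly one more.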

What is the client software in use?

    Andy

On 13/05/2021 22:04, Kimball, Adam wrote:
Thanks for the prompt response.

Can you tell me more about the “top k sorts”?  This sounds a little like what I 
am after.  Let me try to rephrase the use case and you can tell me if it helps.

As a user, I want to view the event log from most recent to least recent.  I 
want to see 50 events at a time.  I want to use “previous” and “next” to page 
by 50 items.

For us that means running the exact same query every time, changing only the 
offset.  Since this query needs to order the results before the limit/offset is 
applied, it effectively loads into memory every event from the beginning of 
time to now, does its joins, does the sort, and then slices out the next 50.

The average user will never page more than 3 or 4 pages deep before they apply 
other filters to constrain the result set.  But this is the default behavior 
they are used to (start without constraints at most recent) and I can’t figure 
out how to do this more efficiently and it *really* seems like this has to be a 
common issue.

FYI:  I’ll be away from a computer for a few weeks so if you respond and I 
don’t hear back, don’t assume I dropped the ball!

Adam

From: Andy Seaborne <a...@apache.org>
Date: Thursday, May 13, 2021 at 1:32 PM
To: users@jena.apache.org <users@jena.apache.org>
Subject: Re: Ordering results from oldest to newest


On 11/05/2021 16:54, Kimball, Adam wrote:
I know that I’ve asked this question before, but I am still struggling to 
understand how I might handle this case:

I have a Jena DB of event entries.  One common way to view the events is to 
page through them.  Normally this is done by seeing the most recent 50 events 
and then paging to the next 50 most recent and so on.

In pure SPARQL, I don’t really see an efficient way to accomplish this.  With 
limit and offset, I don’t really save anything other than i/o since the whole 
result set will need to be ordered before this limit/offset has an effect.  And 
that is killing us now.

My guess is we will need to implement some caching or possibly index the graph 
with Lucene or something.  It is doable but definitely not ideal.  Maybe I can 
use the quad position to facilitate this?  I am assuming this cannot be 
optimized within Jena itself?

Best,
Adam



Hi Adam,

No - there isn't a better way in std SPARQL. If you think the app is
going to process all the results, reading the whole thing into some
local cache is a way to go.

The proper solution is an overhaul of the SPARQL protocol.

Also, HTTP/2 may offer some interesting possibilities.

Specific to ARQ: query execution is often in a predictable and stable order.
There aren't many places where - absent concurrent updates - the order
will be different from call to call.

FWIW Jena does optimize "top k" sorts (SELECT-sort-LIMIT/OFFSET) up to
(from memory) k=1000 items.

  > Maybe I can use the quad position to facilitate this?

Not sure what the idea is here.

      Andy

