Thanks for the prompt response.

Can you tell me more about the “top k sorts”?  This sounds a little like what I 
am after.  Let me try to rephrase the use case and you can tell me if it helps.

As a user, I want to view the event log from most recent to least recent.  I 
want to see 50 events at a time.  I want to use “previous” and “next” to page 
by 50 items.

For us that means running the exact same query every time, changing only the 
offset.  Since this query needs to order the results before the limit/offset is 
applied, this query effectively loads into memory every event from the beginning of 
time to now, does its joins, does the sort, and then slices out the new 50.
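
To make the pattern concrete, here is a minimal sketch of how such a paged query might be built. The vocabulary (`ex:Event`, `ex:timestamp`) and the helper name `page_query` are assumptions for illustration only; they are not our real schema or code. The only thing that changes between pages is the OFFSET value.

```python
# Hypothetical vocabulary: ex:Event and ex:timestamp are placeholders,
# not names taken from the actual event graph.
PAGE_SIZE = 50

QUERY_TEMPLATE = """\
PREFIX ex: <http://example.org/>
SELECT ?event ?time
WHERE {{
  ?event a ex:Event ;
         ex:timestamp ?time .
}}
ORDER BY DESC(?time)
LIMIT {limit}
OFFSET {offset}
"""


def page_query(page: int, page_size: int = PAGE_SIZE) -> str:
    """Return the SPARQL text for a zero-based page number."""
    if page < 0:
        raise ValueError("page must be >= 0")
    # Same query every time; only the offset moves with the page.
    return QUERY_TEMPLATE.format(limit=page_size, offset=page * page_size)


if __name__ == "__main__":
    # Third page (zero-based page 2) of 50 events.
    print(page_query(2))
```

Because ORDER BY is evaluated before LIMIT and OFFSET, every page request asks the engine to order the full set of matches first.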

The average user will never page more than 3 or 4 pages deep before they apply 
other filters to constrain the result set.  But this is the default behavior 
they are used to (start without constraints at most recent).  I can’t figure 
out how to do this more efficiently, and it *really* seems like this has to be a 
common issue.

FYI:  I’ll be away from a computer for a few weeks so if you respond and I 
don’t hear back, don’t assume I dropped the ball!

Adam

From: Andy Seaborne <[email protected]>
Date: Thursday, May 13, 2021 at 1:32 PM
To: [email protected] <[email protected]>
Subject: Re: Ordering results from oldest to newest


On 11/05/2021 16:54, Kimball, Adam wrote:
> I know that I’ve asked this question before, but I am still struggling to 
> understand how I might handle this case:
>
> I have a Jena DB of event entries.  One common way to view the events is to 
> page through them.  Normally this is done by seeing the most recent 50 events 
> and then paging to the next 50 most recent and so on.
>
> In pure SPARQL, I don’t really see an efficient way to accomplish this.  With 
> limit and offset, I don’t really save anything other than i/o since the whole 
> result set will need to be ordered before this limit/offset has an effect.  
> And that is killing us now.
>
> My guess is we will need to implement some caching or possibly index the 
> graph with Lucene or something.  It is doable but definitely not ideal.  
> Maybe I can use the quad position to facilitate this?  I am assuming this 
> cannot be optimized within Jena itself?
>
> Best,
> Adam
>
>

Hi Adam,

No - there isn't a better way in std SPARQL. If you think the app is
going to process all the results, reading the whole thing into some
local cache is a way to go.

The proper solution is an overhaul of the SPARQL protocol.

Also, HTTP/2 may offer some interesting possibilities.

Specific to ARQ: query execution often has a predictable and stable order.
There aren't many places where - absent concurrent updates - the order
will be different from call to call.

FWIW Jena does optimize "top k" sorts (SELECT-sort-LIMIT/OFFSET) up to
(from memory) k=1000 items.
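
As a rough illustration of the general "top k" idea (this is a sketch of the technique, not Jena's actual implementation): instead of sorting every row and then slicing, keep only the best k = OFFSET + LIMIT rows while streaming through the input. The `Event` shape and the function name `top_k_page` below are made up for the example.

```python
import heapq
from typing import Iterable, List, Tuple

# (timestamp, label): a made-up event shape for illustration only.
Event = Tuple[int, str]


def top_k_page(events: Iterable[Event], limit: int, offset: int) -> List[Event]:
    """Return one page of events, newest first, without a full sort.

    Only the k = offset + limit newest events are held at any time;
    everything older is discarded as the input streams past.
    """
    k = offset + limit
    # nlargest keeps a bounded heap of size k and returns the result
    # already ordered from largest to smallest key.
    newest_k = heapq.nlargest(k, events, key=lambda e: e[0])
    # Drop the rows that belong to earlier pages.
    return newest_k[offset:]


if __name__ == "__main__":
    stream = ((t, f"event-{t}") for t in range(100_000))
    second_page = top_k_page(stream, limit=50, offset=50)
    print(second_page[0], second_page[-1])
```

The matching rows still have to be scanned, but memory and sorting cost are bounded by k rather than by the total number of events, which is why paging only a few pages deep stays within a small k.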

 > Maybe I can use the quad position to facilitate this?

Not sure what the idea is here.

     Andy
