Re: [Standards] Proposed XMPP Extension: Order-By

Jonas Schäfer Sat, 12 Jan 2019 11:52:12 -0800

On Monday, 7 January 2019 19:33:21 CET Goffi wrote:
> Hi Jonas,
> 
> On Monday, 7 January 2019, 18:08:25 CET Jonas Schäfer wrote:
> > On Sunday, 6 January 2019 15:16:43 CET Jonas Schäfer wrote:
> > > Title: Order-By
> > > Abstract:
> > > This specification allows changing the order in which items are
> > > retrieved in a Pubsub or MAM query
> > 
> > Couple notes:
> > 
> > - The strings for the "modification" and "creation" fields (as used in the
> > <order/> element) should be URNs, I think, to allow future extensions
> > without having to worry about conflicts.
> 
> I have thought about that too, but I took XEP-0313 as an example, where the
> common filters use simple keywords ("start", "end", and "with"; see
> XEP-0313 § 4.1). So I think it is all right to use keywords for the two most
> common cases, and URNs for extensions, in the same way as it is done in MAM.


Fair.

> > - Reversal via RSM seems wrong. You also can’t solve reversal via RSM when
> > you use multiple levels of <order/> elements.
> 
> Let's say I have items 1, 2, 3, 4, 5, 6 and I want to get them by creation
> order, 6 being the most recent, with a max of 2 per page. With a normal
> request I would get:
>
> - 6, 5 (page 1)
> - 4, 3 (page 2)
> - 2, 1 (page 3)
> 
> But if I do backward using RSM's <before/> element, it would be
> 
> - 2, 1 (page 3)
> - 4, 3 (page 2)
> - 6, 5 (page 1)
> 
> (remember, RSM works by pages, so we just get the pages reversed).
> In practice, the order is always DESC. But if the client wants to have ASC,
> it just goes backward using RSM and reverses the result of each page (which
> is trivial), so it would get 1, 2, 3, 4, 5, 6.
> 
> In both cases, requesting `<before>4</before>` gives `6, 5` (i.e. page 1)
> and requesting `<after>3</after>` gives `2, 1` (i.e. page 3).
> 
> Multiple levels of ordering don't change anything here: once
> everything is ordered, we end up with a simple list of items.

They do, here’s how: You have two metadata fields, let’s call them Time and 
Author. You have the following data:

Time   Author   Item ID
10:00  Alice    1
11:00  Alice    2
12:00  Bob      3
13:00  Bob      4
14:00  Bob      5
14:00  Carol    6
15:00  Carol    7

Suppose I want the posts ordered ascending by Author, with the time as a
descending tiebreaker. The resulting set would be:

2 1 5 4 3 7 6

There is no way to achieve this ordering with RSM alone.
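
A minimal sketch of the ordering above, using the table's data. The tuple
layout and the name order_items are only illustrative assumptions; nothing
here comes from the protoXEP itself. It also shows that a single-direction
sort, whether read forward or fully reversed, yields a different sequence:

```python
# Sample data from the table above: (time, author, item_id).
ITEMS = [
    ("10:00", "Alice", 1),
    ("11:00", "Alice", 2),
    ("12:00", "Bob", 3),
    ("13:00", "Bob", 4),
    ("14:00", "Bob", 5),
    ("14:00", "Carol", 6),
    ("15:00", "Carol", 7),
]


def order_items(items):
    """Order by author ascending, using time descending as the tiebreaker."""
    # Python's sort is stable, so sort by the tiebreaker first and by the
    # primary key second. Zero-padded "HH:MM" strings sort chronologically.
    by_time_desc = sorted(items, key=lambda it: it[0], reverse=True)
    return sorted(by_time_desc, key=lambda it: it[1])


ordered_ids = [item_id for _, _, item_id in order_items(ITEMS)]
print(ordered_ids)  # [2, 1, 5, 4, 3, 7, 6]

# A sort with both keys in the same direction, read forward or backward.
asc_ids = [it[2] for it in sorted(ITEMS, key=lambda it: (it[1], it[0]))]
print(asc_ids)        # [1, 2, 3, 4, 5, 6, 7]
print(asc_ids[::-1])  # [7, 6, 5, 4, 3, 2, 1]
```

Reversing a whole result set flips every sort level at once, which is why
the mixed ascending/descending order needs a direction per level.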

> > I think this specification needs to be very clear how this stacks with RSM
> > (XEP-0059) and the underlying result set. In my mind, it stacks like this:
> > 
> > PubSub "Database" -> PubSub-level Filters -> order-by XEP -> RSM
> > 
> > I.e. it modifies the base Result Set in RSM terminology.
> 
> Yes RSM is not detailed enough, it's a really important thing to describe,
> I'll work on it on next update. Do you think my example above is clear
> enough?

An example would definitely help.
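
As a rough sketch of the stacking described above (filters, then order-by,
then RSM paging over the ordered result set), assuming items are plain dicts
and that query, predicate and sort_key are hypothetical names, not anything
defined by XEP-0060 or XEP-0059:

```python
def query(items, predicate, sort_key, reverse, max_items, after=None):
    """Return one page: filter first, then order, then page RSM-style."""
    # 1. PubSub-level filters.
    filtered = [it for it in items if predicate(it)]
    # 2. Order-by: this produces the base result set that RSM pages over.
    ordered = sorted(filtered, key=sort_key, reverse=reverse)
    # 3. RSM: start right after the item whose ID was given in <after/>.
    start = 0
    if after is not None:
        ids = [it["id"] for it in ordered]
        # Raises ValueError for an unknown ID; a real service would have
        # to turn that into a proper error reply.
        start = ids.index(after) + 1
    return ordered[start:start + max_items]


# Items 1..6, where a higher "created" value means more recent.
items = [{"id": i, "created": i} for i in range(1, 7)]


def by_created(item):
    return item["created"]


def everything(item):
    return True


page1 = query(items, everything, by_created, True, 2)
page2 = query(items, everything, by_created, True, 2, after=page1[-1]["id"])
page3 = query(items, everything, by_created, True, 2, after=page2[-1]["id"])
print([[it["id"] for it in page] for page in (page1, page2, page3)])
# [[6, 5], [4, 3], [2, 1]]
```

This sorts the full set on every request, so it only illustrates the order of
the stages, not an efficient implementation.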

> > I think this XEP should also provide guidance on how to integrate this
> > properly with RSM on the server side: It is not clear to me how a service
> > could sensibly pick <before/> and <after/> item IDs in a way which allows
> > it to preserve the guarantees of delivering a result set which does not
> > lack items which have existed for the entire time the query was being
> > processed (I call this property "completeness"), as well as a
> > duplicate-free result set. If this is not possible (and I believe that to
> > be true), it should be spelt out clearly in the XEP.
> 
> Actually the problematic case is item overwriting, i.e. if we order by
> "date of modification" in the protoXEP's terms, which is the default
> case with XEP-0060 (there is nothing like "modification" or "update", but
> the result is the same: if you overwrite an item, it jumps to the top). If
> you use the new order introduced by the protoXEP (date of creation), the
> items won't move, they can only be deleted.

I don’t think that relates to what I was trying to say. RSM relies strongly on 
having a unique, immutable identifier for items (which is used in <before/> 
and <after/>). On the *implementation* side (mind, I’m asking specifically for 
implementation guidance), I am not clear on what would be used here. Sure, you 
can use the Item ID, but there’s the problem that you cannot map that to an 
SQL query with ORDER BY. You can use the classic "nth item in result set", but 
that loses the completeness and duplicate-free guarantees. You can always retrieve 
the full result set from your database and do the RSM paging on the server 
application, but that’s obviously slow.

> > (Note: the guarantees ("completeness" and duplicate-freeness) I am talking
> > about are a "MAY" in RSM, §2.2, bullet points 2 and 3.)
> 
> Yes, I was about to mention that it is a "MAY" too :). But I totally agree
> with your points, I'll explain it clearly in the next revision.

> > I am not completely sure whether it wouldn’t make more sense to specify an
> > extension which allows sorting by arbitrary (or specific) Atom fields and
> > lets the Atom feed handle the management of modification/creation values.
> > This also allows to access Atom feeds which are only mapped/gateway’d
> > into XMPP and where the (authoritative) creation/modification times in
> > the Atom feed then do not match the times as perceived and thus used by
> > the PubSub service.
> 
> We already have <published> and <updated> fields in Atom, but they are
> specified by the client, so this leads to major issues:
> 
> - clocks may not (and will not) be synchronised
> - dates can be faked

The lack of clock synchronisation is a problem in any case, even when the 
timestamps are generated on the server side, since XMPP is federated. I’d even 
argue that it is *worse* when the server is solely responsible, because there 
is no way for a client to fix it.

I don’t see the faking of dates as an issue. At least for the blogging case. 
Quite the contrary, I think it’s a good thing to have the flexibility (and in 
a federated network, an adversary who wins by faking the dates can always 
create their own server and spoof the dates there; so there’s no way for 
clients to rely on those dates anyways).

On the other hand, I think that *not* letting the server do the date thing and 
instead selecting on the data already present in, for example, Atom, gives us 
the advantage of being able to map existing Atom data into XMPP without loss 
of usability and generality.

> Also, this would only be useful for blogging, while this feature is needed
> in many other cases (like a pubsub ticket handler that I'm currently
> using).

Point taken. Although there is no reason why an issue tracker couldn’t reuse 
the Atom namespace to store its dates.

> Filtering on an arbitrary field is a possible future extension that could be
> really useful. But let's do one thing at a time :).

Fair enough.

kind regards,
Jonas

_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
_______________________________________________