On Tue, Sep 1, 2015 at 2:35 PM, Dave Cridland <d...@cridland.net> wrote:

> So I think that as far as rosters go, this is duplicating XEP-0237 in a
> considerably less efficient form.
The main thing to keep in mind is that it can be used to diff arbitrary lists (rosters and MUC disco#items are specified, but you could equally use it for caching entity caps or feature lists, or just about any other arbitrary list your server felt like versioning).

> XEP-0237§4 gives a fairly intense trip down implementation options, and none
> of them require multiple versions of the roster as claimed by this ProtoXEP.
> I have a personal preference for §4.3, though it gets complex with shared
> groups and so on. Nevertheless, it is possible to get perfect efficiency if
> you're willing to store tombstones.

In §4.2 Exact-match Conformance you must store multiple versions of the roster (or at least the current version of the roster and any pending pushes; maybe I should rephrase that statement in the XEP) unless you want to re-sync the entire roster every time there's a change and the user isn't online to receive a push. E.g. if the user signs in and fetches the roster (with a digest version), then signs out, and a new user is added to his roster, then when the user signs back in and sends up the digest, the server must have cached that new user in order to send a roster push back down. If your new user is added to many people's rosters (but you can't guarantee that it's added to a whole group's roster) you now have to store that roster push once for every single person whose roster it needs to be pushed to (as opposed to a single version token in the user's database table or somewhere else that can be diffed against).

In §4.3 Add-only Conformance the assumption is that deletions are rare (since a deletion will trigger an entire roster invalidation). This is not an assumption that can be made in many environments (e.g. large organizations where shared rosters may constantly have people being deleted as employees leave the company, contractors rotate in and out, etc.).
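The bookkeeping difference can be sketched in a few lines. This is a minimal, illustrative Python sketch of per-item entity versioning, not the ProtoXEP's wire format; the function name, data shapes, and example JIDs are my own invention:

```python
# Per-item entity versioning: the server keeps exactly one version
# token per roster item and diffs against whatever map the client
# uploads on login. Nothing has to be queued per offline user.

def diff_roster(server_items, client_versions):
    """server_items: {jid: (version, item_data)} -- the server's one
    canonical copy. client_versions: {jid: version} as uploaded."""
    changed = {
        jid: data
        for jid, (ver, data) in server_items.items()
        if client_versions.get(jid) != ver   # new to client, or stale
    }
    removed = [jid for jid in client_versions if jid not in server_items]
    return changed, removed

server = {
    "juliet@capulet.lit": ("v3", {"name": "Juliet"}),
    "nurse@capulet.lit": ("v1", {"name": "Nurse"}),
}
client = {"juliet@capulet.lit": "v2", "rosaline@capulet.lit": "v9"}
changed, removed = diff_roster(server, client)
# Juliet changed while the client was offline (v2 != v3), the Nurse
# was added, and Rosaline was deleted -- all recovered from a single
# stored copy, with no per-user pending-push queue.
```

The point of the sketch is the storage model: the delete of Rosaline is discovered by diffing, so the server never had to record "push a removal to this particular user" anywhere.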
The combined approach that's also described in that section is somewhat better, but still requires that we store new additions in many places (e.g. once for every user that should get the push, or once for every group that should get the push, or both; this starts to complicate the data model). There are further workarounds for most of the issues I've just described, but mostly they just lead to more rabbit holes and more problems, and end up resulting in a very complicated solution.

Entity versioning just does this in a simpler way that works better with our data model and distributed architecture (and potentially with other architectures as well). We can also then re-use the exact same semantics for other lists as previously discussed (instead of maintaining two different synchronization and diffing mechanisms).

There is actually a part 2 to this XEP which I hadn't submitted yet (because we haven't implemented it yet, and I didn't want to submit until we at least had an implementation on our roadmap) where small chunks of an entity list can be diffed (e.g. so that you can say "give me all changes to this subsection of the list") and a "search" feature can then be used to fetch more list items later. This lets you receive a subset of your roster (e.g. if your roster has 10,000 users, you can receive the 1,000 users that your server thinks you need at first, and then use the search endpoints later, for instance via an "auto complete" mechanism when you go to start a chat and want to list more users). This would let you slowly ramp up to full roster consistency (note that I say roster a lot, but again, this is for any list). Maybe I should go ahead and start working on that and submit it, because with this second phase the benefits become more apparent.

> Rosters are a particularly simple case for synchronization, because there is
> a single view; disco#items has potentially one view per user, and as such is
> more complex.
>
> In particular, assuming a room has configuration A, and then changes to
> configuration A' - while we can tell if A' is visible to a user U -- let's
> call this V(A',U) -- we cannot tell if V(A,U) == V(A',U) without having A;
> and given we don't always know which older configuration needs to be stored
> to make that comparison, things can get complex fast.
>
> As such, a '237 style approach would probably be limited in practise to
> having a §4.2 approach of hashing the entire list.
>
> This ProtoXEP tackles this problem by having the client upload its view for
> comparison, although it also includes an exact-match mechanism.
>
> However, it's not clear from the specification how a server can signal
> removal (or lack of visibility)

That was an oversight on my part; I appear to have dropped our mechanism for that somehow when compiling this from our internal system into an XEP. I'll update it soon. Thanks.

> nor what advantages a client has in
> exchanging the download of a large amount of data with the upload of a large
> amount of data.

In addition to the issues I mentioned before, the upload (in our case) is considerably smaller than the download, because we send a lot of metadata with rooms and rosters (via a custom namespaced metadata element on the disco item or roster element). E.g. for a MUC the metadata might include id, topic, ACL info, owner, number of participants, a guest URL for unauthenticated web access, last active time, etc. Leaving this metadata off is not an option, because we'd just have to query for it to display it anyway, and we don't want to make another round trip to do so. The combination of this with aggregate token checking ensures that we don't have to upload anything if nothing has changed, and don't have to download much if only a few things have changed (rarely do we actually trigger an entire roster download, and the uploads don't send any metadata, so they're still relatively small).
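The aggregate token check can be sketched like so. This is an illustrative Python sketch under the assumption that the aggregate token is derived from a hash over the sorted per-item version tokens; the actual token is opaque and its construction is up to the server:

```python
import hashlib

def aggregate_token(item_versions):
    """Collapse all per-item version tokens ({item_id: version})
    into one opaque token, here a SHA-256 over sorted pairs."""
    h = hashlib.sha256()
    for item_id in sorted(item_versions):
        h.update(item_id.encode())
        h.update(item_versions[item_id].encode())
    return h.hexdigest()

def sync(server_versions, client_token):
    """If the client's aggregate token matches, nothing is uploaded
    or downloaded at all; otherwise fall back to a per-item diff."""
    if aggregate_token(server_versions) == client_token:
        return "up-to-date"          # no item list exchanged
    return "send-item-versions"      # client now uploads its version map

versions = {"room1@muc.example.com": "v7", "room2@muc.example.com": "v2"}
token = aggregate_token(versions)
assert sync(versions, token) == "up-to-date"

versions["room2@muc.example.com"] = "v3"   # one room changed
assert sync(versions, token) == "send-item-versions"
```

This is why the common case (nothing changed since last login) costs one short token exchange, and the per-item upload only happens when the aggregate check fails.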
—Sam

--
Sam Whited
pub 4096R/54083AE104EA7AD3
https://blog.samwhited.com