On 1 September 2015 at 22:07, Sam Whited <s...@samwhited.com> wrote:

> On Tue, Sep 1, 2015 at 2:35 PM, Dave Cridland <d...@cridland.net> wrote:
> > So I think that as far as rosters go, this is duplicating XEP-0237 in
> > a considerably less efficient form.
>
> The main thing to keep in mind is that it can be used to diff arbitrary
> lists (rosters and MUC disco#items are specified, but you could equally
> use it for caching entity caps or feature lists, or just about any
> other arbitrary list your server felt like versioning).
>
> > XEP-0237 §4 gives a fairly intense trip down implementation options,
> > and none of them require multiple versions of the roster as claimed
> > by this ProtoXEP. I have a personal preference for §4.3, though it
> > gets complex with shared groups and so on. Nevertheless, it is
> > possible to get perfect efficiency if you're willing to store
> > tombstones.
>
> In §4.2 Exact-match Conformance you must store multiple versions of the
> roster (or at least the current version of the roster and any pending
> pushes; maybe I should rephrase that statement in the XEP) unless you
> want to re-sync the entire roster every time there's a change and the
> user isn't online to receive a push. E.g. if the user signs in and
> fetches the roster (with a digest version), then signs out, and a new
> user is added to his roster, then the user signs back in and sends up
> the digest, the server must have cached that new user to send a roster
> push back down. If your new user is added to many people's rosters (but
> you can't guarantee that it's added to a whole group's roster) you now
> have to store that roster push for every single person whose roster it
> needs to be pushed to (as opposed to a single version token in the
> user's database table, or somewhere that can be diffed against).
>
> In §4.3 Add-only Conformance the assumption is that deletions are rare
> (since this will trigger an entire roster invalidation).
> This is not an assumption that can be made in many environments (e.g.
> large organizations where shared rosters may constantly have people
> being deleted as people leave the company, contractors rotate in and
> out, etc.). The combined approach that's also described in this section
> is somewhat better, but still requires that we store new additions in
> many places (e.g. once for every user that should get the push, or for
> every group that should get the push, or both; this starts to
> complicate the data model).
>
> There are further workarounds for most of the issues I've just
> described, but mostly they just lead to more rabbit holes and more
> problems, and end up resulting in a very complicated solution. Entity
> versioning just does this in a simpler way that works better with our
> data model and distributed architecture (and potentially with other
> architectures as well). We can also then re-use the exact same
> semantics for other lists, as previously discussed (instead of
> maintaining two different synchronization and diffing mechanisms).

I think most (or all) of the above only applies if you have rosters that
are computed on demand, rather than managed by users via clients.
Otherwise, all you need on a simple roster (no shared groups) is a
counter for the version; the value of the latest tombstone *not*
retained (ie, the last delete, if there are no tombstones); and, per
item, the value of the last change and whether it's deleted (ie, whether
it's a tombstone). No multiple versions of anything. Tombstones are
optional, but without them it means it's only efficient for adds.

> There is actually a part 2 to this XEP which I hadn't submitted yet
> (because we haven't implemented it yet and I didn't want to submit
> until we at least had an implementation on our roadmap) where small
> chunks of an entity list can be diffed (e.g. so that you can say "give
> me all changes to this subsection of the list") and then use a "search"
> feature to get more list items later. This lets you receive a subset of
> your roster (e.g. if your roster has 10,000 users, you can receive
> 1,000 users that your server thinks you need at first, and then use the
> search endpoints, e.g. if you go to start a chat and want to list more
> users later via an "auto complete" mechanism). This would make it so
> that you can slowly ramp up to full roster consistency (note that I say
> roster a lot, but again, this is for any list). Maybe I should go ahead
> and start working on that and submit it, because with this second phase
> the benefits become more apparent.

I agree.

> > Rosters are a particularly simple case for synchronization, because
> > there is a single view; disco#items has potentially one view per
> > user, and as such is more complex.
> >
> > In particular, assuming a room has configuration A, and then changes
> > to configuration A' - while we can tell if A' is visible to a user U
> > -- let's call this V(A',U) -- we cannot tell if V(A,U) == V(A',U)
> > without having A; and given we don't always know which older
> > configuration needs to be stored to make that comparison, things can
> > get complex fast.
> > As such, a '237-style approach would probably be limited in practice
> > to having a §4.2 approach of hashing the entire list.
> >
> > This ProtoXEP tackles this problem by having the client upload its
> > view for comparison, although it also includes an exact-match
> > mechanism.
> >
> > However, it's not clear from the specification how a server can
> > signal removal (or lack of visibility)
>
> That was an oversight on my part; I appear to have dropped our
> mechanism for that somehow when compiling this from our internal
> system into an XEP. I'll update soon. Thanks.
>
> > nor what advantages a client has in exchanging the download of a
> > large amount of data for the upload of a large amount of data.
>
> In addition to the issues I mentioned before, the upload (in our case)
> is considerably less than the download, because we send a lot of
> metadata with rooms and rosters (via a custom namespaced metadata
> element on the disco item or roster element). E.g. for a MUC, metadata
> might include id, topic, ACL info, owner, number of participants, guest
> URL for unauthenticated web access, last active time, etc. Leaving this
> metadata off is not an option, because we'll just have to query for it
> to display it anyway, and we don't want to make another round trip to
> do so. The combination of this with aggregate token checking ensures
> that we don't have to upload anything if nothing has changed, and don't
> have to download much if only a few things have changed (rarely do we
> actually trigger an entire roster download, and the uploads don't send
> any metadata, so they're still relatively small).

So you're saying you added a bunch of stuff for efficiency, and then had
to add an efficient sync mechanism due to the inefficiency it caused?
;-)

Amazingly, the simple mechanism I detailed above still works for items
containing metadata, incidentally.
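To make that concrete, here's a rough, hypothetical Python sketch of the counter-plus-tombstone scheme described above (a monotonic version counter, the version of the newest *pruned* tombstone, and a per-item last-change version with a deleted flag). All names are invented, and it glosses over persistence, shared groups, and concurrency; it's a sketch of the idea, not any real server's implementation:

```python
class Roster:
    """Sketch of XEP-0237 §4.3-style versioning with optional tombstones."""

    def __init__(self):
        self.version = 0           # monotonically increasing roster version
        self.items = {}            # jid -> (version_of_last_change, deleted?)
        self.pruned_tombstone = 0  # version of the newest tombstone we discarded

    def upsert(self, jid):
        self.version += 1
        self.items[jid] = (self.version, False)

    def delete(self, jid, keep_tombstone=True):
        self.version += 1
        if keep_tombstone:
            # Retain a tombstone so later diffs can report the removal.
            self.items[jid] = (self.version, True)
        else:
            # Without a tombstone, clients older than this version
            # can no longer be diffed incrementally.
            self.items.pop(jid, None)
            self.pruned_tombstone = self.version

    def diff(self, client_version):
        """Return {jid: deleted?} for changes since client_version,
        or None to signal that a full resync is required."""
        if client_version < self.pruned_tombstone:
            return None
        return {jid: deleted
                for jid, (ver, deleted) in self.items.items()
                if ver > client_version}
```

Note how `diff` never needs multiple stored versions of the roster: a single per-item change version is enough, and the only failure mode (a pruned tombstone) degrades to a full resync, which matches the "only efficient for adds" caveat above.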
As I said before, the difficulty is in dealing with multiple views; I
think MUC room listing has those, and I don't have a solution - at
least, not without a changelog.

In the meantime, I'll reserve judgement until I've seen a bit more on
this.

> —Sam
>
> --
> Sam Whited
> pub 4096R/54083AE104EA7AD3
> https://blog.samwhited.com
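For comparison with the sketch above, the upload-and-compare exchange Sam describes - the client uploads its per-item tokens, and aggregate token checking short-circuits the download when nothing has changed - could be sketched roughly like this. Again, this is hypothetical: the names and the digest scheme are illustrative, not the ProtoXEP's wire format:

```python
import hashlib


def aggregate(tokens):
    """Order-independent digest over an item -> token map."""
    h = hashlib.sha256()
    for jid, tok in sorted(tokens.items()):
        h.update(f"{jid}={tok};".encode())
    return h.hexdigest()


def server_diff(server_view, client_view):
    """Compare the client's uploaded view against the server's.

    Returns (changed, removed): items whose token differs (including
    items the client lacks entirely), and items the client still has
    that the server no longer lists.
    """
    if aggregate(server_view) == aggregate(client_view):
        return {}, []  # aggregate tokens match: nothing to send either way
    changed = {jid: tok for jid, tok in server_view.items()
               if client_view.get(jid) != tok}
    removed = [jid for jid in client_view if jid not in server_view]
    return changed, removed
```

The trade-off under debate is visible here: the client's upload is proportional to its list size, but it lets the server compute removals and per-user visibility without keeping any history, which is what makes the approach attractive for computed views like MUC listings.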