Re: [Standards] Presence distribution

Philipp Hancke Mon, 06 Apr 2009 09:54:21 -0700

Dave Cridland wrote:
[snip]

> sender doesn't want you to, which is substantially worse, and without
> any bad actor involved. (I could be wrong here, please check this.)


So we argue to "optimize" the protocol for this case? "Hmm, the other
side might get information I didn't want to tell him, well, lets send it
10 times instead of once. I guess that will 'fix' the problem."

I'm certainly arguing that there's insufficient reason to break theprotocol to save a few stanzas.


Ok. Let's wait for rogue s2s floodbots to bring down servers because
currently there is no effective s2s (or c2s) flood control.

Actually, the current model enables a single client to bring down
a server by making the server broadcast large presence stanzas at
a high rate. Simplistic bandwidth control mechanisms such as "karma"
account traffic per socket, and do not throttle the client when it
generates excessive amounts of s2s traffic.

This is still possible with view sharing, but the attack scales
with d instead of n.

Yes. But not at the cost of making the protocol worse. By the way, the


It is not "worse", just different.

Let's review what happens currently, assuming an initiating contact withn availableremote contacts across d domains, each of which has yresources connected - y being different in each instance obviously:


"available remote contacts" is those bis optimizations?
Actually, were those discussed before they were introduced?
What is their impact on presence reliability?

1) Client sends home server a broadcast presence. Cost E1, O(1).


Why not shift the responsibility for distribution to the client?
This would make E1=O(n) and E2=O(1) (simple message routing).
Afaics, your argument applies for that as well.

2) Home server, which (almost certainly) has client's roster in-memory,iterates through and emits one presence stanza per remote subscribedcontact known to be available. Cost E2, O(n). (Iterating through a listof known available contacts).


Actually, why don't you send the presence stanza to each connected
resource of each available contact here - iterating through a list of
known avaible contact resources)?

3) Home server encodes and transmits N stanzas, remote server decodesand transmits N stanzas. Cost E3, O(n). (One stanza per contact -arguably this is O(d) to open the stream, and O(y) to send the stanzas -you pick).


O(y) should be O(n) here?

4) Remote server receives one or more fully-addressed stanzas, andbroadcasts them to all resources. Cost E4, O(y). (Iterating through alist of resources of the recipient).


This is done n times. And you are neglecting costs for privacy list
checks (I will call that O(p), different for each instance and
quite costly imo).
Hence E4 = O(n) + n*(O(p) + O(y))

This gives E, as O(n) + O(y), a linear complexity algorithm.


O(n) + n*(O(p) + O(y)).

You want to replace 2-4 with:
2') Home server collates roster by remote domain and emits one presencestanza per remote domain which has a subscribed contacts known to beavailable. This is, I'll accept, likely to be close to the energy costof 2, although due to the fluctuating nature of the final clause thatcollation has to be done each time, so it'll be a little above. Cost E2'is, therefore > E2. O(n) + O(d) - I have a feeling this can be reducedto O(n).
3') encode/transmit/decode 1 stanza per remote domain. This is certainlyan energy/cpu/cost saving compared to the 3 above, no argument from mehere. Cost E3' is < E3. O(d) (One stanza per domain. Still linear, ofcourse).


Linear with d. Assuming that people 'cluster', d << n is reasonable.

4') Lookup sender against all rosters in the system, and detirmine whichof the resulting potential recipients is online and available to thesender. It seems reasonable that it would be possible to limit thelookup to only contacts who are online and available - assuming weignore privacy lists - but you're still adding a lookup and the


This depends on how you do the lookup imo.

associating lookup mechanics (like a hash table or something) which mustbe maintained in-memory. I strongly suspect that this much, much morecostly to use, build, and maintain than 4 above, hence E4' >> E4. I


If you are neglecting privacy lists above certainly.

I think you can reduce this to E4' = n*O(y).

believe this to be possible to implement as O(log(y)), but I alsosuspect it's overwhemlingly more likely that it's O(y) (as derived froma reverse roster lookup O(log(y)) combined with a resource-broadcastlookup O(y)).
This give E' as O(n) + O(d) + O(y), a linear compexity algorithm. Poofgoes your argument above about linear complexity versus constantamortized time.
I can't see (E3' - E3) < ((E4' - E4) + (E2' - E2)) as being at alllikely, so I continue to maintain that this is a false optimization.


I disagree. But at least your argument is technically founded and better
than what I have heard from the council in 2006 (and I had to gather
that from the chatroom).

I care about effective flood control mechanisms. This is about moving
cost where they are affordable.

(I also continue to maintain that this is an arse-clenchingly more complexsolution to the problem of getting presence from one contact to another,but I hope nobody's arguing against me, there).


It is a different solution. One that has been used for years on IRC.

p

p.s.: I still want a review for that 220-rewrite from you.

Re: [Standards] Presence distribution

Reply via email to