Hi,

On 2009/09/17, at 19:56, Waqas Hussain wrote:
---
The problem: The only thing the current caps XEP adds over legacy caps is obfuscation. Everything else is equivalent from a security perspective.

Not true. The ver attribute value in the presence is derived in a (hopefully) secure way from the response you will get of a disco#info.

This link between ver and the disco response allows for better caching opportunities without the risks of cache poisoning from malicious clients.


There are four directions to consider.

2. A new algorithm in a new XEP (with a new namespace).

This is my preferred direction. There is some opposition to it, which I can understand, but do not entirely agree with. A new XEP would be 'clean', with none of the historical baggage the current XEP already has (which would only increase with the other options). And (most importantly) the transition to a new XEP would be smooth.

I'll come back to this later on.


3. A new (not backwards compatible) algorithm in the same XEP.

I'm assuming this means implementations using a different value for the 'hash' attribute than 'sha-1'. I would be against this, the reason being incompatibility. While the transition from the old to the new takes place, the amount of IQ traffic may very well exceed that of presence. This is probably the worst of all choices, with all the disadvantages of a new replacement XEP, and none of its advantages.

I see no value in this option either.


4. Patching the current algorithm (while keeping it backwards compatible).

First, there's the argument that we already bent over backwards to keep backwards compatibility during the move to the current caps algorithm. It’s a sunk cost, and while that’s sad, it should have zero effect on the current decision.

I'll discuss the solutions proposed, and problems with them.

a. 
http://svn.xmpp.org:18080/browse/XMPP/trunk/extensions/xep-0115.xml?r2=3373&r1=2855

I disagree with the wording and the change. The XML parser sees '&lt;', but anything behind the XML parser sees '<'. The wording should talk about escaping what they have. Not not escaping what they don't have (yay, triple negatives). Also, with the current wording, both &amp;lt; and &lt; would have the same representation (as Pedro Melo noted). Escaping using all five basic XML entities is a more proper fix.

Yes, I don't like this either, as stated. I think that whatever solution we come up with, it should use the values of attributes and node texts as you get them from a XML parser. For example, is someone annouces a feature 'my:little:<feature>', the XML encoding requirements should not matter to the algorithm. It should use 'my:little:<feature>' as input.


b. Forbid empty 'type' attribute on the <identity/> element.

  <feature var='http://jabber.org/protocol/muc'/>

can still be replaced by

<identity category='http:' type='/jabber.org' xml:lang='protocol' name='muc'/>

Fun, eh?

Yes, it does not solve the problem. And if we decide to forbid empty 'type's, it should be done in XEP-0030 because it makes little sense semantically.

c. add new feature that sorts at end.

Show me such a feature, and I'll show you one sorting after it.

d. A special extension with FORM_TYPE value '!'

There are an infinite number of strings which are allowed and sort before '!'.

These two sound weak workarounds as I said previously.


---
Now let’s talk about possible solutions. I should note that while I'm listing these possibilities, they all bring one word to my mind: HACK.

:)


We have three major sections (identities, features and extensions). There needs to be a separator between them. There is no possible string which would always sort last. There is a string which would always sort first, and that is the empty string "".

I fail to see the need of sort order between the sections. Inside each section, sure, but between each section, I don't see the need of order.

I do see the need of framing between sections so that one section cannot influence the other in the final result.


So, with a <feature var=''/> and a form with an empty FORM_TYPE value added, we have separation between the sections.

I used NUL as the separator between sections because it is an invalid XML character. The usage of an empty feature or FORM_TYPE will only duplicate the > separator, so the result is similar: add another separator between sections.


With '<' escaped, the features section is safe.

With '/' not forbidden or escaped, the identity section is not safe. With some exceptions, any <identity/> element with one of the attribute values containing a '/' is open to attack.

If we use invalid XML characters, we don't need to worry about safe or unsafe characters inside the text values.

If you don't like NUL use another character, or even multiple characters, but I fail to see why would we use a valid character as the separator. It only complicates the process, because now we have to encode the texts before using.


Now we are left with the service discovery extensions section. This section happens to have subsections,

When you mention subsections, we are referring to having multiple jabber:x:data elements, each one with a different FORM_TYPE, right?


which are open to attack. We'll have to add a special field to every extension, so that the boundaries between extensions are preserved. Ah, of course, there also need to be boundaries between individual fields of an extension. And of course, these boundaries at three levels need to be distinguishable. Nice, no?

I need to check but I'm sure we have more invalid XML characters. Or we can use \0{size} where size is int32 network-order integer with the size of the segment that follows.

As long as the separator is a sequence that will not occur in the text, and we have no optional parts (empty values or missing attributes are represented not as missing parts but empty ones, we are safe.


The way I see it, there is no simple way to preserve structural information for the extensions section. It's several levels deep, which makes things _very_ complicated. I don't think this can reasonably be done while maintaining backwards compatibility.

The replacement algorithm Pedro Melo suggested still has many of these issues.

My suggestion can be improved I'm sure.

I don't expect to keep backwards compatibility. I see breaking backwards compatibility with 1.5 a feature. Let me explain.

If we release 1.6 in the future with a different algorithm to generate the ver attribute, one that we find secure against the problems described, we will break compatibility with clients that already use 1.5.

Lets examine the scenarios: two clients A and B. The scenarios where both use the same version are irrelevant, they are compatible. I also don't consider legacy caps because that is already explained in the current version and I don't think it needs change.

 a) A 1.5 vs B 1.6

When a 1.6 client receives the caps presence from a 1.5 client, it will calculate the hash using the new algorithm and the verification will fail. The fallback for that purpose is not to cache the information but use the received information (as specified in 0115 5.4.3.9).

This is a good thing because 1.5 is insecure so a new client should not use the hash and cache it, to prevent poisoning.

b) A 1.6 vs B 1.5

In the opposite scenario, the old client using 1.5 will also fail verification and will not cache the new value, and it will use the disco#info information returned.

Same solution.


So on both cases no poisoning will occur, and the worst case is no caching and extra traffic.


---
Now, moving on to incompatible changes:

Many of us don't like service discovery extensions. I think this is due to the problems they are causing in the caps algorithm. While I have agreed with making them obsolete in the past, they have valid use-cases. They should be fixed (or replaced by disco#meta).

They have valid use cases, and until there is something better, they are here to stay.

You could write 115 ignoring service extensions and therefore only cache features and identities. And if your application really requires the service extensions part, fall back to a disco#info query.

But I think that it is possible to implement 115 including service extensions. We only need to come up with a canonical representation for the information contained.


So, and picking up the topic of a new XEP, I do think that changing the gen-ver algorithm and releasing a 1.6 version is the best solution here. The incompatibilities with 1.5 are a good thing, as they prevent old clients to poison the new clients cache.

A new XEP will have to solve the same problems we have here. You would still have to come up with a canonical encoding for all the information contained in the disco#info response. Unless there is a new very radical different approach to disco#info caching, I think we are well served with 115 if we just fix the gen-ver.

Best regards,

Reply via email to