Hi,
On 2009/09/17, at 19:56, Waqas Hussain wrote:
---
The problem: The only thing the current caps XEP adds over legacy
caps is obfuscation. Everything else is equivalent from a security
perspective.
Not true. The ver attribute value in the presence is derived in a
(hopefully) secure way from the response you will get to a disco#info query.
This link between ver and the disco response allows for better caching
opportunities without the risks of cache poisoning from malicious
clients.
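To make the link between 'ver' and the disco#info response concrete, here is a rough sketch of the 1.5 gen-ver idea in Python (simplified: service discovery extensions are omitted, and the function names are mine, not from the XEP):

```python
import base64
import hashlib

def gen_ver(identities, features):
    """Sketch of the XEP-0115 1.5 idea: identities (category, type,
    lang, name tuples) and feature vars are sorted, joined with '<'
    and hashed, so 'ver' is bound to the disco#info it summarizes."""
    s = ''
    for category, typ, lang, name in sorted(identities):
        s += '%s/%s/%s/%s<' % (category, typ, lang, name)
    for var in sorted(features):
        s += var + '<'
    digest = hashlib.sha1(s.encode('utf-8')).digest()
    return base64.b64encode(digest).decode('ascii')
```

A cache keyed on 'ver' can then be verified by re-running gen_ver over the disco#info result before storing it, which is exactly the property legacy caps lacked.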
There are four directions to consider.
2. A new algorithm in a new XEP (with a new namespace).
This is my preferred direction. There is some opposition to it,
which I can understand, but do not entirely agree with. A new XEP
would be 'clean', with none of the historical baggage the current
XEP already has (which would only increase with the other options).
And (most importantly) the transition to a new XEP would be smooth.
I'll come back to this later on.
3. A new (not backwards compatible) algorithm in the same XEP.
I'm assuming this means implementations using a different value for
the 'hash' attribute than 'sha-1'. I would be against this, the
reason being incompatibility. While the transition from the old to
the new takes place, the amount of IQ traffic may very well exceed
that of presence. This is probably the worst of all choices, with
all the disadvantages of a new replacement XEP, and none of its
advantages.
I see no value in this option either.
4. Patching the current algorithm (while keeping it backwards
compatible).
First, there's the argument that we already bent over backwards to
keep backwards compatibility during the move to the current caps
algorithm. It’s a sunk cost, and while that’s sad, it should have
zero effect on the current decision.
I'll discuss the solutions proposed, and problems with them.
a.
http://svn.xmpp.org:18080/browse/XMPP/trunk/extensions/xep-0115.xml?r2=3373&r1=2855
I disagree with the wording and the change. The XML parser sees
'&lt;', but anything behind the XML parser sees '<'. The wording
should talk about escaping what they have, not about not escaping
what they don't have (yay, triple negatives). Also, with the current
wording, both '&lt;' and '<' would have the same representation
(as Pedro Melo noted). Escaping using all five basic XML entities is
a more proper fix.
Yes, I don't like this either, as stated. I think that whatever
solution we come up with, it should use the values of attributes and
node texts as you get them from an XML parser. For example, if someone
announces a feature 'my:little:<feature>', the XML encoding
requirements should not matter to the algorithm. It should use
'my:little:<feature>' as input.
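The ambiguity Pedro Melo noted is easy to demonstrate. A sketch (hypothetical helper names; escape_lt_only models the proposed change, escape_xml the five-entity fix):

```python
def escape_lt_only(s):
    # the proposed change: escape only '<' before hashing
    return s.replace('<', '&lt;')

def escape_xml(s):
    # the five-entity fix: '&' must go first or it re-escapes the rest
    s = s.replace('&', '&amp;')
    s = s.replace('<', '&lt;').replace('>', '&gt;')
    return s.replace('"', '&quot;').replace("'", '&apos;')
```

Under escape_lt_only, a value literally containing '&lt;' and one containing '<' collapse to the same hash input; escaping '&' as well keeps them distinct.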
b. Forbid empty 'type' attribute on the <identity/> element.
<feature var='http://jabber.org/protocol/muc'/>
can still be replaced by
<identity category='http:' type='/jabber.org' xml:lang='protocol'
name='muc'/>
Fun, eh?
Yes, it does not solve the problem. And if we decide to forbid empty
'type's, it should be done in XEP-0030, because an empty 'type' makes
little sense semantically.
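The muc example above can be checked mechanically. Under the 1.5 serialization (identity as category/type/lang/name followed by '<', feature as var followed by '<'), the crafted identity and the real feature produce byte-for-byte identical hash input:

```python
def serialize_identity(category, typ, lang, name):
    # XEP-0115 1.5: category/type/lang/name, terminated by '<'
    return '%s/%s/%s/%s<' % (category, typ, lang, name)

def serialize_feature(var):
    # XEP-0115 1.5: the feature var, terminated by '<'
    return var + '<'
```

Because '/' is both a legal character in attribute values and the field separator, the identity fields can be chosen to spell out any feature string.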
c. add new feature that sorts at end.
Show me such a feature, and I'll show you one sorting after it.
d. A special extension with FORM_TYPE value '!'
There are an infinite number of strings which are allowed and sort
before '!'.
These two sound like weak workarounds, as I said previously.
---
Now let’s talk about possible solutions. I should note that while
I'm listing these possibilities, they all bring one word to my mind:
HACK.
:)
We have three major sections (identities, features and extensions).
There needs to be a separator between them. There is no possible
string which would always sort last. There is a string which would
always sort first, and that is the empty string "".
I fail to see the need for a sort order between the sections. Inside
each section, sure, but between sections I don't see the need for
order. I do see the need for framing between sections, so that one
section cannot influence the other in the final result.
So, with a <feature var=''/> and a form with an empty FORM_TYPE
value added, we have separation between the sections.
I used NUL as the separator between sections because it is an invalid
XML character. Using an empty feature or FORM_TYPE would only
duplicate the '<' separator, so the result is similar: another
separator is added between sections.
With '<' escaped, the features section is safe.
With '/' not forbidden or escaped, the identity section is not safe.
With some exceptions, any <identity/> element with one of the
attribute values containing a '/' is open to attack.
If we use invalid XML characters, we don't need to worry about safe or
unsafe characters inside the text values.
If you don't like NUL, use another character, or even multiple
characters, but I fail to see why we would use a valid character as
the separator. It only complicates the process, because then we have
to encode the texts before using them.
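To illustrate the point (my own hypothetical encoding, not a proposal from the thread): with NUL as the field separator, the muc-style identity attack from above simply stops working, because no parsed XML value can ever contain a NUL:

```python
NUL = '\0'  # invalid in XML 1.0, so it cannot occur in parsed values

def serialize_identity(category, typ, lang, name):
    # fields framed with NUL instead of '/', record ended with NUL NUL
    return NUL.join((category, typ, lang, name)) + NUL + NUL

def serialize_feature(var):
    return var + NUL + NUL
```

No escaping step is needed, because the separator lives outside the alphabet of the text being framed.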
Now we are left with the service discovery extensions section. This
section happens to have subsections,
When you mention subsections, you are referring to having multiple
jabber:x:data elements, each one with a different FORM_TYPE, right?
which are open to attack. We'll have to add a special field to every
extension, so that the boundaries between extensions are preserved.
Ah, of course, there also need to be boundaries between individual
fields of an extension. And of course, these boundaries at three
levels need to be distinguishable. Nice, no?
I need to check, but I'm sure we have more invalid XML characters
available. Or we can use \0{size} framing, where size is a 32-bit
network-byte-order integer giving the length of the segment that
follows.
As long as the separator is a sequence that cannot occur in the
text, and we have no optional parts (empty values or missing
attributes are represented not as missing parts but as empty ones),
we are safe.
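A minimal sketch of that \0{size} framing (names are mine; this is one possible encoding, not a spec):

```python
import struct

def frame(segment: bytes) -> bytes:
    # NUL, then a 32-bit network-byte-order length, then the payload;
    # the payload needs no escaping because its extent is explicit
    return b'\0' + struct.pack('>I', len(segment)) + segment

def frame_all(segments) -> bytes:
    # missing or empty values are encoded as zero-length segments,
    # never omitted, so the structure stays unambiguous
    return b''.join(frame(s) for s in segments)
```

Length-prefixed framing sidesteps the separator question entirely: two different field lists can never serialize to the same byte string, however hostile the field contents.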
The way I see it, there is no simple way to preserve structural
information for the extensions section. It's several levels deep,
which makes things _very_ complicated. I don't think this can
reasonably be done while maintaining backwards compatibility.
The replacement algorithm Pedro Melo suggested still has many of
these issues.
My suggestion can be improved I'm sure.
I don't expect to keep backwards compatibility. I see breaking
backwards compatibility with 1.5 as a feature. Let me explain.
If we release 1.6 in the future with a different algorithm to generate
the ver attribute, one that we find secure against the problems
described, we will break compatibility with clients that already use
1.5.
Let's examine the scenarios: two clients, A and B. The scenarios where
both use the same version are irrelevant; they are compatible. I also
don't consider legacy caps, because that is already explained in the
current version and I don't think it needs to change.
a) A 1.5 vs B 1.6
When a 1.6 client receives the caps presence from a 1.5 client, it
will calculate the hash using the new algorithm and the verification
will fail. The fallback in that case is not to cache the information
but to use the received information directly (as specified in
XEP-0115 5.4.3.9).
This is a good thing because 1.5 is insecure so a new client should
not use the hash and cache it, to prevent poisoning.
b) A 1.6 vs B 1.5
In the opposite scenario, the old client using 1.5 will also fail
verification and will not cache the new value, and it will use the
disco#info information returned.
Same solution.
So in both cases no poisoning will occur, and the worst case is no
caching and some extra traffic.
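Both scenarios reduce to the same receiver-side rule, which might look like this (a sketch with hypothetical names; fetch_disco_info and gen_ver stand in for the real stanza round-trip and hash computation):

```python
cache = {}  # ver string -> verified disco#info result

def on_caps_presence(ver, fetch_disco_info, gen_ver):
    """Verify-before-cache: on a hash mismatch (e.g. a 1.5 peer seen
    by a 1.6 client, or vice versa), use the disco#info result
    directly but never store it under the received ver."""
    if ver in cache:
        return cache[ver]
    info = fetch_disco_info()
    if gen_ver(info) == ver:
        cache[ver] = info   # verified: safe to reuse for this ver
    return info             # mismatch: usable, but the cache stays clean
```

The mismatch path costs one disco#info query per entity, which is exactly the "no caching and extra traffic" worst case.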
---
Now, moving on to incompatible changes:
Many of us don't like service discovery extensions. I think this is
due to the problems they are causing in the caps algorithm. While I
have agreed with making them obsolete in the past, they have valid
use-cases. They should be fixed (or replaced by disco#meta).
They have valid use cases, and until there is something better, they
are here to stay.
You could implement 115 ignoring service extensions and therefore only
cache features and identities. And if your application really requires
the service extensions part, fall back to a disco#info query.
But I think that it is possible to implement 115 including service
extensions. We only need to come up with a canonical representation
for the information contained.
So, picking up the topic of a new XEP, I do think that changing the
gen-ver algorithm and releasing a 1.6 version is the best solution
here. The incompatibilities with 1.5 are a good thing, as they prevent
old clients from poisoning the new clients' cache.
A new XEP would have to solve the same problems we have here. You
would still have to come up with a canonical encoding for all the
information contained in the disco#info response. Unless there is a
radically different new approach to disco#info caching, I think we are
well served with 115 if we just fix gen-ver.
Best regards,