On Oct 30, 2008, at 11:14 AM, Peter Saint-Andre wrote:

Warning: this message might open a big can of worms. :)


Note: This email is UTF-8 and uses UTF-8 characters. If you can't read
them, get a better email client.

Stringprep ties us to Unicode 3.2.

I think it is important to move to preparation algorithms that don't tie implementations to a particular version of Unicode, while maintaining compatibility with the existing algorithms when Unicode 3.2 is used.

One approach is to redefine the algorithm in terms of Unicode properties (to the degree possible). This is the approach being used in IDNAbis, and it is what I'm hoping to do for the SASLprep and LDAPprep algorithms.
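
To illustrate the version tie, here is a rough Python sketch. It assumes Python's standard stringprep module, whose tables are built from the Unicode 3.2 database, and compares it with a lookup against whatever Unicode version the interpreter ships. The function names and the "is it a letter" rule are only illustrative assumptions; they are not the IDNAbis rules or any proposed XMPP profile.

```python
import stringprep
import unicodedata

# U+1E9E LATIN CAPITAL LETTER SHARP S was added to Unicode after 3.2.
CAPITAL_SHARP_S = "\u1e9e"


def unassigned_in_unicode_3_2(ch: str) -> bool:
    # Table A.1 of RFC 3454 lists code points unassigned in Unicode 3.2.
    return stringprep.in_table_a1(ch)


def is_letter_by_property(ch: str) -> bool:
    # Property-based test (illustrative only): general category L*,
    # evaluated against the interpreter's current Unicode database.
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    print(unicodedata.unidata_version)
    print(unassigned_in_unicode_3_2(CAPITAL_SHARP_S))
    print(is_letter_by_property(CAPITAL_SHARP_S))
```

A fixed-table algorithm treats that character as unassigned forever, while a property-based rule picks it up as soon as the underlying database is updated.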

I'd be willing to assist in doing the same for various XMPP prep algorithms.

-- Kurt



Right now, RFC 3920 describes the "how" of stringprep (see RFC 3454)
along with the "what" (JIDs), but does not specify exactly who is
responsible for prepping JIDs, when in the communications process JIDs
need to be prepped, and where in XMPP data JIDs need to be prepped.

We need to make this clear in rfc3920bis and rfc3921bis (and perhaps
some XEPs as well).

As a reminder: stringprep handles case folding and other such issues
related to the characters used in JIDs. We don't use the term "case
sensitive" anymore because that applies only to US-ASCII. Thus in ASCII
we can say that if characters are case-insensitive then "P" is changed
to "p", but in UTF-8 we would say that Π is case-folded to π.
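
As a minimal sketch of the folding step only (not a complete Nodeprep implementation), the following uses Python's standard stringprep module, assuming the usual RFC 3454 sequence of table B.1 (map to nothing), table B.2 (case folding), and NFKC normalization against Unicode 3.2. The function name fold is hypothetical, and the prohibited-character and bidirectional checks are deliberately left out.

```python
import stringprep
import unicodedata


def fold(text: str) -> str:
    """Apply the stringprep mapping and normalization steps to text."""
    mapped = "".join(
        stringprep.map_table_b2(ch)       # B.2: case folding for use with NFKC
        for ch in text
        if not stringprep.in_table_b1(ch)  # B.1: characters mapped to nothing
    )
    # Normalize with the Unicode 3.2 database, as stringprep requires.
    return unicodedata.ucd_3_2_0.normalize("NFKC", mapped)


if __name__ == "__main__":
    print(fold("P"))
    print(fold("Π"))
```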

Currently RFC 3920 says it is the server's responsibility to ensure that a JID is prepped on login. So if my JID is [EMAIL PROTECTED] but I try to
log in as [EMAIL PROTECTED] with a capital-pi, my client could provide
capital-pi but my server must prep that to small-pi and return πßå to me
during SASL negotiation or resource binding (depending on when the
server tells me what my JID is).

However, the XMPP specs do not talk about stringprep enforcement in any
other context.

Consider the following scenarios:

1. I send a message to [EMAIL PROTECTED]
2. I send a presence subscription request to [EMAIL PROTECTED]
3. I join the [EMAIL PROTECTED] chatroom
4. I add [EMAIL PROTECTED] to my roster
5. I add [EMAIL PROTECTED] to a privacy list

In each case, is it my client's responsibility to ensure that the
address is stringprepped before generating the appropriate stanza? And
what happens if my client sends capital-pi instead of small-pi (which a lot of clients probably do now)? And what if I think my contact's JID is
[EMAIL PROTECTED] (with a capital-pi) but I receive a message from
[EMAIL PROTECTED] (with a small-pi) -- will my client ignore the message or
think that it is from someone else?
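
One way a client might avoid the "someone else" problem is to key its roster on the prepped bare JID instead of the string the user typed. The sketch below assumes that approach; the addresses, the roster dictionary, and the helper names are made up for illustration, and the folding helper covers only the mapping and normalization steps of stringprep, applied to the whole bare JID for brevity.

```python
import stringprep
import unicodedata


def fold(text: str) -> str:
    # Mapping (tables B.1 and B.2) plus NFKC against Unicode 3.2.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in text
        if not stringprep.in_table_b1(ch)
    )
    return unicodedata.ucd_3_2_0.normalize("NFKC", mapped)


def bare_key(jid: str) -> str:
    # Drop any resource, then fold the remaining local@domain part.
    return fold(jid.partition("/")[0])


# Hypothetical roster: the user typed the contact with a capital pi.
roster = {bare_key("Πa@example.org"): "My contact"}


def sender_name(from_jid: str):
    # Look up the sender by prepped bare JID; None means unknown.
    return roster.get(bare_key(from_jid))


if __name__ == "__main__":
    print(sender_name("πa@example.org/home"))
```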

(Yes, there are some similarities here to the well-formedness discussion...)

One approach is to say that my server must inspect all of my outbound
traffic to ensure that the 'to' addresses are stringprepped. This solves
cases 1, 2, and 3, but not cases 4 or 5 (because when I add a roster
item or modify a privacy list, my contact's JID is not in the 'to'
address but instead is buried in the XML). However, we could also
specify that my server must stringprep all JIDs that I might add to my
roster, store in a privacy list, or otherwise ask the server to process or store on my behalf (e.g., in bookmarks). This rule would not apply to
JIDs that are communicated to others directly, for example in roster
item exchange, XHTML-IM, or a plain old message body.
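
A rough sketch of that server-side rule, assuming Python's standard library: prep the 'to' address and any jid attribute on a jabber:iq:roster item, and leave everything else (such as the message body) untouched. The function names and addresses are hypothetical. The folding helper covers only the mapping and normalization steps, the resource is passed through unchanged because it must not be case-folded, and privacy lists or bookmarks would need similar handling that is not shown here.

```python
import stringprep
import unicodedata
import xml.etree.ElementTree as ET

ROSTER_ITEM = "{jabber:iq:roster}item"


def fold(text: str) -> str:
    # Mapping (tables B.1 and B.2) plus NFKC against Unicode 3.2.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in text
        if not stringprep.in_table_b1(ch)
    )
    return unicodedata.ucd_3_2_0.normalize("NFKC", mapped)


def prep_jid(jid: str) -> str:
    # Split local@domain/resource; fold local and domain, keep the resource.
    bare, slash, resource = jid.partition("/")
    local, at, domain = bare.rpartition("@")
    return fold(local) + at + fold(domain) + slash + resource


def prep_outbound(xml_text: str) -> ET.Element:
    stanza = ET.fromstring(xml_text)
    # Cases 1-3: the address sits in the stanza's 'to' attribute.
    if "to" in stanza.attrib:
        stanza.set("to", prep_jid(stanza.attrib["to"]))
    # Case 4: the contact's JID is buried in a roster item.
    for item in stanza.iter(ROSTER_ITEM):
        if "jid" in item.attrib:
            item.set("jid", prep_jid(item.attrib["jid"]))
    return stanza


if __name__ == "__main__":
    msg = prep_outbound(
        "<message to='Πa@Example.org/Home'><body>hi Πa@Example.org</body></message>"
    )
    print(ET.tostring(msg, encoding="unicode"))
```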

As far as I can see, this is the best way to proceed, but we'll need to
find a list consensus before I can make any changes in the text of
rfc3920bis (or other specs).

Peter

--
Peter Saint-Andre
https://stpeter.im/

