From: Robert Sparks <[EMAIL PROTECTED]>

   The BNF in 3261 says the following:

   extension-header  =  header-name HCOLON header-value
   header-value      =  *(TEXT-UTF8char / UTF8-CONT / LWS)

   This is intended to be the catch-all field for all future
   extensions - older parsers working against this BNF shouldn't barf
   when we introduce a new header field.

   Now, we may have new fields in the future that look like:

   NewHeader = new-header-name HCOLON quoted-string

   And down inside quoted-string, we get:

          quoted-string  =  SWS DQUOTE *(qdtext / quoted-pair ) DQUOTE
          qdtext         =  LWS / %x21 / %x23-5B / %x5D-7E
                            / UTF8-NONASCII
          quoted-pair  =  "\" (%x00-09 / %x0B-0C
                           / %x0E-7F)

The whole situation is rather icky.  I can see five problems:

1. header-value generates solo UTF8-CONT octets, the continuation
bytes of multi-byte UTF-8 characters, which are the range x80-BF.  Why
this is so is unclear: the syntax cannot generate a solo UTF-8 initial
byte, which would govern the continuation byte, and it also does not
admit the (single-byte) encodings of many of the characters in the
ISO-8859-* character sets, so it cannot be meant to permit embedding
one-byte ISO-8859 encodings.  It appears to me that the inclusion of
UTF8-CONT in the production is unintended.

2. quoted-string admits (most of) x00-1F, by way of quoted-pair, even
though extension-header does not.

3. Since quoted-string is used in many defined headers, we are
already in the position of having defined headers that cannot be
parsed as extension-header as a catch-all mechanism.

4. Given that there is no common character encoding within which all
of these productions can be uniformly interpreted, the only overall
description that can be given of the encoding of SIP headers is
"*OCTET".  And yet SIP is not intended to be a binary protocol.

5. In quoted-string, a backslash is permitted to quote any ASCII
character (other than CR and LF), but not any Unicode character x80 or
higher.  (Despite that the backslash is not used to quote a solo UTF-8
initial byte.)  This leads to the peculiar result that some letters
used in (e.g.) French can be preceded by a backslash in quoted-string,
but others cannot.
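
To make problems 1, 2, 3 and 5 concrete, here is a rough Python sketch
that transcribes the RFC 3261 productions quoted above (plus the LWS,
TEXT-UTF8char and UTF8-NONASCII productions from the same grammar)
into byte-level regular expressions.  The variable names, helper
functions and sample values are my own and purely illustrative; this
is an approximation for experimenting, not a reference parser.

```python
import re

# Byte-level transcriptions of RFC 3261 productions (names are illustrative).
LWS = rb"(?:[ \t]*\r\n)?[ \t]+"
UTF8_CONT = rb"[\x80-\xBF]"
UTF8_NONASCII = (
    rb"(?:[\xC0-\xDF][\x80-\xBF]"
    rb"|[\xE0-\xEF][\x80-\xBF]{2}"
    rb"|[\xF0-\xF7][\x80-\xBF]{3}"
    rb"|[\xF8-\xFB][\x80-\xBF]{4}"
    rb"|[\xFC-\xFD][\x80-\xBF]{5})"
)
TEXT_UTF8CHAR = rb"(?:[\x21-\x7E]|" + UTF8_NONASCII + rb")"

# header-value = *(TEXT-UTF8char / UTF8-CONT / LWS)
HEADER_VALUE = re.compile(
    rb"(?:" + TEXT_UTF8CHAR + rb"|" + UTF8_CONT + rb"|" + LWS + rb")*"
)

# quoted-string = SWS DQUOTE *(qdtext / quoted-pair) DQUOTE
QDTEXT = rb"(?:" + LWS + rb"|[\x21\x23-\x5B\x5D-\x7E]|" + UTF8_NONASCII + rb")"
QUOTED_PAIR = rb"\\[\x00-\x09\x0B\x0C\x0E-\x7F]"
QUOTED_STRING = re.compile(
    rb"(?:" + LWS + rb')?"(?:' + QDTEXT + rb"|" + QUOTED_PAIR + rb')*"'
)


def in_header_value(value: bytes) -> bool:
    return HEADER_VALUE.fullmatch(value) is not None


def in_quoted_string(value: bytes) -> bool:
    return QUOTED_STRING.fullmatch(value) is not None


# Problem 1: a solo continuation byte is accepted, but a solo initial
# byte or a one-byte ISO-8859-1 e-acute (xE9) is not.
print(in_header_value(b"\x80"), in_header_value(b"\xC3"), in_header_value(b"\xE9"))

# Problems 2 and 3: backslash followed by x01 is a valid quoted-pair,
# yet the same octets are not a valid header-value.
ctl_quoted = b'"a\\\x01b"'
print(in_quoted_string(ctl_quoted), in_header_value(ctl_quoted))

# Problem 5: backslash-e is accepted, backslash-e-acute (UTF-8 xC3 xA9) is not.
print(in_quoted_string(b'"\\e"'), in_quoted_string(b'"\\\xc3\xa9"'))
```

The first print shows the asymmetry between continuation bytes and
everything else above x7F, the second shows a quoted-string that the
catch-all production rejects, and the third shows the French-letter
oddity.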

Based on RFC 3261 section 7:

   SIP is a text-based protocol and uses the UTF-8 charset (RFC 2279 [7]).

my understanding is that the intention for SIP headers is that they
are sequences of Unicode characters encoded using UTF-8.  I see no
reason to abandon that principle and I've not heard of any instance
where anyone has done so deliberately.

To hold to that principle and clean up the above problems, the BNF
would need to be revised to read:

   extension-header  =  header-name HCOLON header-value
   header-value      =  *(TEXT-UTF8char / LWS)

          quoted-string  =  SWS DQUOTE *(qdtext / quoted-pair ) DQUOTE
          qdtext         =  LWS / %x21 / %x23-5B / %x5D-7E
                            / UTF8-NONASCII
          quoted-pair  =  "\" (%x20-7E / UTF8-NONASCII)

or equivalently

          quoted-pair  =  "\" (SP / TEXT-UTF8char)
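
As a sanity check on the revision above, the following Python sketch
transcribes the revised productions into byte-level regular
expressions, compares the two spellings of quoted-pair, and tries a
few sample values.  All names and samples here are hypothetical and
chosen only for illustration; passing these checks is evidence, not a
proof, that the revised quoted-string is contained in the revised
header-value.

```python
import re

# Shared RFC 3261 building blocks (names are illustrative).
LWS = rb"(?:[ \t]*\r\n)?[ \t]+"
UTF8_NONASCII = (
    rb"(?:[\xC0-\xDF][\x80-\xBF]"
    rb"|[\xE0-\xEF][\x80-\xBF]{2}"
    rb"|[\xF0-\xF7][\x80-\xBF]{3}"
    rb"|[\xF8-\xFB][\x80-\xBF]{4}"
    rb"|[\xFC-\xFD][\x80-\xBF]{5})"
)
TEXT_UTF8CHAR = rb"(?:[\x21-\x7E]|" + UTF8_NONASCII + rb")"

# Revised: header-value = *(TEXT-UTF8char / LWS)
HEADER_VALUE = re.compile(rb"(?:" + TEXT_UTF8CHAR + rb"|" + LWS + rb")*")

# Revised quoted-pair, in both spellings given above.
QUOTED_PAIR = rb"\\(?:[\x20-\x7E]|" + UTF8_NONASCII + rb")"
QUOTED_PAIR_ALT = rb"\\(?:\x20|" + TEXT_UTF8CHAR + rb")"

QDTEXT = rb"(?:" + LWS + rb"|[\x21\x23-\x5B\x5D-\x7E]|" + UTF8_NONASCII + rb")"
QUOTED_STRING = re.compile(
    rb"(?:" + LWS + rb')?"(?:' + QDTEXT + rb"|" + QUOTED_PAIR + rb')*"'
)

# The two quoted-pair spellings agree on every single octet after "\".
pair, pair_alt = re.compile(QUOTED_PAIR), re.compile(QUOTED_PAIR_ALT)
spellings_agree = all(
    (pair.fullmatch(bytes([0x5C, b])) is None)
    == (pair_alt.fullmatch(bytes([0x5C, b])) is None)
    for b in range(256)
)

# A solo continuation byte is no longer a header-value.
solo_cont_ok = HEADER_VALUE.fullmatch(b"\x80") is not None

# A backslash-quoted control character is no longer a quoted-string,
# while a backslash-quoted e-acute (UTF-8 xC3 xA9) now is.
ctl_ok = QUOTED_STRING.fullmatch(b'"a\\\x01b"') is not None
e_acute_ok = QUOTED_STRING.fullmatch(b'"\\\xc3\xa9"') is not None

# Sample quoted-strings should all be valid header-values as well.
samples = [b'"plain"', b' "caf\xc3\xa9"', b'"\\\xc3\xa9"', b'"a\\ b"',
           b'"say \\"hi\\""']
contained = all(
    QUOTED_STRING.fullmatch(s) is not None and HEADER_VALUE.fullmatch(s) is not None
    for s in samples
)

print(spellings_agree, solo_cont_ok, ctl_ok, e_acute_ok, contained)
```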

Relative to other proposals, in a sense I'm proposing that
extension-header and quoted-string be contracted so that they
coincide, as I think that makes all the rules consistent and
conceptually coherent and does not exclude any current usage.  This
leaves me in the dark as to why interoperability problems have been seen.
Robert, can you show us some examples?

Dale


_______________________________________________
Sip mailing list  https://www1.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol
Use [EMAIL PROTECTED] for questions on current sip
Use [EMAIL PROTECTED] for new developments on the application of sip
