On Mon, Aug 17, 2009 at 10:22:56AM +0200, Peter Rosin wrote:
> >> If it is so natural with UTF-8 and if it really is the only sane choice
> >> (I think it is), it's enough if our spec says (e.g.)
> >>
> >> It is strongly recommended that all implementations use
> >> UTF-8 for all strings (except explicitly stated otherwise)
> >> to ensure interoperability. But be prepared that not all
> >> implementations do, so fail gracefully if you receive
> >> something else.
> >>
> >> instead of (e.g.)
> >>
> >> All implementations MUST use UTF-8 for all strings (except
> >> explicitly stated otherwise). But not all implementations
> >> do, so you SHOULD fail gracefully if you receive something
> >> else.
> >>
> >> I just don't see why the wording with MUST/SHOULD is so superior
> >> that it is worth rendering existing implementations incompatible
> >> with our spec.
> >
> > This is ok with me. I don't think there's any difference in practice.
>
> Oh, cool. Pierre previously asked if I had any alternative wording,
> so here is my suggestion:
>
> diff --git a/rfbproto.rst b/rfbproto.rst
> index 7852746..0252e4f 100644
> --- a/rfbproto.rst
> +++ b/rfbproto.rst
> @@ -201,6 +201,26 @@ that you contact RealVNC Ltd to make sure that your encodin
>  security types do not clash. Please see the RealVNC website at
>  http://www.realvnc.com for details of how to contact them.
>
> +String Encodings
> +================
> +
> +It is strongly recommended that strings in RFB are encoded using the
> +UTF-8 encoding. This allows full unicode support, yet retains good
> +compatibility with older RFB implementations.
> +
> +The encoding used for strings in the protocol has historically often
> +been unspecified, or has changed between versions of the protocol. As a
> +result, there are a lot of implementations which use different,
> +incompatible encodings. Commonly those encodings have been ISO 8859-1
> +(also known as Latin-1) or Windows code pages.
> +
> +Clients and servers are encouraged to send UTF-8 strings unless that
> +particular part of the protocol mandates another encoding. They should
> +however be prepared to receive invalid UTF-8 sequences at all times.
> +Such sequences should be handled gracefully by e.g. stripping the
> +invalid portions or trying to interpret the string using common
> +encodings such as ISO 8859-1 or Windows code page 1252.
> +

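To make the handling described in the proposed text concrete, here is a
minimal Python sketch of the two strategies it mentions: stripping the
invalid portions, and retrying with common legacy encodings. The function
names decode_rfb_string and strip_invalid_utf8 are hypothetical and do not
come from any RFB implementation, and the fallback order (UTF-8, then
code page 1252, then ISO 8859-1) is an assumption, not something the
proposed text prescribes.

```python
def decode_rfb_string(raw: bytes) -> str:
    """Decode a received string, preferring UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding raises on any invalid UTF-8 sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        # Assumed first fallback: Windows code page 1252.
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves a few byte values undefined;
        # ISO 8859-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("latin-1")


def strip_invalid_utf8(raw: bytes) -> str:
    """Alternative strategy: silently drop bytes that are not valid UTF-8."""
    return raw.decode("utf-8", errors="ignore")
```

With this sketch, decode_rfb_string(b"caf\xe9") falls through to code
page 1252, while strip_invalid_utf8(b"caf\xe9!") simply drops the
offending byte. Note that a byte string such as b"\xc3\xa9" is valid in
both UTF-8 and ISO 8859-1, so the sketch cannot tell which one the sender
intended; it always prefers UTF-8.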
Hm, it is easy to say "invalid portions" of a UTF-8 string, but it is
_very_ hard to create an algorithm which determines whether a given part
of a string is valid or invalid. If you are using UTF-8, users might
create strings with "obscure" characters. I think this kind of heuristic
should not be included in the protocol. If an implementation sends
strings in, for example, an ISO 8859-* encoding, it will end up with
garbled characters, but we have to live with that; there is probably no
algorithm that solves this problem.

Regards, Adam

-- 
Adam Tkac, Red Hat, Inc.

_______________________________________________
tigervnc-rfbproto mailing list
tigervnc-rfbproto@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tigervnc-rfbproto