Re: [rfbproto] [PATCH] Specify UTF-8 for strings

Peter Rosin Mon, 17 Aug 2009 01:23:22 -0700

On 2009-08-17 09:00, Peter Åstrand wrote:
> On Fri, 14 Aug 2009, Peter Rosin wrote:
> 
>>> Besides, didn't we agree on that there is no such server that sends 
>>> strings with the "ANSI CODE PAGE"?
>>
>> No we didn't, we agreed on that for the desktop name.
> 
> Refresh my memory - which other strings are sent as "ANSI CODE PAGE"?


Username and password in the VeNCrypt extension. There are some strings
in the gii extension. The tight file transfer extension sends filenames.
And I'm sure I'm forgetting at least some string, that was just off the
top of my head...

> I thought we came to the conclusion that besides the DesktopName, we only 
> have the ProtocolVersion and the reason-string. I'm fine with nailing 
> these to plain ASCII, if this is what you prefer.

Nailing to ASCII is worse than nailing to UTF-8. Both make our spec
incompatible with existing implementations. We have to allow for
implementations to do whatever non-UTF-8 thingy they have been
doing, but still recommend against it.

>> And besides, *clients* are using all kinds of ASCII compatible 
>> encodings, and will happily display whatever they receive using their 
>> selected encoding. If we say "MUST use UTF-8" in our spec we declare 
>> all those clients incompatible, and I for one don't wish to do that. They
> 
> We are not. It's just that clients that relied on receiving the 
> DesktopName in something other than UTF-8 were "on their own" and relied 
> on unspecified protocol behaviour.

Reversing that argument is so easy: Xvnc was "on its own" when it relied
on unspecified protocol behaviour...

>> were "legal" yesterday with the RealVNC spec, and I think they should 
>> be "legal" tomorrow with *both* the RealVNC spec and our spec.
> 
> Strange language. We are not forbidding any clients. It's true that a few 
> clients could theoretically start rendering the names incorrectly, but...

But we seriously do not want to divide the RFB community (any further),
and we are doing that if we say "MUST use UTF-8". With a MUST in there,
anything else is not acceptable, hence "illegal" by our spec.

>> If it is so natural with UTF-8 and if it really is the only sane choice
>> (I think it is), it's enough if our spec says (e.g.)
>>
>>     It is strongly recommended that all implementations use
>>     UTF-8 for all strings (unless explicitly stated otherwise)
>>     to ensure interoperability. But be prepared that not all
>>     implementations do, so fail gracefully if you receive
>>     something else.
>>
>> instead of (e.g.)
>>
>>     All implementations MUST use UTF-8 for all strings (unless
>>     explicitly stated otherwise). But not all implementations
>>     do, so you SHOULD fail gracefully if you receive something
>>     else.
>>
>> I just don't see why the wording with MUST/SHOULD is so superior
>> that it is worth rendering existing implementations incompatible
>> with our spec.
> 
> This is ok with me. I don't think there's any difference in practice.

Oh, cool. Pierre previously asked if I had any alternative wording,
so here is my suggestion:

diff --git a/rfbproto.rst b/rfbproto.rst
index 7852746..0252e4f 100644
--- a/rfbproto.rst
+++ b/rfbproto.rst
@@ -201,6 +201,26 @@ that you contact RealVNC Ltd to make sure that your encodin
  security types do not clash. Please see the RealVNC website at
  http://www.realvnc.com for details of how to contact them.

+String Encodings
+================
+
+It is strongly recommended that strings in RFB are encoded using the
+UTF-8 encoding. This allows full unicode support, yet retains good
+compatibility with older RFB implementations.
+
+The encoding used for strings in the protocol has historically often
+been unspecified, or has changed between versions of the protocol. As a
+result, there are a lot of implementations which use different,
+incompatible encodings. Commonly those encodings have been ISO 8859-1
+(also known as Latin-1) or Windows code pages.
+
+Clients and servers are encouraged to send UTF-8 strings unless that
+particular part of the protocol mandates another encoding. They should
+however be prepared to receive invalid UTF-8 sequences at all times.
+Such sequences should be handled gracefully by e.g. stripping the
+invalid portions or trying to interpret the string using common
+encodings such as ISO 8859-1 or Windows code page 1252.
+
  Protocol Messages
  =================

@@ -614,7 +634,8 @@ No. of bytes    Type                Description
  *name-length*   ``U8`` array        *name-string*
  =============== =================== ===================================

-where ``PIXEL_FORMAT`` is
+The recommended text encoding for *name-string* is UTF-8 (see the
+`String Encodings`_ section.) ``PIXEL_FORMAT`` is defined as:

  =============== =================== ===================================
  No. of bytes    Type                Description
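
To make the "handled gracefully" paragraph of the patch above a bit more
concrete, here is a minimal Python sketch of what a receiver could do. The
function names and the fallback order (UTF-8, then Windows code page 1252,
then ISO 8859-1) are my own assumptions for illustration; the proposed
wording only mentions these encodings as examples and does not mandate any
particular order.

```python
def decode_rfb_string(raw: bytes) -> str:
    """Decode a string received over RFB, preferring UTF-8.

    Hypothetical helper: the fallback order is an assumption, not
    something the proposed spec text requires.
    """
    for encoding in ("utf-8", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # ISO 8859-1 maps every byte value to a code point, so this
    # last resort cannot raise.
    return raw.decode("latin-1")


def strip_invalid_utf8(raw: bytes) -> str:
    """Alternative strategy: keep the valid UTF-8 portions, drop the rest."""
    return raw.decode("utf-8", errors="ignore")
```

Valid UTF-8 comes through unchanged, while a Latin-1 encoded name such as
b"\xc5strand" is not valid UTF-8 and ends up being read through the code
page fallback. Which of the two strategies is friendlier to the user
probably depends on the client.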

> Everybody fine with such a wording, and with fixing our clients so that 
> they interpret the strings as UTF-8?

Should not Xvnc also be fixed to explicitly send strings in UTF-8
instead of relying on being executed in a "UTF-8 context" or however
you worded it? I have never looked at the Xvnc code so I wouldn't
know if you can trick it into sending non-UTF-8 strings...
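
Just to illustrate what I mean by "explicitly", here is a small Python
sketch of a sender that always encodes the desktop name as UTF-8,
independent of the locale it happens to run in. This is not Xvnc code;
the function name is made up for the example. It assumes the ServerInit
layout where name-string is preceded by a big-endian U32 name-length.

```python
import struct


def pack_desktop_name(name: str) -> bytes:
    """Return name-length (U32, big-endian) followed by name-string.

    Hypothetical helper: the encoding is chosen explicitly here instead
    of being inherited from the process locale.
    """
    encoded = name.encode("utf-8")
    return struct.pack(">I", len(encoded)) + encoded
```

Note that name-length counts bytes, not characters, so it has to be
computed after encoding.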

Cheers,
Peter

PS. I apologize for steering the discussion into the "UTF-8 pseudo-
encoding" direction and for taking so long to drop that and focus on
my real issue which was just a wording thing...

_______________________________________________
tigervnc-rfbproto mailing list
tigervnc-rfbproto@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tigervnc-rfbproto
