Re[2]: (not LONGISH any more) Re[2]: Fwd: Bug (maybe wrong understanding of RFCs): an encoding selected by the user sometimes silently replaced with 7-bit US-ASCII

Maksym Kozub Sun, 25 Jan 2004 13:08:44 -0800

Hello Thomas,

On Sun, 25 Jan 2004 20:41:42 you wrote:


TF> OK, let's read on:

>> Alexandr Kiselev, administrator, Dec 02, 2003, 07:28:24 pm:
>> --------------------------------------------------------------------------------
>> "If all characters in a message are us-ascii, then Bat has always
>> put us-ascii in message headers, irrespective of the default
>> encoding. This is by the way in complete accordance with the letter
>> and spirit of RFCs.

TF> The others confirm that's so in the RFCs.

In fact, that's _not_ what the RFCs say. See below.

>> I would recommend replacing one of the Latin "a"s with a Russian "а";
>> this would be sufficient to cope with your problem.

TF> Here you have a work-around that should solve your problem for the
TF> time being.

TF> But having read the discussion you kindly translated, I don't consider
TF> it a bug in TB, because TB behaves in an RFC-conformant way (if what
TF> was said in the thread from the forum is true; I didn't check it).

That workaround is needed _only_ because The Bat! misinterprets the
RFCs (see below). That's an important part of my whole point.

TF> The work-around is therefore needed because of an RFC which I, like
TF> you, think should be altered. The correct way is to write to the author of the
TF> RFC rather than asking Ritlabs to violate it. I believe they have a
TF> right to be proud of their RFC-compliance. If an RFC doesn't make
TF> sense, it ought to be changed rather than ignored. IMHO.

The whole matter is this: I _don't_ think that RFC should be altered,
but I would take the liberty of saying that The Bat!, and the
participants in that discussion, misunderstand what that RFC _does_
say. Let's have one more look at RFC2045 now. What does it say? It
says:

"2.7. 7bit Data
"7bit data" refers to data that is all represented as relatively short
lines with 998 octets or less between CRLF line separation sequences
[RFC-821]. No octets with decimal values greater than 127 are allowed
and neither are NULs (octets with decimal value 0). CR (decimal value
13) and LF (decimal value 10) octets only occur as part of CRLF line
separation sequences.
2.8. 8bit Data
"8bit data" refers to data that is all represented as relatively short
lines with 998 octets or less between CRLF line separation sequences
[RFC-821]), but octets with decimal values greater than 127 may be
used. As with "7bit data" CR and LF octets only occur as part of CRLF
line separation sequences and no NULs are allowed."

To keep it short: "7bit data must _never_ contain octets above 127
(or NULs). 8bit data _may_ contain octets above 127."

Does it say "8bit data _must always_ contain octets above 127"? _No_.

Does it say "Whatever contains no octets above 127 _is always_ 7bit
data"? _No_.
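The two definitions quoted above can be turned into a mechanical check. Below is a minimal Python sketch; the function name `classify_body` is made up for illustration, and it models only sections 2.7 and 2.8 (not "binary" data). It shows that the definitions overlap: a body that meets 2.7 also meets 2.8, so ASCII-only content is valid 8bit data as well. Note that the limit is "greater than 127", so octet 127 itself is still allowed in 7bit data.

```python
def classify_body(data: bytes) -> frozenset:
    """Return the RFC 2045 data classes ("7bit", "8bit") that `data` satisfies.

    Hypothetical helper: models sections 2.7 and 2.8 only ("binary" is ignored).
    """
    # Rules shared by 2.7 and 2.8: no NULs ...
    if b"\x00" in data:
        return frozenset()
    for line in data.split(b"\r\n"):
        # ... CR and LF only as part of a CRLF sequence ...
        if b"\r" in line or b"\n" in line:
            return frozenset()
        # ... and at most 998 octets between CRLF sequences.
        if len(line) > 998:
            return frozenset()
    classes = {"8bit"}
    # 2.7 adds one restriction: no octets with decimal values greater than 127.
    if all(octet <= 127 for octet in data):
        classes.add("7bit")
    return frozenset(classes)


ascii_only = "Latin t only\r\n".encode("koi8_r")
russian = "Привет\r\n".encode("koi8_r")
print(sorted(classify_body(ascii_only)))  # ['7bit', '8bit']
print(sorted(classify_body(russian)))     # ['8bit']
```

The ASCII-only KOI8-R body qualifies under both sections at once; nothing in the definitions makes the 7bit reading exclusive.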

When I type the Latin letter "t", am I typing "7bit data"? _Not
necessarily_. It may be represented as 7-bit, 8-bit, UTF-7, UTF-8...
Of course, if there is a Russian character in the same message, then
the message cannot be encoded as US-ASCII anymore - see RFC2045 above.
However, if there is nothing but low ASCII in that message, please
show me why, based on the RFC2045 definitions quoted above, it
cannot be encoded as 8bit KOI8-R, or Win-1252, or UTF-8...

I hope you see my point. A high-ASCII character can never be 7bit
data; that's right, and that's what RFC2045 says. What it does _not_
say is that low ASCII (like the Latin letter "t") is intrinsically
bound to be represented as US-ASCII, and _not_ as KOI8-R or, for that
matter, even UTF-8.
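For comparison, Python's standard `email` package keeps the charset label and the 7bit/8bit classification apart. This is only an illustration of one library's behaviour, not a description of how The Bat! works: asked for KOI8-R, it keeps the KOI8-R label on a pure-ASCII body and simply marks the transfer encoding as 7bit.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "test"
# The charset the user picked; the body itself is pure low ASCII.
msg.set_content("Latin letters only.\n", charset="koi8-r")

print(msg.get_content_charset())         # koi8-r
print(msg["Content-Transfer-Encoding"])  # 7bit
```

Here "7bit" describes the octets on the wire, while `charset` records the user's choice; neither overrides the other.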

And I don't think that is by chance. The authors of the RFC clearly
understood that if a character (like that poor "t" :) ) exists in
several encodings, it can be encoded in any of them. Period.

Regards,
Maksym.

-- 
Maksym Kozub, MK881-UANIC    mailto: [EMAIL PROTECTED]


________________________________________________
Current version is 2.02.3 CE | "Using TBUDL" information:
http://www.silverstones.com/thebat/TBUDLInfo.html
