Re: [Synalist] UTF-8 BOM using AddPartHTML

Lukas Gebauer Mon, 26 Feb 2007 11:56:52 -0800

> > > In 38b4 you introduced a bug in encoding the UTF-8 byte order mark
> > > using the AddPartHTML method:
> > The BOM was not added in 38b4; it has existed for a few releases.
> Please reread the quoted text. As I wrote: 'In 38b4 you introduced a
> bug...'


Well, I must be stupid... what bug with the BOM did I introduce in 38b4?

> > Any Unicode document can have a BOM in any place!
> This is not correct. Essential Byte Order Mark FAQ from www.unicode.org:

The BOM is the character U+FEFF, which is the legal Unicode character
"ZERO WIDTH NO-BREAK SPACE" and can be present anywhere in a Unicode
stream. (At the beginning it is interpreted as a BOM, and although newer
Unicode revisions say it should not be used in the middle of a stream, it
is still a legal Unicode character!)
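
A minimal sketch of that point, written in Python only for illustration
(it is not Synapse code, and the sample string is an assumption): U+FEFF
has a regular Unicode name, encodes to the three bytes EF BB BF in UTF-8,
and survives a UTF-8 round trip even in the middle of a string.

```python
import codecs
import unicodedata

BOM = "\ufeff"  # the character used as the byte order mark

# The official Unicode name of U+FEFF and its UTF-8 byte sequence.
bom_name = unicodedata.name(BOM)
bom_utf8 = BOM.encode("utf-8")

# Hypothetical sample text with U+FEFF in the middle, not at the start.
sample = "abc" + BOM + "def"
round_trip = sample.encode("utf-8").decode("utf-8")

print(bom_name)
print(bom_utf8.hex())
print(round_trip == sample)
```

The plain "utf-8" codec keeps the character wherever it appears; nothing
is rejected or lost.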

> > And the presence of a BOM cannot break any correctly written Unicode
> > reader.
> 
> A Unicode reader should be written correctly; unfortunately, that is not
> always the case. As such, it can be broken quite easily. With a BOM
> present as well as a charset, decoding can easily be misguided. For
> example, the charset may be Windows-1251 while the raw encoding is UTF-8
> or UTF-16, depending on the BOM...

How would that be possible? If the encoding is UTF-8, then it cannot be
CP1251... A BOM is used for Unicode encodings only.

If a BOM is in the data, then the MIME headers state the correct Unicode
encoding too. The BOM is just a duplicate identification of UTF-8, or a
byte order mark for UCS encodings. It can be broken only by other
significant errors.

> That means that the charset is not relevant at all when a BOM is present.
> However, the charset exists, and it is the primary encoder/decoder guide.

Right, if your MIME decoder decodes the part content according to the MIME
headers, then you get the correct content. You see a problem where none
exists!
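
To illustrate, here is a rough sketch using Python's standard email
package rather than Synapse. The message bytes, the body text, and the
decode_part helper are assumptions made up for this example. A decoder
that follows the charset in the part headers gets the right text, and the
BOM simply shows up as a leading U+FEFF that can be dropped.

```python
import base64
from email import message_from_bytes

# Hypothetical HTML body that starts with a BOM, as AddPartHTML output might.
body = "\ufeff<html><body>P\u0159\u00edli\u0161 \u017elu\u0165ou\u010dk\u00fd</body></html>"

# A hand-built single-part message that declares charset=UTF-8.
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode(body.encode("utf-8")) + b"\r\n"
)

def decode_part(raw_message: bytes) -> str:
    """Decode a part strictly by its MIME headers, ignoring any sniffing."""
    part = message_from_bytes(raw_message)
    charset = part.get_content_charset()     # lower-cased, here "utf-8"
    payload = part.get_payload(decode=True)  # undo the base64 transfer encoding
    return payload.decode(charset)

decoded = decode_part(raw)
clean = decoded.lstrip("\ufeff")  # optional: drop the leading BOM character
print(clean)
```

Decoding with Python's "utf-8-sig" codec instead of "utf-8" would strip
the leading BOM in the same step.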

> > You must ask the reverse question: is there an RFC that says "you
> > cannot use a BOM in MIME part content"?
> 
> It is quite illogical to use a BOM explicitly if the charset clearly says
> which Unicode encoding is used, as I wrote above.

It is logical, and RFC 3629 allows BOM usage, especially in the MIME case!
And it is logical because the BOM is self-describing charset information
in the data. It is very useful information.

> > BTW: the BOM for UTF-8 was added a long time ago for stupid Outlook,
> > which in some cases ignores the charset information in the part headers
> > and detects UTF-8 by the presence of a BOM.
> Outlook should not be a reference e-mail client at all. It is probably
> one of the buggiest e-mail clients.

Agreed. However, when I can make a modification that does not break any
RFC, and this modification helps buggy software display messages
correctly, why can I not use it?
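
As a rough sketch of why the BOM helps such software, consider a
hypothetical lenient reader that ignores the part headers and only sniffs
the data. This is an assumption for illustration, not Outlook's actual
logic; the sniff_decode function and the cp1252 fallback are made up.

```python
import codecs

def sniff_decode(data: bytes, fallback: str = "cp1252") -> str:
    """Decode bytes while ignoring any declared charset (hypothetical reader)."""
    if data.startswith(codecs.BOM_UTF8):
        # A BOM is present: treat the rest as UTF-8 and drop the signature.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # No BOM: fall back to a guessed legacy code page.
    return data.decode(fallback, errors="replace")

text = "\u017elu\u0165ou\u010dk\u00fd"            # sample non-ASCII text
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
without_bom = text.encode("utf-8")

print(sniff_decode(with_bom) == text)     # the BOM lets the reader pick UTF-8
print(sniff_decode(without_bom) == text)  # without it the guess goes wrong
```

With the BOM the sniffing reader recovers the text; without it the same
bytes come out as mojibake.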


--
Lukas Gebauer.

E-mail: [EMAIL PROTECTED]
WEB: http://www.ararat.cz/synapse - Synapse Delphi and Kylix TCP/IP 
Library


