> > > In 38b4 you introduced a bug encoding the UTF-8 byte order mark
> > > using the AddPartHTML method:
> > The BOM was not added in 38b4; it has existed for a few releases.
> Please reread the quoted text. As I wrote: 'In 38b4 you introduced a
> bug...'

Well, I am stupid... what bug with the BOM did I introduce in 38b4?

> > Any Unicode document can have a BOM in any place!
> This is not correct.

See the essential Byte Order Mark FAQ from www.unicode.org: the BOM is
the character U+FEFF, which is the legal Unicode character "ZERO WIDTH
NO-BREAK SPACE" and can be present at any place in a Unicode stream.
(At the beginning it is interpreted as a BOM, and newer Unicode
revisions say it should not be used in the middle, but it is still a
legal Unicode character!)

> > And the presence of a BOM cannot break any correctly written Unicode
> > reader.
>
> A Unicode reader should be written correctly; unfortunately that is
> not always the case. As such, it can be broken quite easily. With the
> presence of a BOM and a charset as well, decoding can easily be
> misguided. For example, the charset may be Windows-1251, while the raw
> encoding is UTF-8 or UTF-16 depending on the BOM...

How would this be possible? If the encoding is UTF-8, then it cannot be
CP1251... A BOM is used for Unicode encodings only. If a BOM is in the
data, then the MIME headers state the correct Unicode encoding too. It
is just a duplicate identification for UTF-8, or a byte order mark for
UCS encodings. It can be broken only by other significant errors.

> That means that the charset in the presence of a BOM is not relevant
> at all. However, the charset exists and that is the primary
> encoder/decoder guide.

Right, if your MIME decoder decodes the part content according to the
MIME headers, then you get the correct content. You see a problem where
none exists!

> > You must ask the reverse question: is there an RFC that says "you
> > cannot use BOM in MIME part content"?
>
> It is quite illogical to explicitly use a BOM if the charset clearly
> says which Unicode encoder is used, as I wrote above.

It is logical, and RFC 3629 allows BOM usage, especially in the MIME
case! And it is logical because the BOM is self-describing charset
information inside the data. It is very useful information.
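
To illustrate the point about decoding by MIME headers, here is a minimal sketch in Python (not Synapse code). The helper name decode_part is hypothetical, and dropping one leading U+FEFF after decoding is an assumption about how a tolerant reader might behave, not something any RFC or Synapse prescribes.

```python
import codecs


def decode_part(raw: bytes, charset: str) -> str:
    """Decode a MIME part body using the charset declared in its headers.

    Hypothetical helper: the declared charset is the primary guide, and a
    single leading U+FEFF (a BOM used as a signature) is dropped afterwards.
    """
    text = raw.decode(charset)
    # Only a U+FEFF at the very start is treated as a signature; the same
    # character later in the stream is left alone as ordinary text.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text


# The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
assert codecs.BOM_UTF8 == "\ufeff".encode("utf-8") == b"\xef\xbb\xbf"

with_bom = codecs.BOM_UTF8 + "Hello".encode("utf-8")
without_bom = "Hello".encode("utf-8")

# Both bodies decode to the same text when the declared charset is UTF-8.
assert decode_part(with_bom, "utf-8") == decode_part(without_bom, "utf-8")
```

In this sketch the BOM only duplicates what the charset header already says, so a reader that follows the header gets the same text with or without it.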

> > BTW: the BOM for UTF-8 was added a long time ago for stupid Outlook,
> > which ignores the charset information in part headers in some cases
> > and detects UTF-8 by the presence of a BOM.
> Outlook should not be a reference e-mail client at all. It is probably
> one of the most buggy e-mail clients.

Agreed. However, when I can make a modification that does not break any
RFC and that helps buggy software display messages, why should I not
use it?

--
Lukas Gebauer.

E-mail: [EMAIL PROTECTED]
WEB: http://www.ararat.cz/synapse - Synapse Delphi and Kylix TCP/IP Library

_______________________________________________
synalist-public mailing list
https://lists.sourceforge.net/lists/listinfo/synalist-public
