Doug Ewell writes:

> * Philippe Verdy and Jill Ramonsky say YES, a compressor can
> normalize, because it knows it is operating on Unicode character data
> and can take advantage of Unicode properties.

I say YES only for compressors that are designed to work on Unicode text (this applies to BOCU-1 and SCSU, which are not intended to compress anything other than Unicode text), but of course NO for general-purpose compressors (like deflate in zip files). I also say NO for encoding forms, which are normally built to be directly parsable code point by code point, in either direction and starting from random locations in strings. So a UTF encoding scheme is not supposed to change the normalization form.

> * Peter Kirk and Mark Shoulson say NO, it can't, because all the
> compressor really knows about is the byte stream, so it must be
> preserved byte-for-byte.

But SCSU and BOCU-1 do not operate at the byte-stream level: applying them to arbitrary streams of bytes is invalid, and they are only defined in terms of streams of code units. That's why I wouldn't say that SCSU and BOCU-1 are really compressors; they are really encoding schemes (CES in the ISO/IEC 10646 terminology). In fact, the output of the BOCU-1 and SCSU encoding schemes can form a file with its own charset (i.e. CCS+CES in the ISO terminology), and can therefore also have its own label for MIME usage or in XML charset declarations. This is not a limitation, as true compressors can still be applied on top of this encoding scheme if needed, or transparently within transport layers (such as the "Content-Encoding:" and "Transfer-Encoding:" mechanisms in HTTP applications).

> * I'm still not sure, but I'm leaning toward NO.
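
To make the YES/NO distinction above concrete, here is a minimal Python sketch (an illustration, not SCSU or BOCU-1 themselves). It contrasts a general-purpose compressor, which must give back the exact input bytes, with a hypothetical Unicode-aware encoder that normalizes to NFC along the way. Assumptions: UTF-8 stands in for SCSU/BOCU-1, since neither is in the Python standard library, and the function names are made up for this example.

```python
import unicodedata
import zlib

# Two canonically equivalent spellings of "é":
# NFD form: LATIN SMALL LETTER E + COMBINING ACUTE ACCENT (U+0065 U+0301)
decomposed = "e\u0301"
# NFC form: LATIN SMALL LETTER E WITH ACUTE (U+00E9)
composed = unicodedata.normalize("NFC", decomposed)


def byte_preserving_roundtrip(data: bytes) -> bytes:
    """General-purpose compressor (deflate via zlib): output bytes equal input bytes."""
    return zlib.decompress(zlib.compress(data))


def normalizing_roundtrip(data: bytes) -> bytes:
    """Hypothetical Unicode-aware encoder that applies NFC before encoding.

    UTF-8 is only a stand-in for SCSU/BOCU-1 in this sketch.
    """
    text = data.decode("utf-8")
    return unicodedata.normalize("NFC", text).encode("utf-8")


original = decomposed.encode("utf-8")           # b'e\xcc\x81'
via_deflate = byte_preserving_roundtrip(original)
via_normalizer = normalizing_roundtrip(original)

print(via_deflate == original)                  # True: byte-for-byte identical
print(via_normalizer == original)               # False: bytes became b'\xc3\xa9'
# ...yet the decoded text is still canonically equivalent to the input.
print(unicodedata.normalize("NFD", via_normalizer.decode("utf-8")) == decomposed)  # True
```

The first function satisfies the byte-for-byte view; the second only satisfies the "it's Unicode character data" view, which is exactly the point under dispute.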