Doug Ewell writes:
> Yes, you can take SCSU- or BOCU-1-encoded text and recompress it using a
> GP compression scheme. Atkin and Stansifer's paper from last year is
> all about that, and I spend a few pages on it in my paper as well. You
> can also re-Zip a Zip file, though, so I don't know what that proves
> about the compression formats.

General-purpose compressors are characterized by their ability to recompress their own output if needed: they take a stream of bytes and produce a stream of bytes. But you can't feed the output of SCSU or BOCU-1 back into SCSU or BOCU-1, simply because their output is not a stream of code points but a stream of bytes. The best you can do is regenerate the code points, but this means decompressing and then recompressing. There is no point in doing so with SCSU and BOCU-1, as there is no guarantee that the result of your de/re-compression will be better, worse, or even fully identical to the initial compressed form...

So the SCSU and BOCU-* formats are NOT general-purpose compressors. As they are defined only in terms of streams of Unicode code points, they are assumed to follow the conformance clauses of Unicode. As they recognize their input as Unicode text, they can recognize canonical equivalence, and this creates an opportunity for them to consider whether a (de)normalization or de/re-composition would result in higher compression. (Interestingly, the composition exclusions could be reconsidered in the case of BOCU-1 and SCSU compressed streams, provided that decompression to code points redecomposes the excluded compositions.)
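
To make the canonical-equivalence point concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: Python's standard library has no SCSU or BOCU-1 codec, so UTF-8 byte counts stand in for "encoded size", and the helper names form_sizes and canonically_equivalent are made up for this example. It shows that two canonically equivalent strings can have different code point counts and encoded sizes, and that a composition-excluded character is not recomposed by NFC.

```python
import unicodedata


def form_sizes(text):
    """Return {form: (code_point_count, utf8_byte_count)} for NFC and NFD.

    UTF-8 is only a stand-in for a real SCSU/BOCU-1 encoder (assumption).
    """
    sizes = {}
    for form in ("NFC", "NFD"):
        normalized = unicodedata.normalize(form, text)
        sizes[form] = (len(normalized), len(normalized.encode("utf-8")))
    return sizes


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    # Precomposed and decomposed spellings of the same text.
    print(canonically_equivalent("\u00e9", "e\u0301"))
    print(form_sizes("r\u00e9sum\u00e9"))

    # U+0958 is a composition exclusion: NFC leaves it decomposed as
    # U+0915 U+093C, although the single code point is canonically
    # equivalent to that pair.
    qa = "\u0958"
    print([hex(ord(c)) for c in unicodedata.normalize("NFC", qa)])
    print(canonically_equivalent(qa, "\u0915\u093c"))
```

An encoder that is free to choose among canonically equivalent spellings could compare sizes like these before committing to one, as long as the decoder restores a form the receiver accepts.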