Doug Ewell writes:
> Yes, you can take SCSU- or BOCU-1-encoded text and recompress it using a
> GP compression scheme.  Atkin and Stansifer's paper from last year is
> all about that, and I spend a few pages on it in my paper as well.  You
> can also re-Zip a Zip file, though, so I don't know what that proves
> about the compression formats.

A general-purpose compressor can, if needed, be applied again to its own
output. But you can't do that with SCSU or BOCU-1, simply because their
output is not a stream of code points but a stream of bytes. The best you
can do is to regenerate the code points, which means decompressing and then
recompressing. There is no point in doing so with SCSU and BOCU-1, as there
is no guarantee that the result of the de/re-compression will be better
than, worse than, or even identical to the initial compressed stream...
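
To make the type argument concrete, here is a minimal sketch in Python. The
standard library has no SCSU or BOCU-1 codec, so UTF-8 stands in for the
"code points in, bytes out" step, and zlib stands in for a general-purpose
compressor; the sample text is made up for illustration.

```python
import zlib

# Made-up sample text; any Unicode string would do.
text = "Ελληνικά, русский, and plain ASCII mixed together. " * 20

# Code points in, bytes out. UTF-8 is only a stand-in for SCSU/BOCU-1 here.
encoded = text.encode("utf-8")

# A general-purpose compressor: bytes in, bytes out.
once = zlib.compress(encoded)

# Input and output have the same type, so it accepts its own output.
twice = zlib.compress(once)

# Undoing both passes recovers the original code points.
restored = zlib.decompress(zlib.decompress(twice)).decode("utf-8")

print(len(encoded), len(once), len(twice), restored == text)
```

The second pass is possible only because zlib maps bytes to bytes. An
encoder that maps code points to bytes has nothing it can consume in its own
output until that output has been decoded back to code points.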

So the SCSU and BOCU-* formats are NOT general-purpose compressors. As they
are defined only in terms of streams of Unicode code points, they are
assumed to follow the conformance clauses of Unicode. As they recognize
their input as Unicode text, they can recognize canonical equivalence, and
this creates an opportunity for them to consider whether a (de)normalization
or de/re-composition would result in higher compression. (Interestingly, the
composition exclusions could be reconsidered in the case of BOCU-1 and SCSU
compressed streams, provided that the decompression to code points
redecomposes the excluded compositions.)
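
A small Python sketch of what that opportunity looks like, using only the
standard unicodedata module. The sample string is made up, and UTF-8 byte
counts are used only as a rough stand-in for encoded size, since Python
ships no SCSU or BOCU-1 codec.

```python
import unicodedata

# Made-up sample text containing several composable characters.
text = "Việt Nam, café, Ångström"

nfc = unicodedata.normalize("NFC", text)
nfd = unicodedata.normalize("NFD", text)

# The two forms are canonically equivalent: renormalizing one gives the other.
equivalent = unicodedata.normalize("NFC", nfd) == nfc

for name, form in (("NFC", nfc), ("NFD", nfd)):
    size = len(form.encode("utf-8"))  # stand-in for an encoded size
    print(name, len(form), "code points,", size, "UTF-8 bytes")

# U+0958 DEVANAGARI LETTER QA is a composition exclusion: NFC does not
# keep the single precomposed code point, it yields the decomposed pair.
excluded = "\u0958"
nfc_of_excluded = unicodedata.normalize("NFC", excluded)
print([hex(ord(c)) for c in nfc_of_excluded])
```

An encoder that is free to pick among canonically equivalent sequences could
choose whichever one encodes shortest, including an excluded precomposed
character, as long as the decoder restores a sequence that is canonically
equivalent to the original.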

