> [Original Message]
> From: Tom Emerson <[EMAIL PROTECTED]>
> To: Gary P. Grosso <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Date: 4/21/2004 12:58:38 PM
> Subject: Re: Unihan.txt and other possible representations of the data
>
> Gary P. Grosso writes:
> > There may be value in an HTML representation, utilizing links
> > and multiple files. What would the logical division(s) be?
> > Or has this already been done?
>
> I'm working on a proposal for generating different representations of
> Unihan, and this includes logical divisions. I'll post a draft when I
> have something ready.
The obvious division is to put the dictionary stuff in one document
(or group of documents) and to put the encoding equivalencies in
another document, and the numeric information in a third.
However, if backward compatibility could be sacrificed, there would
be an easy way to shave 2 MB off the size of Unihan.txt: get rid of
the initial "U+". It may be only 10%, but it's an irritating 10% because
it's totally worthless. That said, removing it wouldn't do much to
shrink Unihan.zip, because the prefix is so redundant that any good
compression scheme already takes advantage of it.
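
For anyone who wants to check the claim, here is a rough Python sketch.
It assumes every data line begins with "U+" and that fields are
tab-separated; the field name and values in the sample are made up for
illustration, not taken from Unihan.txt, and zlib merely stands in for
whatever compressor built Unihan.zip. The helper names strip_prefix and
sizes are my own.

```python
import zlib


def strip_prefix(text):
    """Drop a leading "U+" from every line that starts with it."""
    out = []
    for line in text.splitlines(keepends=True):
        # Lines without the prefix (e.g. comments) pass through unchanged.
        out.append(line[2:] if line.startswith("U+") else line)
    return "".join(out)


def sizes(text):
    """Return (raw byte count, zlib-compressed byte count)."""
    raw = text.encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


# Made-up sample in the assumed layout: code point, field name, value.
sample = "".join(
    f"U+{cp:04X}\tkExampleField\tvalue{cp % 7}\n"
    for cp in range(0x4E00, 0x4E00 + 2000)
)
stripped = strip_prefix(sample)

print("with U+   :", sizes(sample))
print("without U+:", sizes(stripped))
```

To try it on the real file, read Unihan.txt into a string and pass it
to the same two functions. The raw size drops by exactly two bytes per
prefixed line; comparing the second number in each pair shows how much
of that saving survives compression.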