- Original Message -
From: "Frank Yung-Fong Tang" <[EMAIL PROTECTED]>
> > >> UTF-16               6,634,430 bytes
> > >> UTF-8                7,637,601 bytes
> > >> SCSU                 6,414,319 bytes
> > >> BOCU-1               5,897,258 bytes
> > >> Legacy encoding (*)  5,477,432 bytes
> > >> (*) KS C 5601, KS X 1001, or
Frank Yung-Fong Tang wrote:
>> UTF-16               6,634,430 bytes
>> UTF-8                7,637,601 bytes
>> SCSU                 6,414,319 bytes
>> BOCU-1               5,897,258 bytes
>> Legacy encoding (*)  5,477,432 bytes
>> (*) KS C 5601, KS X 1001, or EUC-KR)
>
> What is the size of these after gzip? Just wondering.
> gzip of UTF-16
> gzi
" <[EMAIL PROTECTED]>; "Unicode Mailing List"
<[EMAIL PROTECTED]>; "Jungshik Shin" <[EMAIL PROTECTED]>; "John Cowan"
<[EMAIL PROTECTED]>
Sent: Tue, 2003 Dec 02 15:03
Subject: Re: Korean compression (was: Re: Ternary search trees for Unicode
d
Mark Davis wrote:
> > >> UTF-16               6,634,430 bytes
> > >> UTF-8                7,637,601 bytes
> > >> SCSU                 6,414,319 bytes
> > >> BOCU-1               5,897,258 bytes
> > >> Legacy encoding (*)  5,477,432 bytes
> > >> (*) KS C 5601, KS X 1001, or EUC-KR)
What is the size of these after gzip? Just wondering.
gzip
Philippe Verdy wrote:
> The question of Latin letters with two diacritics added in Latin
> Extension B does not seem to respect this constraint, as it is not
> justified in the Vietnamese VISCII standard that already does not
> contain characters with two diacritics, but already composes them
> wit
Philippe Verdy scripsit:
> The question of Latin letters with two diacritics added in Latin Extension B
> does not seem to respect this constraint, as it is not justified in the
> Vietnamese VISCII standard that already does not contain characters with two
> diacritics, but already composes them wit
John Cowan writes:
> > You are, because the floodgates, while once open, have been closed by
> > normalization.
>
> Indeed, they were opened in Unicode 1.1, as a result of the merger with
> FDIS 10646; since then, only 46 characters with canonical decompositions
> have been added to Unicode (exce
At 08:23 -0500 2003-11-25, John Cowan wrote:
Michael Everson scripsit:
Ridiculous. This happened centuries ago, and it is not "why" Ethiopic
was encoded as a syllabary. It was encoded as a syllabary because it
is a syllabary.
Structurally it's an abugida, like Indic and UCAS.
I disagree. And I
Michael Everson scripsit:
> Ridiculous. This happened centuries ago, and it is not "why" Ethiopic
> was encoded as a syllabary. It was encoded as a syllabary because it
> is a syllabary.
Structurally it's an abugida, like Indic and UCAS.
> You are, because the floodgates, while once open, have
On 25/11/2003 03:54, Michael Everson wrote:
At 03:41 -0800 2003-11-25, Peter Kirk wrote:
...
But the floodgates have already been opened - not just Ethiopic but
Greek extended, much of Latin extended, the Korean syllables which
started this discussion, the small amount of precomposed Hebrew wh
At 03:41 -0800 2003-11-25, Peter Kirk wrote:
After all, Ethiopic was encoded as a syllabary just because the
vowel points happen to have become attached to the base characters.
Ridiculous. This happened centuries ago, and it is not "why" Ethiopic
was encoded as a syllabary. It was encoded as a s
On 24/11/2003 17:56, Christopher John Fynn wrote:
"Peter Kirk" <[EMAIL PROTECTED]> wrote:
This approach would certainly have simplified pointed Hebrew a lot, so
much so that it could well be serious. After all, Ethiopic was encoded
as a syllabary just because the vowel points happen to have be
Christopher John Fynn wrote:
> "Peter Kirk" <[EMAIL PROTECTED]> wrote:
>
> > This approach would certainly have simplified pointed Hebrew a lot, so
> > much so that it could well be serious. After all, Ethiopic was encoded
> > as a syllabary just because the vowel points happen to have become
> >
On 11/24/03 20:56, Christopher John Fynn wrote:
"Peter Kirk" <[EMAIL PROTECTED]> wrote:
This approach would certainly have simplified pointed Hebrew a lot, so
much so that it could well be serious. After all, Ethiopic was encoded
as a syllabary just because the vowel points happen to have becom
"Peter Kirk" <[EMAIL PROTECTED]> wrote:
> This approach would certainly have simplified pointed Hebrew a lot, so
> much so that it could well be serious. After all, Ethiopic was encoded
> as a syllabary just because the vowel points happen to have become
> attached to the base characters. And w
Peter Kirk scripsit:
> This approach would certainly have simplified pointed Hebrew a lot, so
> much so that it could well be serious.
There are an awful lot of possibilities, and it's not clear that spinning
them out a la Hangul really makes sense.
> After all, Ethiopic was encoded
> as a syll
Kent Karlsson wrote:
> Hangul syllables are "LVT" (actually (L+)(V+)(T*)), not TLV.
Sorry, I so often use the acronym TLV, which in French means "Type, Longueur,
Valeur" (and is completely unrelated to Unicode or Hangul syllable types),
that I often confuse it with the English LVT for "Leading
On 24/11/2003 03:29, Kent Karlsson wrote:
...
I wonder why Hangul would need compression over and above
any other alphabetic script... It has already quite a lot of compression
in the form of precomposed syllables. I think we had better start a project
for allocating precomposed "syllables" for many
...
> >> Of course, no compression format applied to jamos could
> >> even do as well as UTF-16 applied to syllables, i.e. 2 bytes per
> >> syllable.
I wonder why Hangul would need compression over and above
any other alphabetic script... It has already quite a lot of compression
in the form of p
Mark Davis wrote:
>> Of course, no compression format applied to jamos could
>> even do as well as UTF-16 applied to syllables, i.e. 2 bytes per
>> syllable.
>
> This needs a bit of qualification. An arithmetic compression would do
> better, for example, or even just a compression that took the m
> Of course, no compression format applied to jamos could
> even do as well as UTF-16 applied to syllables, i.e. 2 bytes per
> syllable.
This needs a bit of qualification. An arithmetic compression would do better,
for example, or even just a compression that took the most frequent jamo
sequences.
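
For readers who want to check the baseline figures behind this exchange, here is a
minimal Python sketch. It only measures uncompressed sizes: the same Hangul text
stored as precomposed syllables (NFC) and as conjoining jamos (NFD), in UTF-16 and
UTF-8. The function name encoded_sizes and the two-syllable sample are hypothetical
choices made for illustration; nothing here implements the arithmetic or
frequency-based compression suggested above.

```python
import unicodedata


def encoded_sizes(text):
    """Return byte counts for text as precomposed syllables vs. conjoining jamos."""
    syllables = unicodedata.normalize("NFC", text)  # precomposed Hangul syllables
    jamos = unicodedata.normalize("NFD", text)      # decomposed into L, V, T jamos
    return {
        # "utf-16-be" is used so that no byte order mark is counted.
        "syllables_utf16": len(syllables.encode("utf-16-be")),
        "syllables_utf8": len(syllables.encode("utf-8")),
        "jamos_utf16": len(jamos.encode("utf-16-be")),
        "jamos_utf8": len(jamos.encode("utf-8")),
    }


# Hypothetical sample: U+D55C U+AE00, two LVT syllables (three jamos each).
sample = "\ud55c\uae00"
sizes = encoded_sizes(sample)
print(sizes)
```

For this sample the syllable form takes 2 bytes per syllable in UTF-16, while the
jamo form takes 6 bytes per syllable before any compression is applied, which is
the gap a jamo-based compressor would have to close.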