As a by-product of our recent work on collation, we developed a method of
Unicode compression that is similar to SCSU, in that small alphabets are
about a byte per character and large alphabets are about two bytes per
character.
The main difference from SCSU is that this method preserves
The main difference from SCSU is that this method preserves binary order.
Ah. And which binary order does it preserve?
The right one, or the other one? ;-)
Rick
Kenneth Whistler wrote:
Plane 14 PUA usage description tags? Naaah, nobody would suggest such
a bizarre thing, would they?
Marco Cimarosti wrote:
The three words PUA usage description are redundant, methinks. Removing
them leaves a more concise and dramatic example of a weird proposal.
Mark,
This sounds like a great idea. I was wondering, however, whether spaces in non-plane 0
character sets will cause problems with the compression efficiency. Maybe you should
consider a special case for spaces.
Maybe you could use something like offsetting the displacement values to
It preserves code point binary order (UTF-8 / UTF-32). However, one could
also easily produce a minor variant that preserves UTF-16 binary order as
well.
Mark
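To make the difference between the two orders concrete, here is a minimal Python sketch. It is not the compression method discussed in this thread; it only illustrates that UTF-8 and UTF-32 byte order agree with code point order, while UTF-16 byte order diverges once supplementary characters are involved. The sample characters and variable names are assumptions chosen for illustration.

```python
# Two sample characters (illustrative choices, not from the thread):
#   U+FF5E  is in the BMP, above the surrogate range D800..DFFF
#   U+10000 is the first supplementary code point
bmp_high = "\uff5e"
supplementary = "\U00010000"
samples = [supplementary, bmp_high]

# Reference order: compare by code point value.
by_code_point = sorted(samples, key=lambda s: [ord(c) for c in s])

# Binary order: compare the encoded bytes (big-endian, no BOM).
by_utf8 = sorted(samples, key=lambda s: s.encode("utf-8"))
by_utf16 = sorted(samples, key=lambda s: s.encode("utf-16-be"))
by_utf32 = sorted(samples, key=lambda s: s.encode("utf-32-be"))

for name, order in [("code point", by_code_point), ("UTF-8", by_utf8),
                    ("UTF-16", by_utf16), ("UTF-32", by_utf32)]:
    print(name, ["U+%04X" % ord(s) for s in order])
```

In UTF-16, U+10000 is encoded as the surrogate pair D800 DC00, which compares below FF5E, so the supplementary character sorts first there and last in the other three orders.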
----- Original Message -----
From: Rick McGowan [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, May 31, 2001 23:46
Subject: Re:
Thanks for your comments.
We aren't worried about the transition to and from space for supplementary
characters, since as a fraction of all text they will be exceedingly rare (
0.01%, our estimate).
As to Korean, it might save some storage to always reset at space, but
(a) I don't see an
At 4:44 AM -0600 6/1/01, Bill Kurmey wrote:
Kenneth Whistler wrote:
Plane 14 PUA usage description tags? Naaah, nobody would suggest such
a bizarre thing, would they?
Marco Cimarosti wrote:
The three words PUA usage description are redundant, methinks. Removing
them leaves a more concise and
Mark,
Shades of UTF-8s!!! Please don't provide UTF-16 binary order.
Carl
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Mark Davis
Sent: Friday, June 01, 2001 9:00 AM
To: [EMAIL PROTECTED]; Rick McGowan
Subject: Re: Compression - binary ordered
It
I wrote a couple of programs for a Control Data Corporation (CDC) 6600 back
in the early '70s. I recall that the smallest addressable unit was a 60-bit
word (though there were special instructions to pack and unpack some size
of character -- was it 6-bit?)
Bob
Correct, except that there
The Asia/East Asian/CJK thread reminded me of one of my own pet peeves,
the use of 'ideograph' to refer to kanji.
Perhaps some of the professionals on this list can enlighten me here. I
thought that an ideograph meant that the graph stood for an idea, not a
sound or a zographic image. Since
At 4:16 PM -0600 6/1/01, Jon Babcock wrote:
The Asia/East Asian/CJK thread reminded me of one of my own pet
peeves, the use of 'ideograph' to refer to kanji.
Perhaps some of the professionals on this list can enlighten me
here. I thought that an ideograph meant that the graph stood for an
So does my Rurouni Kensin album go under R or under ru?
Maybe ru is better because few words start with ru.
★じゅういっちゃん★
"AIS TSXQ QDOO TD AISC TDQMIG, HYCTDL,
ZIC HIIUPLB XSHM GDOPHPISX CYTDL."
"QMD XDHCDQ, AIS XDD,
PX QMDCD'X LI CDHPWD.
P VSXQ WSQ RMYQ P MYED KA TA YCT PL."