Closing the thread with answers to the questions would probably have
been a better approach than "terminating" it like that.
FYI, Sarasvati is the Vedic goddess of knowledge and ingenuity; the
goddess made such knowledge and ingenuity available to others but never
implemented anything on Her own.
pracoditaa yena
On Wed, Jun 2, 2010 at 21:43, Doug Ewell wrote:
>> If you want a really fast alternate encoding, you could encode all of
>> Unicode in at most 3 bytes. Use the high bit as a "continuation" bit and
>> the lower 7 bits as the data.
>>
>> ASCII gets passed through unchanged.
>
> This is essentially
Michael D'Errico wrote:
If you want a really fast alternate encoding, you could encode all of
Unicode in at most 3 bytes. Use the high bit as a "continuation" bit
and the lower 7 bits as the data.
ASCII gets passed through unchanged.
This is essentially what I was going to suggest to Kann
If you want a really fast alternate encoding, you could encode all of
Unicode in at most 3 bytes. Use the high bit as a "continuation" bit
and the lower 7 bits as the data.
ASCII gets passed through unchanged.
For code points between U+0080 and U+3FFF, split the value into the
high 7 bits and l
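A minimal Python sketch of the continuation-bit scheme described above, assuming the high-order 7-bit group is written first (the message breaks off before spelling out the byte order) and that a set high bit means "more bytes follow". The function names are hypothetical. Three 7-bit groups give 21 bits, which is enough for U+10FFFF.

```python
def encode_cp(cp: int) -> bytes:
    """Encode one Unicode scalar value in 1-3 bytes (high bit set = more bytes follow)."""
    if not (0 <= cp <= 0x10FFFF) or (0xD800 <= cp <= 0xDFFF):
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:  # ASCII passes through unchanged
        return bytes([cp])
    if cp < 0x4000:  # U+0080..U+3FFF fit in 14 bits: high 7 bits, then low 7 bits
        return bytes([0x80 | (cp >> 7), cp & 0x7F])
    # Everything else fits in 21 bits (0x10FFFF < 2**21): three 7-bit groups
    return bytes([0x80 | (cp >> 14), 0x80 | ((cp >> 7) & 0x7F), cp & 0x7F])


def encode(text: str) -> bytes:
    return b"".join(encode_cp(ord(ch)) for ch in text)


def decode(data: bytes) -> str:
    # Note: this sketch does not reject overlong forms such as b"\x80\x41" for "A".
    out = []
    cp = 0
    length = 0  # bytes consumed for the current code point
    for b in data:
        cp = (cp << 7) | (b & 0x7F)  # append this byte's 7 data bits
        length += 1
        if b & 0x80:  # continuation bit set: more bytes follow
            if length == 3:
                raise ValueError("sequence longer than 3 bytes")
            continue
        out.append(chr(cp))  # chr() raises ValueError if cp > 0x10FFFF
        cp = 0
        length = 0
    if length:
        raise ValueError("truncated sequence at end of input")
    return "".join(out)
```

Because no byte of a multi-byte sequence falls in 0x00-0x7F unless it is the last one, a scanner can find character boundaries by looking for bytes with the high bit clear.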
An alternative that I've used is:
- Serialize every unsigned integer as a sequence of 7-bit groups, one per
byte, with the top bit off for all but the last byte.
- For signed integers, shift left by 1 bit, then invert if the original
was negative, then serialize as unsigned.
- Serialize a string as an i
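The two integer rules in the list above can be sketched in Python as follows. The bit convention follows the message (top bit clear on every byte except the final one, which is the opposite of Protocol Buffers' varints); writing the least-significant group first is an assumption, since the message does not say. The signed mapping is the same as Protocol Buffers' ZigZag encoding (0, -1, 1, -2, ... become 0, 1, 2, 3, ...). All function names are hypothetical.

```python
from typing import Tuple


def write_uvarint(n: int) -> bytes:
    """Unsigned integer as 7-bit groups, least-significant group first (assumed)."""
    if n < 0:
        raise ValueError("write_uvarint needs a non-negative integer")
    out = bytearray()
    while n >= 0x80:
        out.append(n & 0x7F)  # top bit off: more bytes follow
        n >>= 7
    out.append(0x80 | n)  # top bit on marks the final byte
    return bytes(out)


def read_uvarint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, position just after the varint)."""
    n = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        b = data[pos]
        pos += 1
        n |= (b & 0x7F) << shift
        if b & 0x80:  # final byte reached
            return n, pos
        shift += 7


def zigzag(n: int) -> int:
    """Shift left by 1, then invert all bits if the original was negative."""
    return ~(n << 1) if n < 0 else n << 1


def unzigzag(z: int) -> int:
    # The low bit records the sign; undo the inversion for negative values.
    return ~(z >> 1) if z & 1 else z >> 1


def write_svarint(n: int) -> bytes:
    return write_uvarint(zigzag(n))
```

Small magnitudes of either sign therefore cost a single byte, which is the point of mapping signed values through `zigzag` before serializing.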
SCSU is a pass-through for ASCII, plus it handles the common mix of
ASCII plus 96 local characters (Latin-1, Greek, Cyrillic, Thai, etc.)
really fast. Go look at the sample code. If you take that as a starting
point for optimization, I think you'll be fine.
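To make the "ASCII plus one window of local characters" idea concrete, here is a deliberately tiny Python encoder that uses only a subset of SCSU (Unicode Technical Standard #6): single-byte mode, the eight default dynamic-window positions, and the SC0 to SC7 window-switch tags. It is not the sample code mentioned above and not a full SCSU encoder; the function names are hypothetical, and anything outside the default windows raises `NotImplementedError` where a real encoder would quote, define a window, or switch to Unicode mode.

```python
# Default positions of the eight SCSU dynamic windows (UTS #6); window 0 is
# active when encoding starts.
DEFAULT_WINDOWS = (0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00)
SC0 = 0x10  # tags SC0..SC7 (0x10..0x17) switch to dynamic window 0..7


def passes_through(cp: int) -> bool:
    """Characters that SCSU single-byte mode writes as the same byte value."""
    return cp in (0x00, 0x09, 0x0A, 0x0D) or 0x20 <= cp <= 0x7F


def scsu_encode_subset(text: str) -> bytes:
    out = bytearray()
    active = 0
    for ch in text:
        cp = ord(ch)
        if passes_through(cp):
            out.append(cp)
            continue
        # Try the active window first so runs of local characters need no tags.
        order = [active] + [w for w in range(8) if w != active]
        for w in order:
            base = DEFAULT_WINDOWS[w]
            if base <= cp < base + 0x80:
                if w != active:
                    out.append(SC0 + w)  # switch to window w
                    active = w
                out.append(0x80 + (cp - base))  # offset into the window
                break
        else:
            # A full encoder would quote (SQn/SQU), define a window (SDn/SDX)
            # or switch to Unicode mode (SCU) here.
            raise NotImplementedError(f"U+{cp:04X} is outside the default windows")
    return bytes(out)
```

For Latin-1 text the output is byte-for-byte identical to Latin-1, and "Привет, мир" comes out as 12 bytes (one SC2 tag plus one byte per character) against 20 bytes in UTF-8.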
Thanks to everyone for the detailed responses. I definitely
appreciate the feedback on the broader issue (even though my question
was very narrow).
I should clarify my use case a little. I'm creating a generic data
serialization format similar to Google Protocol Buffers and Apache
Thrift. Other
On 6/2/2010 3:28 PM, John Dlugosz wrote:
If anyone can “null and void” it, I wonder why companies bother to put
such things in people’s outgoing mail. I would have thought they could
come up with a proper netiquette version, but they just don’t care.
These things are bogus, because they ge
> I'm not sure how much longer we should continue to wait for Tengwar and
> Cirth.
Three words: Squeaky wheel -- grease.
Don't expect this to "just happen". The corporate members of
the Unicode Consortium are mostly concerned about economically
significant sets of characters that impact their b
If anyone can "null and void" it, I wonder why companies bother to put such
things in people's outgoing mail. I would have thought they could come up with
a proper netiquette version, but they just don't care.
From: Asmus Freytag [mailto:asm...@ix.netcom.com]
Sent: Wednesday, June 02, 2010 3:
From: Kenneth Whistler (k...@sybase.com)
[snip]
> I expect that even this explanation will not satisfy those
> who think that oddities like this should not exist in
> character names. But that is just the nature of the
> historical development of big standards like the Unicode
> Standard when y
> > Note that as of 1993, the only "LAMDA" or "LAMBDA" characters
> > in the standard were:
> >
> > 039B;GREEK CAPITAL LETTER LAMDA;Lu;0;L;N;GREEK CAPITAL LETTER LAMBDA;;;03BB;
> > 03BB;GREEK SMALL LETTER LAMDA;Ll;0;L;N;GREEK SMALL LETTER LAMBDA;;039B;;039B
> > 019B;LATIN SMALL LE
On 6/2/2010 11:46 AM, Jonathan Rosenne wrote:
Although this mail was not addressed to me, I did read it. Sue me.
The terms of use for the Unicode mail list essentially state that these
types of boilerplate are null and void as far as Unicode is concerned.
You will find the following in
h
Although this mail was not addressed to me, I did read it. Sue me.
Jony
> -----Original Message-----
> From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On
> Behalf Of John Dlugosz
> Sent: Wednesday, June 02, 2010 5:03 PM
> Cc: unicode@unicode.org
> Subject: RE: Greek letter "L
Dear list members,
This is your official notification that this thread is now terminated.
The discussions of 3rd party font IP and trademark status are out of scope
and unlikely to result in enlightening discussion here.
Regards,
-- Sarasvati
On 6/2/2010 10:00 AM, Erkki I. Kolehmainen wrote:
Sarasvati?
I'd personally wish to see you act...
Regards, Erkki
Erkki I. Kolehmainen
Tilkankatu 12 A 3, FI-00300 Helsinki, Finland
Puh. (09) 4368 2643, 0400 825 943; Tel. +358 9 4368 2643, +358 400 825 943
-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicod
> Anyway, most existing supporters of Tengwar and Cirth (also
> Klingonists) still use some transliteration
Transliterability shouldn't be a factor; many of the scripts in Unicode have
been transliterated (like Latin). Perhaps if it were only transliterated and
the script were never used (but
On Jun 2, 2010, at 3:49 AM, Vinodh Rajan wrote:
> If there are similar projects that encode Ancient Characters in PUA, maybe
> you can co-ordinate with them. Similar to the ConScript Unicode Registry.
>
There is a proposal for "Old Hanzi" being worked on by the IRG. You can peruse
the IRG's
This is a bad idea.
The best way to make it go away is to just stop discussing it.
Peter
-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf
Of William_J_G Overington
Sent: Wednesday, June 02, 2010 2:51 AM
To: Unicode Discussion; John H. J
> The trademarked name does not use ALL CAPS.
Is Unicode a registered trademark, then? If so, where does it say so?
> Both refer to the same organization.
> Usually, you would use "The Unicode Consortium".
Are you suggesting "Incorporated" is equivalent to "Consortium" in this case?
"The" usage is grammat
On Jun 2, 2010, at 3:51 AM, William_J_G Overington wrote:
>
>> Unicode and ISO/IEC 10646 are attempts to solve a basic,
>> simply-described problem: provide for a standardized
>> computer representation of plain text written using existing
>> writing systems.
>
> Well, that might well be the c
On Jun 2, 2010, at 3:51 AM, William_J_G Overington wrote:
> I know of no reason to think that a person "skilled in the art" would be
> unable to write an iPad app to receive a program written in the portable
> interpretable object code arriving within a Unicode text message and then for
> the
vanis...@boil.afraid.org wrote:
> From: Doug Ewell (d...@ewellic.org)
> > I'm not sure how much longer we should continue to wait for Tengwar and
> > Cirth.
>
> I hear Michael talking about meeting with the Tolkienists every once in a
> while, so I can only assume that it is proceeding in some way.
From: Doug Ewell (d...@ewellic.org)
Van Anderson wrote:
> > Look up the ConScript Unicode Registry if you want to examine a
> > pseudo-standardized Private Use agreement. A simple mapping table will
> > enable you to equate your private use "standard" to the officially
> > encoded forms of the
> Perhaps a better approach would be to establish a "Frequently Asked
> Questions" list on the Unicode Web site. Oh, wait.
>
> --
> Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
> RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s
>
FWIW, I checked the FAQ fi
> Robert Abel noted:
>
> Note that as of 1993, the only "LAMDA" or "LAMBDA" characters
> in the standard were:
>
> 039B;GREEK CAPITAL LETTER LAMDA;Lu;0;L;N;GREEK CAPITAL LETTER LAMBDA;;;03BB;
> 03BB;GREEK SMALL LETTER LAMDA;Ll;0;L;N;GREEK SMALL LETTER LAMBDA;;039B;;039B
> 019B;LATIN S
> Resending (from Gmail), because the Unicode list rejected the SMTP server of
> my mail provider (Spamcop is defective).
Nothing forbids you from creating new serializations of Unicode; you may
even design one so that it will be a conforming
process (meaning that it will preserve *all* valid Unicode
On Tue, Jun 1, 2010 at 11:04 PM, Kannan Goundan wrote:
>
> I'm trying to come up with a compact encoding for Unicode strings for
> data serialization purposes. The goals are fast read/write and small
> size.
>
> The plan:
> 1. BMP code points are encoded as two bytes (0x0000-0xFFFF, minus surroga
Kannan Goundan wrote:
Hmm... I had skimmed the SCSU document a few days ago. At the time it
seemed a bit more complicated than I wanted.
SCSU decoders are not complicated, and with encoders, you get to make
the decision between simplicity and high performance.
The reputation of SCSU for b
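As a rough illustration of why an SCSU decoder is not complicated, here is a Python sketch that decodes only the simplest subset of SCSU (Unicode Technical Standard #6): single-byte mode with the eight default dynamic windows and the SC0 to SC7 tags. The function name is hypothetical; the remaining tags (quoting, window definition, Unicode mode) are rejected rather than implemented, so this is a starting point, not a conforming decoder.

```python
# Default positions of the eight SCSU dynamic windows (UTS #6).
DEFAULT_WINDOWS = (0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00)


def scsu_decode_subset(data: bytes) -> str:
    """Decode SCSU that uses only single-byte mode, default windows and SC0..SC7."""
    active = 0  # a decoder starts in single-byte mode with window 0 active
    out = []
    for b in data:
        if b in (0x00, 0x09, 0x0A, 0x0D) or 0x20 <= b <= 0x7F:
            out.append(chr(b))  # ASCII and the common controls pass through
        elif 0x10 <= b <= 0x17:
            active = b - 0x10  # SCn: make dynamic window n active
        elif b >= 0x80:
            # High bytes index into the active 128-character window.
            out.append(chr(DEFAULT_WINDOWS[active] + (b - 0x80)))
        else:
            # SQn, SDX, SQU, SCU and SDn need state this sketch does not keep.
            raise NotImplementedError(f"tag byte 0x{b:02X} not handled")
    return "".join(out)
```

The whole decoding loop is one table lookup per byte plus a window switch on a tag, which is where the speed for "ASCII plus one local script" text comes from.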
Van Anderson wrote:
Look up the ConScript Unicode Registry if you want to examine a
pseudo-standardized Private Use agreement. A simple mapping table will
enable you to equate your private use "standard" to the officially
encoded forms of these scripts, when that time comes, if you wish to
p
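A "simple mapping table" of the kind mentioned above can be applied in Python with `str.translate`. In this sketch the file format (one `PUA;OFFICIAL` pair of hex code points per line, `#` for comments) and the function names are hypothetical; the official values in any real table would come from the eventual encoding, and none are assumed here.

```python
import unicodedata


def load_mapping(lines):
    """Parse 'PUA;OFFICIAL' hex pairs (hypothetical format) into a str.translate table."""
    table = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        pua_hex, official_hex = (field.strip() for field in line.split(";"))
        pua, official = int(pua_hex, 16), int(official_hex, 16)
        # Only private-use code points (general category Co) may be remapped.
        if unicodedata.category(chr(pua)) != "Co":
            raise ValueError(f"U+{pua:04X} is not a private-use code point")
        table[pua] = official
    return table


def convert(text, table):
    # str.translate maps each code point found in the table and leaves the rest alone.
    return text.translate(table)
```

Keeping the table as data rather than code means the same conversion routine works for any private-use agreement.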
Van Anderson wrote:
Emoticons (as emoji) are exchanged as plain text. The only
consideration that changed was whether they should be considered
markup or not. Eventually, it became clear that they no longer
classify as markup, but as plain text. This was not a change in policy;
it was a
From: William_J_G Overington (wjgo_10...@btinternet.com)
> On Tuesday 1 June 2010, John H. Jenkins wrote:
>
> > First of all, as Michael says, this
> > isn't character encoding.
>
> Well, it is a collection of portable interpretable object code items encoded
> within a character encoding
I cannot but agree.
Sincerely,
Erkki I. Kolehmainen
-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On
Behalf Of Michael Evers
On 2 Jun 2010, at 10:51, William_J_G Overington wrote:
> Well, that might well be the case historically, yet then the emoji were
> invented and they were encoded. The emoji existed at the time that they were
> encoded, yet they did not exist at the time that the standards were started.
The char
On 2 June 2010 10:51, William_J_G Overington wrote:
>
> I know of no reason to think that a person "skilled in the art" would be
> unable to write an iPad app to receive a program written in the portable
> interpretable object code arriving within a Unicode text message and then for
> the progr
From: jander...@talentex.co.uk
> I am brewing on some plans for making a font with glyphs for ancient
> Chinese characters and even for some of the more "dubious" glyphs; I
> assume that there is no standard area in the Unicode standard for
> these; so where can I put them so they are least like
Thank you for replying.
On Tuesday 1 June 2010, John H. Jenkins wrote:
> First of all, as Michael says, this
> isn't character encoding.
Well, it is a collection of portable interpretable object code items encoded
within a character encoding as if the items were characters.
> You're not
Put them in PUA - Private Use Area.
http://unicode.org/charts/PDF/UE000.pdf
If there are similar projects that encode ancient characters in the PUA, maybe
you can coordinate with them, similar to the ConScript Unicode Registry.
V
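The chart linked above covers the BMP Private Use Area, U+E000 to U+F8FF; the standard also reserves planes 15 and 16 (U+F0000 to U+FFFFD and U+100000 to U+10FFFD) for private use, and Python's `unicodedata` reports general category `Co` for all of them. This sketch checks membership and hands out consecutive code points to a list of glyph names; both helpers are hypothetical, and the starting point is a parameter so a font can move away from ranges other projects already use.

```python
import unicodedata

# The three Private Use Areas of the Unicode Standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area (6,400 code points)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def is_private_use(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def assign_pua(glyph_names, start=0xE000):
    """Give each glyph name the next consecutive private-use code point."""
    mapping = {}
    cp = start
    for name in glyph_names:
        if not is_private_use(cp):
            raise ValueError(f"ran out of private-use space at U+{cp:04X}")
        mapping[name] = cp
        cp += 1
    return mapping


if __name__ == "__main__":
    # Sanity check against the Unicode Character Database bundled with Python.
    for lo, hi in PUA_RANGES:
        print(f"U+{lo:04X}..U+{hi:04X}", unicodedata.category(chr(lo)))
```

Nothing in the standard prevents another font from using the same private-use code points, which is why coordinating through a registry matters.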
On Wed, Jun 2, 2010 at 2:30 PM, jander...@talentex.co.uk <
jander..
I am brewing on some plans for making a font with glyphs for ancient
Chinese characters and even for some of the more "dubious" glyphs; I
assume that there is no standard area in the Unicode standard for
these; so where can I put them so they are least likely to clash with
others?
On 2 Jun 2010, at 00:14, Mark Crispin wrote:
> Is it really necessary to have this sort of pedagogical discussion on the
> Unicode list?
Even I'm not so curmudgeonly, Mark. Live with it and use the delete key.
Cheerily,
Michael Everson * http://www.evertype.com/
On 1 Jun 2010, at 22:50, Kenneth Whistler wrote:
> Note that:
>
> 1038D;UGARITIC LETTER LAMDA;Lo;0;L;N;
>
> is a distinct issue, and would no doubt still have been spelled
> "LAMDA", even if all the Greek characters in the standard had been spelled
> "LAMBDA".
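The mixed spellings can be checked against the Unicode Character Database that ships with Python's `unicodedata` module. This sketch (the helper name is hypothetical) lists every character whose name contains the word LAMDA or LAMBDA; the exact output depends on the Unicode version of the Python build, so it will include characters added after 1993, such as U+1038D.

```python
import sys
import unicodedata


def names_with_word(*words):
    """Map code point -> name for every character whose name contains one of `words`."""
    hits = {}
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned/unnamed code points
        if any(word in name.split() for word in words):
            hits[cp] = name
    return hits


if __name__ == "__main__":
    for cp, name in sorted(names_with_word("LAMDA", "LAMBDA").items()):
        print(f"{cp:04X};{name}")
```

Matching whole words rather than substrings keeps names like LAMBDA WITH STROKE in the list without picking up unrelated names that merely contain the letters.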
*waves hand and ta
On Tue, Jun 1, 2010 at 23:30, Asmus Freytag wrote:
> Why not use SCSU?
>
> You get the small size and the encoder/decoder aren't that
> complicated.
Hmm... I had skimmed the SCSU document a few days ago. At the time it
seemed a bit more complicated than I wanted. What's nice about UTF-8
and UTF