Non-Vedic Sarasvati

2010-06-02 Thread Tulasi
Closing the thread with answers to the questions would probably have been a better approach than "terminating" it. FYI, Sarasvati is the Vedic goddess of knowledge & ingenuity; the goddess made such knowledge & ingenuity available to others but never implemented it on Her own. pracoditaa yena

Re: Least used parts of BMP.

2010-06-02 Thread Kannan Goundan
On Wed, Jun 2, 2010 at 21:43, Doug Ewell wrote: >> If you want a really fast alternate encoding, you could encode all of >> Unicode in at most 3 bytes.  Use the high bit as a "continuation" bit and >> the lower 7 bits as the data. >> >> ASCII gets passed through unchanged. > > This is essentially

Re: Least used parts of BMP.

2010-06-02 Thread Doug Ewell
Michael D'Errico wrote: If you want a really fast alternate encoding, you could encode all of Unicode in at most 3 bytes. Use the high bit as a "continuation" bit and the lower 7 bits as the data. ASCII gets passed through unchanged. This is essentially what I was going to suggest to Kann

Re: Least used parts of BMP.

2010-06-02 Thread Michael D'Errico
If you want a really fast alternate encoding, you could encode all of Unicode in at most 3 bytes. Use the high bit as a "continuation" bit and the lower 7 bits as the data. ASCII gets passed through unchanged. For code points between U+0080 and U+3FFF, split the value into the high 7 bits and l
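A minimal sketch of the continuation-bit scheme described in this message, in Python. The excerpt is cut off before it says which 7-bit group is written first, so the byte order used here (most significant group first, high bit set on every byte except the last) is an assumption, and the function names are illustrative only.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one code point in 1 to 3 bytes, 7 data bits per byte."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x80:
        # ASCII passes through unchanged (high bit clear, single byte).
        return bytes([cp])
    if cp < 0x4000:
        # U+0080..U+3FFF: two groups of 7 bits.
        return bytes([0x80 | (cp >> 7), cp & 0x7F])
    # U+4000..U+10FFFF: three groups of 7 bits (21 bits cover all of Unicode).
    return bytes([0x80 | (cp >> 14), 0x80 | ((cp >> 7) & 0x7F), cp & 0x7F])


def encode(text: str) -> bytes:
    return b"".join(encode_code_point(ord(ch)) for ch in text)


def decode(data: bytes) -> str:
    chars = []
    value = 0
    pending = False  # True while in the middle of a multi-byte sequence
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            pending = True  # continuation bit set: more bytes follow
        else:
            chars.append(chr(value))  # chr() rejects values above U+10FFFF
            value = 0
            pending = False
    if pending:
        raise ValueError("input ends in the middle of a sequence")
    return "".join(chars)
```

Under these assumptions, U+00E9 becomes the two bytes 0x81 0x69 and U+10FFFF becomes 0xC3 0xFF 0x7F. Unlike UTF-8, a decoder cannot tell from an arbitrary byte where a sequence starts without scanning backwards for a byte with the high bit clear.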

Re: Least used parts of BMP.

2010-06-02 Thread Mark Davis ☕
An alternative that I've used is: - Serialize every unsigned integer as a sequence of 7 bits, with the top bit off for all but the last one. - For signed integers, shift left by 1 bit, then invert if the original was negative, then serialize as unsigned. - Serialize a string as an i
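The first two rules in this message can be sketched in Python as follows. The excerpt does not say in which order the 7-bit groups are written, so most significant group first is an assumption here; the function names are illustrative, and the string rule is left out because the excerpt is cut off before it is fully stated.

```python
def encode_unsigned(n: int) -> bytes:
    """7 bits per byte; the top bit is off on every byte except the last."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()      # assumed order: most significant group first
    groups[-1] |= 0x80    # mark the final byte
    return bytes(groups)


def decode_unsigned(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position after the value); IndexError if truncated."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:   # top bit on: this was the last byte
            return value, pos


def encode_signed(n: int) -> bytes:
    # Shift left by one bit, then invert if the original was negative.
    shifted = n << 1
    return encode_unsigned(~shifted if n < 0 else shifted)


def decode_signed(data: bytes, pos: int = 0) -> tuple[int, int]:
    u, pos = decode_unsigned(data, pos)
    # The low bit records the sign; undo the inversion, then the shift.
    return (~(u >> 1) if u & 1 else u >> 1), pos
```

The signed mapping sends 0, -1, 1, -2, 2 to 0, 1, 2, 3, 4, so values of small magnitude stay short whatever their sign.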

Re: Least used parts of BMP.

2010-06-02 Thread Asmus Freytag
SCSU is a pass-through for ASCII, plus it handles the common mix of ASCII plus 96 local characters (Latin-1, Greek, Cyrillic, Thai, etc) really fast. Go look at the sample code. If you take that as starting point for optimization, I think you'll be fine.

Re: Least used parts of BMP.

2010-06-02 Thread Kannan Goundan
Thanks to everyone for the detailed responses. I definitely appreciate the feedback on the broader issue (even though my question was very narrow). I should clarify my use case a little. I'm creating a generic data serialization format similar to Google Protocol Buffers and Apache Thrift. Other

Re: Greek letter "LAMDA"?

2010-06-02 Thread Asmus Freytag
On 6/2/2010 3:28 PM, John Dlugosz wrote: If anyone can “null and void” it, I wonder why companies bother to put such things in people’s outgoing mail. I would have thought they could come up with a proper netiquette version, but they just don’t care. These things are bogus, because they ge

Tengwar and Cirth (was: Re: A question about "user areas")

2010-06-02 Thread Kenneth Whistler
> I'm not sure how much longer we should continue to wait for Tengwar and > Cirth. Three words: Squeaky wheel -- grease. Don't expect this to "just happen". The corporate members of the Unicode Consortium are mostly concerned about economically significant sets of characters that impact their b

RE: Greek letter "LAMDA"?

2010-06-02 Thread John Dlugosz
If anyone can "null and void" it, I wonder why companies bother to put such things in people's outgoing mail. I would have thought they could come up with a proper netiquette version, but they just don't care. From: Asmus Freytag [mailto:asm...@ix.netcom.com] Sent: Wednesday, June 02, 2010 3:

RE: Greek letter "LAMDA"?

2010-06-02 Thread vanisaac
From: Kenneth Whistler (k...@sybase.com) [snip] > I expect that even this explanation will not satisfy those > who think that oddities like this should not exist in > character names. But that is just the nature of the > historical development of big standards like the Unicode > Standard when y

RE: Greek letter "LAMDA"?

2010-06-02 Thread Kenneth Whistler
> > Note that as of 1993, the only "LAMDA" or "LAMBDA" characters > > in the standard were: > > > > 039B;GREEK CAPITAL LETTER LAMDA;Lu;0;L;N;GREEK CAPITAL LETTER > > LAMBDA;;;03BB; > > 03BB;GREEK SMALL LETTER LAMDA;Ll;0;L;N;GREEK SMALL LETTER > > LAMBDA;;039B;;039B > > 019B;LATIN SMALL LE

Re: Greek letter "LAMDA"?

2010-06-02 Thread Asmus Freytag
On 6/2/2010 11:46 AM, Jonathan Rosenne wrote: Although this mail was not addressed to me, I did read it. Sue me. The terms of use for the Unicode mail list essentially state that these types of boilerplate are null and void as far as Unicode is concerned. You will find the following in h

RE: Greek letter "LAMDA"?

2010-06-02 Thread Jonathan Rosenne
Although this mail was not addressed to me, I did read it. Sue me. Jony > -Original Message- > From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On > Behalf Of John Dlugosz > Sent: Wednesday, June 02, 2010 5:03 PM > Cc: unicode@unicode.org > Subject: RE: Greek letter "L

Re: IS UNICODE a STANDRAD ?

2010-06-02 Thread Sarasvati
Dear list members, This is your official notification that this thread is now terminated. The discussions of 3rd party font IP and trademark status are out of scope and unlikely to result in enlightening discussion here. Regards, -- Sarasvati On 6/2/2010 10:00 AM, Erkki I. Kolehmainen wrote:

RE: IS UNICODE a STANDRAD ?

2010-06-02 Thread Erkki I. Kolehmainen
Sarasvati? I'd personally wish to see you act... Regards, Erkki Erkki I. Kolehmainen Tilkankatu 12 A 3, FI-00300 Helsinki, Finland Puh. (09) 4368 2643, 0400 825 943; Tel. +358 9 4368 2643, +358 400 825 943 -Original Message- From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicod

RE: A question about "user areas"

2010-06-02 Thread Shawn Steele
> Anyway, most existing supporters of Tengwar and Cirth (also > Klingonists) still use some transliteration Transliterability shouldn't be a factor; many of the scripts in Unicode have been transliterated (like Latin). Perhaps if it was only transliterated and the script was never used (but

Re: A question about "user areas"

2010-06-02 Thread John H. Jenkins
On Jun 2, 2010, at 3:49 AM, Vinodh Rajan wrote: > If there are similar projects that encode Ancient Characters in PUA, may be > you can co-ordinate with them. Similar to the ConScript Unicode Registry. > There is a proposal for "Old Hanzi" being worked on by the IRG. You can peruse the IRGs

RE: Preparing a proposal for encoding a portable interpretable object code into Unicode (from Re: IUC 34 - call for participation open until May 26)

2010-06-02 Thread Peter Constable
This is a bad idea. The best way to make it go away is to just stop discussing it. Peter -Original Message- From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of William_J_G Overington Sent: Wednesday, June 02, 2010 2:51 AM To: Unicode Discussion; John H. J

Re: IS UNICODE a STANDRAD ?

2010-06-02 Thread Tulasi
> The trademarked name does not use ALL CAPS. Is Unicode a registered trademark then? If yes where does it say so? > Both refer to the same organization. > Usually, you would use "The Unicode Consortium". Are you suggesting Incorporate is equal to Consortium in this case? "The" usage is grammat

Re: Preparing a proposal for encoding a portable interpretable object code into Unicode (from Re: IUC 34 - call for participation open until May 26)

2010-06-02 Thread John H. Jenkins
On Jun 2, 2010, at 3:51 AM, William_J_G Overington wrote: > >> Unicode and ISO/IEC 10646 are attempts to solve a basic, >> simply-described problem: provide for a standardized >> computer representation of plain text written using existing >> writing systems. > > Well, that might well be the c

Re: Preparing a proposal for encoding a portable interpretable object code into Unicode (from Re: IUC 34 - call for participation open until May 26)

2010-06-02 Thread John H. Jenkins
On Jun 2, 2010, at 3:51 AM, William_J_G Overington wrote: > I know of no reason to think that a person "skilled in the art" would be > unable to write an iPad app to receive a program written in the portable > interpretable object code arriving within a Unicode text message and then for > the

Re: A question about "user areas"

2010-06-02 Thread Philippe Verdy
vanis...@boil.afraid.org wrote: > From: Doug Ewell (d...@ewellic.org) > > I'm not sure how much longer we should continue to wait for Tengwar and > > Cirth. > > I hear Michael talking about meeting with the Tolkienists every once in a > while, so I can only assume that it is proceeding in some way.

Re: A question about "user areas"

2010-06-02 Thread vanisaac
From: Doug Ewell (d...@ewellic.org) Van Anderson wrote: > > Look up the Conscript Unicode Registry if you want to examine a > > pseudo-standardized Private Use agreement. A simple mapping table will > > enable you to equate your private use "standard" to the officially > > encoded forms of the

RE: Greek letter "LAMDA"?

2010-06-02 Thread John Dlugosz
> Perhaps a better approach would be to establish a "Frequently Asked > Questions" list on the Unicode Web site. Oh, wait. > > -- > Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org > RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s ­ > FWIW, I checked the FAQ fi

RE: Greek letter "LAMDA"?

2010-06-02 Thread John Dlugosz
> Robert Abel noted: > > Note that as of 1993, the only "LAMDA" or "LAMBDA" characters > in the standard were: > > 039B;GREEK CAPITAL LETTER LAMDA;Lu;0;L;N;GREEK CAPITAL LETTER > LAMBDA;;;03BB; > 03BB;GREEK SMALL LETTER LAMDA;Ll;0;L;N;GREEK SMALL LETTER > LAMBDA;;039B;;039B > 019B;LATIN S

re: Least used parts of BMP.

2010-06-02 Thread Philippe Verdy
> Resending (from Gmail), because the Unicode list rejected the SMTP server of > my mail provider (Spamcop is defective). Nothing forbids you from creating new serializations of Unicode; you may even design one so that it will be a conforming process (meaning that it will preserve *all* valid Unicode

Re: Least used parts of BMP.

2010-06-02 Thread David Starner
On Tue, Jun 1, 2010 at 11:04 PM, Kannan Goundan wrote: > > I'm trying to come up with a compact encoding for Unicode strings for > data serialization purposes.  The goals are fast read/write and small > size. > > The plan: > 1. BMP code points are encoded as two bytes (0x-0x, minus surroga

Re: Least used parts of BMP.

2010-06-02 Thread Doug Ewell
Kannan Goundan wrote: Hmm... I had skimmed the SCSU document a few days ago. At the time it seemed a bit more complicated than I wanted. SCSU decoders are not complicated, and with encoders, you get to make the decision between simplicity and high performance. The reputation of SCSU for b

Re: A question about "user areas"

2010-06-02 Thread Doug Ewell
Van Anderson wrote: Look up the Conscript Unicode Registry if you want to examine a pseudo-standardized Private Use agreement. A simple mapping table will enable you to equate your private use "standard" to the officially encoded forms of these scripts, when that time comes, if you wish to p

Emoji (was: Re: Preparing a proposal for encoding a portable interpretable object code into Unicode)

2010-06-02 Thread Doug Ewell
Van Anderson wrote: Emoticons (as emoji) are exchanged as plain text. The only consideration that changed was whether they should be considered as markup or not. Eventually, it became clear that they no longer classify as markup, but as plain text. This was not a change in policy, it was a

Re: Preparing a proposal for encoding a portable interpretable object code into Unicode (from Re: IUC 34 - call for participation open until May 26)

2010-06-02 Thread vanisaac
From: William_J_G Overington (wjgo_10...@btinternet.com) > On Tuesday 1 June 2010, John H. Jenkins wrote: > > > First of all, as Michael says, this > > isn't character encoding. > > Well, it is a collection of portable interpretable object code items encoded > within a character encoding

RE: Preparing a proposal for encoding a portable interpretable object code into Unicode (from Re: IUC 34 - call for participation open until May 26)

2010-06-02 Thread Erkki I. Kolehmainen
I cannot but agree. Sincerely, Erkki I. Kolehmainen Tilkankatu 12 A 3, FI-00300 Helsinki, Finland Puh. (09) 4368 2643, 0400 825 943; Tel. +358 9 4368 2643, +358 400 825 943 -Original Message- From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of Michael Evers

Re: Preparing a proposal for encoding a portable interpretable object code into Unicode (from Re: IUC 34 - call for participation open until May 26)

2010-06-02 Thread Michael Everson
On 2 Jun 2010, at 10:51, William_J_G Overington wrote: > Well, that might well be the case historically, yet then the emoji were > invented and they were encoded. The emoji existed at the time that they were > encoded, yet they did not exist at the time that the standards were started. The char

Re: Preparing a proposal for encoding a portable interpretable object code into Unicode (from Re: IUC 34 - call for participation open until May 26)

2010-06-02 Thread Andrew West
On 2 June 2010 10:51, William_J_G Overington wrote: > > I know of no reason to think that a person "skilled in the art" would be > unable to write an iPad app to receive a program written in the portable > interpretable object code arriving within a Unicode text message and then for > the progr

Re: A question about "user areas"

2010-06-02 Thread vanisaac
From: jander...@talentex.co.uk > I am brewing on some plans for making a font with glyphs for ancient > Chinese characters and even for some of the more "dubious" glyphs; I > assume that there is no standard area in the Unicode standard for > these; so where can I put them so they are least like

Re: Preparing a proposal for encoding a portable interpretable object code into Unicode (from Re: IUC 34 - call for participation open until May 26)

2010-06-02 Thread William_J_G Overington
Thank you for replying. On Tuesday 1 June 2010, John H. Jenkins wrote: > First of all, as Michael says, this > isn't character encoding. Well, it is a collection of portable interpretable object code items encoded within a character encoding as if the items were characters. > You're not

Re: A question about "user areas"

2010-06-02 Thread Vinodh Rajan
Put them in the PUA (Private Use Area). http://unicode.org/charts/PDF/UE000.pdf If there are similar projects that encode ancient characters in the PUA, maybe you can coordinate with them, similar to the ConScript Unicode Registry. V On Wed, Jun 2, 2010 at 2:30 PM, jander...@talentex.co.uk < jander..
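As a small illustration of the advice above, a font or tool can check that the code points it assigns really fall in a Private Use range. This Python sketch hard-codes the three Private Use ranges defined by the Unicode Standard; the helper name and the example assignments are hypothetical.

```python
import unicodedata

# Private Use ranges: the BMP Private Use Area (the UE000 chart linked above)
# plus the two supplementary private use planes (planes 15 and 16).
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def is_private_use(cp: int) -> bool:
    """True if the code point lies in one of the Private Use ranges."""
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


# Hypothetical assignments for a font of not-yet-encoded glyphs.
assignments = {"glyph-001": 0xE000, "glyph-002": 0xE001}
for name, cp in assignments.items():
    assert is_private_use(cp), f"{name} is outside the Private Use ranges"
    # Private use code points carry the general category "Co".
    assert unicodedata.category(chr(cp)) == "Co"
```

Staying inside these ranges only avoids clashes with standardized characters; avoiding clashes with other private-use fonts still requires coordination of the kind suggested in the message.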

A question about "user areas"

2010-06-02 Thread jander...@talentex.co.uk
I am brewing on some plans for making a font with glyphs for ancient Chinese characters and even for some of the more "dubious" glyphs; I assume that there is no standard area in the Unicode standard for these; so where can I put them so they are least likely to clash with others?

Re: Greek letter "LAMDA"?

2010-06-02 Thread Michael Everson
On 2 Jun 2010, at 00:14, Mark Crispin wrote: > Is it really necessary to have this sort of pedagogical discussions on the > Unicode list? Even I'm not so curmudgeonly, Mark. Live with it and use the delete key. Cheerily, Michael Everson * http://www.evertype.com/

Re: Greek letter "LAMDA"?

2010-06-02 Thread Michael Everson
On 1 Jun 2010, at 22:50, Kenneth Whistler wrote: > Note that: > > 1038D;UGARITIC LETTER LAMDA;Lo;0;L;N; > > is a distinct issue, and no doubt would still have been spelled > "LAMDA", even if all the Greek characters in the standard had been spelled > "LAMBDA". *waves hand and ta

Re: Least used parts of BMP.

2010-06-02 Thread Kannan Goundan
On Tue, Jun 1, 2010 at 23:30, Asmus Freytag wrote: > Why not use SCSU? > > You get the small size and the encoder/decoder aren't that > complicated. Hmm... I had skimmed the SCSU document a few days ago. At the time it seemed a bit more complicated than I wanted. What's nice about UTF-8 and UTF