Re: UTF-8 to EBCDIC

2002-07-31 Thread Doug Ewell
Two steps are necessary here: 1. Decode UTF-8 to Unicode scalar values. 2. Look up the Unicode scalar values in the table referenced by Magda, and find the corresponding CP037 code point. We can help with either of these steps. Contact us, either on the list or privately, if you need assistan
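The two steps Doug describes can be sketched in Python. This assumes Python's built-in `cp037` codec is an acceptable stand-in for the mapping table Magda referenced; the helper name `utf8_to_cp037` is hypothetical.

```python
def utf8_to_cp037(data: bytes) -> bytes:
    """Convert UTF-8 bytes to EBCDIC code page 037 (hypothetical helper)."""
    # Step 1: decode UTF-8 to Unicode scalar values.
    # Strict decoding raises UnicodeDecodeError on malformed UTF-8.
    text = data.decode("utf-8")
    # Step 2: map each scalar value to its CP037 byte using Python's
    # built-in cp037 codec. A character with no CP037 equivalent raises
    # UnicodeEncodeError instead of being silently dropped.
    return text.encode("cp037")


if __name__ == "__main__":
    source = "Hello, world".encode("utf-8")
    # Step 1 results: the scalar values, printed as U+XXXX.
    print([f"U+{ord(c):04X}" for c in source.decode("utf-8")])
    # Step 2 results: the CP037 bytes, printed in hex.
    print(utf8_to_cp037(source).hex(" "))
```

If unmappable characters should be substituted rather than rejected, passing `errors="replace"` to `encode` substitutes a question mark instead of raising.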

Re: Tamil Text Messaging in Mobile Phones

2002-07-31 Thread James Kass
Marco Cimarosti wrote, > > > > ** The stroke in Phaistos symbols in ConScript PUA encoding is > > the closest I could find. > > :-))) > > C'mon, be serious! That can be mapped to U+0316 (COMBINING GRAVE ACCENT > BELOW). > Seriously, you're right! Best regards, James Kass.

Re: library for identifying equivalent sequences

2002-07-31 Thread Mark Davis
We do have that in ICU 2.2. It is not a public interface (meaning that we will likely change the API before we make it public), but it is accessible if you want to test with it for now. It is part of what we use to optimize our internal processing by producing the canonical closure of a dataset.

Re: "Missing character" glyph

2002-07-31 Thread Doug Ewell
Asmus Freytag wrote: > No code point is safe. Indeed, but some are less unsafe than others. You can't use U+FFEF, because some process might actually filter out noncharacters. You can't use U+FFFD, because some process might generate a special glyph for it (SC UniPad does). And the moment yo
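The risks Doug lists for each candidate placeholder can be made concrete with a small classifier. This is a sketch using Python's `unicodedata` module; the helper names are hypothetical, and "unassigned" is judged against whatever Unicode version the running Python ships with, which is exactly why no unassigned code point is permanently safe.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def placeholder_risk(cp: int) -> str:
    """Describe why a code point might be unsafe as a 'missing character' placeholder."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a code point: {cp:#x}")
    # Check noncharacters first: unicodedata also reports them as 'Cn'.
    if is_noncharacter(cp):
        return "noncharacter: some processes may filter it out"
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate code point: not a Unicode scalar value"
    if cp == 0xFFFD:
        return "REPLACEMENT CHARACTER: some processes render a special glyph for it"
    # 'Cn' means unassigned in the Unicode version bundled with this Python.
    if unicodedata.category(chr(cp)) == "Cn":
        return "unassigned: may be assigned in a future version"
    return "assigned character"


for cp in (0xFFFF, 0xFFFD, 0x03A2, 0xDEAD1):
    print(f"U+{cp:04X}: {placeholder_risk(cp)}")
```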

Re: Teletext

2002-07-31 Thread Shlomi Tal
>From: Lars Marius Garshol <[EMAIL PROTECTED]> >This reminds me: does anyone have any pointers to information on how >to convert visually encoded text (especially HTML, but also other >formats) to Unicode? There are programs that do it on the fly for Hebrew. The best, which I have used myself,

Re: UTF-8 to EBCDIC

2002-07-31 Thread Asmus Freytag
See the technical report on UTF-EBCDIC. Perhaps, that's what's needed? A./ http://www.unicode.org/reports/tr16 At 05:06 PM 7/31/02 -0700, Magda Danish (Unicode) wrote: > > -Original Message- > > From: Vishweshwaraiah, Balasubramanya > > [mailto:[EMAIL PROTECTED]] > > Sent: Tuesday, Ju

Re: [OpenType] library for identifying equivalent sequences

2002-07-31 Thread Peter_Constable
On 07/31/2002 05:46:02 PM Eric Muller wrote: >Eg. don't you also want the strings that contain a sprinkling of ZWJ, >ZWNJ, CGJ, SHY and various other things? (Yuck.) Why, of course. (Bleecchh.) But it's easier to write an algorithm to insert those than to derive the other. (Gag, choke.) So, I w

Re: "Missing character" glyph

2002-07-31 Thread Kenneth Whistler
Asmus wrote: > At 08:40 PM 7/30/02 -0700, Doug Ewell wrote: > >a code-point that has no > > > character assigned to it (and is not likely to get one), e. g. U+03A2 > > No code point is safe. True enough. But then I figure Plane 13 characters like U+DEAD1 are pretty unlikely to be assigned to a

UTF-8 to EBCDIC

2002-07-31 Thread Magda Danish (Unicode)
> -Original Message- > From: Vishweshwaraiah, Balasubramanya > [mailto:[EMAIL PROTECTED]] > Sent: Tuesday, July 30, 2002 2:52 PM > To: Magda Danish (Unicode) > Subject: RE: Web Form: General question > > > Magda Danish, > Thanks a lot for your interest in helping me by giving suggest

Re: "Missing character" glyph

2002-07-31 Thread Asmus Freytag
At 08:40 PM 7/30/02 -0700, Doug Ewell wrote: >a code-point that has no > > character assigned to it (and is not likely to get one), e. g. U+03A2 No code point is safe. A./

Re: Teletext

2002-07-31 Thread Kenneth Whistler
William Overington suggested: > I am thinking that it would be a good idea to encode the archive copies of > teletext pages that exist into a Unicode compatible format for the future. > Teletext has been around for about a quarter of a century in more or less > its present form and within another

Re: [OpenType] library for identifying equivalent sequences

2002-07-31 Thread Eric Muller
I don't have what you are looking for [canonically equivalent strings], but I am curious how you plan to go from that to: >(The underlying issue is that I'm trying to figure out, given some >precomposed glyph in a font, what are all the valid substitutions that >could be applied in the smart-fon

library for identifying equivalent sequences

2002-07-31 Thread Peter_Constable
I'm wondering if anyone is aware of any software libraries available that can be used to solve a particular problem: for a given character sequence, I need to enumerate all of the canonically equivalent character sequences. Put another equivalent way, given a character sequence in NFD, I need to
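Python's standard library has no such enumerator, but a brute-force sketch is possible with `unicodedata`, independent of the ICU facility Mark Davis mentions: split the NFD form into combining sequences, try every ordering of each sequence's marks and every single-character composition of the starter with a prefix of them, and keep only the candidates whose NFD matches. The function names are hypothetical. The sketch never produces a wrong answer (every result is checked against NFD), but it can miss some equivalents: it does not try marks that compose with each other, nor composites whose parts are all non-combining (such as Hangul syllables), and the permutation step grows factorially with the number of marks.

```python
import itertools
import unicodedata


def _segments(nfd: str) -> list[tuple[str, str]]:
    """Split an NFD string into (starter, trailing combining marks) pairs."""
    segments: list[tuple[str, str]] = []
    for ch in nfd:
        if unicodedata.combining(ch) == 0:
            segments.append((ch, ""))
        elif segments:
            starter, marks = segments[-1]
            segments[-1] = (starter, marks + ch)
        else:
            segments.append(("", ch))  # marks at the very start, no starter
    return segments


def _segment_variants(starter: str, marks: str) -> set[str]:
    """Candidate spellings of one combining sequence, filtered by NFD equality."""
    target = starter + marks  # a slice of an NFD string is already NFD
    candidates: set[str] = set()
    for perm in set(itertools.permutations(marks)):
        tail = "".join(perm)
        candidates.add(starter + tail)
        # Try composing the starter with each prefix of the reordered marks.
        for k in range(1, len(tail) + 1):
            composed = unicodedata.normalize("NFC", starter + tail[:k])
            if len(composed) == 1:
                candidates.add(composed + tail[k:])
    return {c for c in candidates if unicodedata.normalize("NFD", c) == target}


def canonical_equivalents(s: str) -> set[str]:
    """Enumerate strings canonically equivalent to s (incomplete sketch)."""
    nfd = unicodedata.normalize("NFD", s)
    per_segment = [_segment_variants(st, mk) for st, mk in _segments(nfd)]
    return {"".join(parts) for parts in itertools.product(*per_segment)}
```

For example, `canonical_equivalents("\u1EAD")` (a with circumflex and dot below) yields the fully decomposed form in both mark orders, the two partially composed forms, and the precomposed character.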

RE: Subscript & Superscript

2002-07-31 Thread Peter_Constable
On 07/31/2002 12:27:46 PM Michael Everson wrote: >>Unicode is not for encoding typographical effects such as superscripts or >>subscripts (the sups and subs in area U+2070..U+208E are part of a sort of >>"archaeological area" of Unicode, which is called Compatibility Characters). > >Not quite. S

Re: Teletext

2002-07-31 Thread Lars Marius Garshol
* Shlomi Tal | | 2. Teletext offers no bidirectional algorithm. The display mechanism | is limited to monodirectional LTR, necessitating the use of visually | encoded Hebrew (that is, monodirectional LTR written Hebrew; see | also my Hebrew FAQ for a longer explanation). This needs to be | inver
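The inversion Shlomi mentions is often approximated one line at a time: reverse the line, then flip back any runs that were already stored left to right. A minimal heuristic sketch, assuming each line is a right-to-left paragraph and that only ASCII digits and Latin letters form left-to-right runs; the function name is hypothetical, and mirrored pairs such as parentheses are not swapped.

```python
import re

# Runs assumed to have been stored left-to-right inside visual Hebrew text.
_LTR_RUN = re.compile(r"[0-9A-Za-z]+")


def visual_line_to_logical(line: str) -> str:
    """Heuristically convert one line of visual-order Hebrew to logical order."""
    # Reverse the whole line so the Hebrew letters come out in reading order...
    reversed_line = line[::-1]
    # ...then restore the original order of embedded left-to-right runs.
    return _LTR_RUN.sub(lambda m: m.group(0)[::-1], reversed_line)
```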

Re: Teletext

2002-07-31 Thread Shlomi Tal
Teletext uses VERY old technology encoding in general. I don't know if it's true for other languages, but Hebrew teletext encodes the Hebrew letters using the 7-bit SI-960, which maps the Hebrew letters instead of the lowercase Latin letters (positions 0x60 to 0x7A). In Hebrew teletext you get
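Given the layout Shlomi describes, decoding the letters themselves is a simple offset. This sketch assumes the 27 positions 0x60 to 0x7A hold alef through tav, final forms included, in the same order as U+05D0..U+05EA (the order ISO 8859-8 also uses), that the teletext parity bit has already been stripped, and that every other byte is plain ASCII. The function name is hypothetical, and the output is still in visual order.

```python
HEBREW_ALEF = 0x05D0  # first letter of the Unicode Hebrew alphabet


def decode_si960(data: bytes) -> str:
    """Decode 7-bit SI-960 bytes: 0x60..0x7A are Hebrew letters, the rest ASCII."""
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"not a 7-bit byte: {b:#04x}")
        if 0x60 <= b <= 0x7A:
            # Assumes alef..tav (with final forms) appear in Unicode order.
            chars.append(chr(HEBREW_ALEF + (b - 0x60)))
        else:
            chars.append(chr(b))
    return "".join(chars)
```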

RE: Subscript & Superscript

2002-07-31 Thread Michael Everson
At 19:05 +0200 2002-07-31, Marco Cimarosti wrote: >Unicode is not for encoding typographical effects such as superscripts or >subscripts (the sups and subs in area U+2070..U+208E are part of a sort of >"archaeological area" of Unicode, which is called Compatibility Characters). Not quite. Some s

RE: Subscript & Superscript

2002-07-31 Thread Marco Cimarosti
Magda Danish wrote: > > -Original Message- > > Date/Time:Tue Jul 30 12:26:40 EDT 2002 > > Contact: [EMAIL PROTECTED] > > Report Type: FAQ Suggestion > > > > We need to know how to express a Subscript letter in Unicode. > > On your site, we've found in 2070-208E how to express a
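For digits and a few operators, the compatibility characters in that range can be used as a plain-text fallback. A minimal sketch (the helper names are hypothetical); it covers only digits and `+ = ( )`, not letters in general. Superscript one, two, and three live in Latin-1 (U+00B9, U+00B2, U+00B3) rather than at U+2071..U+2073.

```python
import unicodedata

SUPERSCRIPT = str.maketrans({
    "0": "\u2070", "1": "\u00b9", "2": "\u00b2", "3": "\u00b3",
    "4": "\u2074", "5": "\u2075", "6": "\u2076", "7": "\u2077",
    "8": "\u2078", "9": "\u2079",
    "+": "\u207a", "=": "\u207c", "(": "\u207d", ")": "\u207e",
})
SUBSCRIPT = str.maketrans({
    **{str(d): chr(0x2080 + d) for d in range(10)},  # U+2080..U+2089
    "+": "\u208a", "=": "\u208c", "(": "\u208d", ")": "\u208e",
})


def to_superscript(s: str) -> str:
    return s.translate(SUPERSCRIPT)


def to_subscript(s: str) -> str:
    return s.translate(SUBSCRIPT)


# Compatibility normalization folds them back to the ordinary characters.
print(unicodedata.normalize("NFKC", to_subscript("H2O")))
```

The NFKC round trip is why these count as compatibility characters: normalization erases the superscript or subscript distinction, so a process that applies NFKC will lose it.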

RE: Tamil Text Messaging in Mobile Phones

2002-07-31 Thread Marco Cimarosti
James Kass wrote: > Is this a graphic showing the experimental diacritics you mention? > http://www.geocities.com/avarangal/imagay1.gif > > If so, it should be possible for most of these to be encoded in text > as pronunciation indicators using existing Unicode characters. > > Glyph - Unicode

Re: "Missing character" glyph

2002-07-31 Thread John H. Jenkins
On Tuesday, July 30, 2002, at 08:58 PM, Doug Ewell wrote: > Have Last Resort symbols been devised for all the blocks in Unicode, > including the new ones like Tagalog? Neither Mark Leisher's page nor > the Apple typography page contains a complete list. > > Yes. It covers all of Unicode 3.2;

Re: Subscript & Superscript

2002-07-31 Thread Peter_Constable
On 07/31/2002 03:48:07 AM "William Overington" wrote: >I know little about XML so I do not know whether this suggestion will be a >suitable solution for the requirement of the person who wrote to the Unicode >Consortium. Not at all, I'm afraid. The person who wrote: >>> -Original Message--

Teletext

2002-07-31 Thread William Overington
In the United Kingdom there is a widely used information system known as teletext. It is also used in many other countries. Teletext is a digital technology used in conjunction with analogue television systems. Digital information is inserted in several of the otherwise unused lines of the tele

Re: Subscript & Superscript

2002-07-31 Thread William Overington
Some time ago in this list, Mr Bernard Miller posted a note about his Bytext system. If one goes to http://www.bytext.org and then goes through to the documentation page at http://www.bytext.org/documentation.htm one may download a copy of the latest edition of The Bytext Standard. I chose to do

Re: "Missing character" glyph

2002-07-31 Thread Michael Everson
At 17:15 -0400 2002-07-30, Tom Gewecke wrote: > >Apple's Last Resort font. :-) > >Which I believe uses the various symbols shown at > >http://www.unicode.org/charts/ > >so you can easily tell from which code range your font is missing the >character. I think those glyphs are from the older vers
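Telling "from which code range your font is missing the character" comes down to finding the block that contains a code point. A minimal sketch of that range lookup, using a deliberately tiny, hypothetical subset of the block table (a real one would be built from the Unicode Character Database file Blocks.txt); it is not a description of how the Last Resort font itself is built.

```python
from bisect import bisect_right
from typing import Optional

# (first, last, name) for a few blocks, sorted by first code point.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0B80, 0x0BFF, "Tamil"),
    (0x1700, 0x171F, "Tagalog"),
    (0x2070, 0x209F, "Superscripts and Subscripts"),
    (0xFFF0, 0xFFFF, "Specials"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(cp: int) -> Optional[str]:
    """Return the name of the block containing cp, or None if not in the table."""
    i = bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None
```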

Re: quotation marks in European languages

2002-07-31 Thread Otto Stolz
I wrote: > The correct quote symbols, according to the German typographic > tradition, are ... John Cowan wrote: > Does not German also support the quotation dash for dialogue? Not really. You may use the dash to indicate change of the speaker, cf.

Re: Tamil Text Messaging in Mobile Phones

2002-07-31 Thread James Kass
Dear Sinnathurai Srivas, Is this a graphic showing the experimental diacritics you mention? http://www.geocities.com/avarangal/imagay1.gif If so, it should be possible for most of these to be encoded in text as pronunciation indicators using existing Unicode characters. Glyph - Unicode No.