Re: [9fans] Woes of New Language Support

2009-07-28 Thread Ethan Grammatikidis
On Tue, 28 Jul 2009 07:52:14 -0700 John Floren wrote: > On Tue, Jul 28, 2009 at 7:11 AM, Ethan Grammatikidis > wrote: > > On Tue, 28 Jul 2009 11:39:46 +0100 > > Charles Forsyth wrote: > > > >> >the unicode proposal says that matches depend on (re, locale, input). > >> >not just (re, input).  i

Re: [9fans] Woes of New Language Support

2009-07-28 Thread John Floren
On Tue, Jul 28, 2009 at 7:11 AM, Ethan Grammatikidis wrote: > On Tue, 28 Jul 2009 11:39:46 +0100 > Charles Forsyth wrote: > >> >the unicode proposal says that matches depend on (re, locale, input). >> >not just (re, input).  i would think that is not acceptable. >> >> it's not just the unicode peo

Re: [9fans] Woes of New Language Support

2009-07-28 Thread Ethan Grammatikidis
On Tue, 28 Jul 2009 11:39:46 +0100 Charles Forsyth wrote: > >the unicode proposal says that matches depend on (re, locale, input). > >not just (re, input). i would think that is not acceptable. > > it's not just the unicode people. shell file name matching takes locale into > account > which o

Re: [9fans] Woes of New Language Support

2009-07-28 Thread Charles Forsyth
>the unicode proposal says that matches depend on (re, locale, input). >not just (re, input). i would think that is not acceptable. it's not just the unicode people. shell file name matching takes locale into account which often makes it case-independent (even with case-dependent file systems).

Re: [9fans] Woes of New Language Support

2009-07-26 Thread erik quanstrom
On Sun Jul 26 14:40:56 EDT 2009, knapj...@gmail.com wrote: > If I'm reading you right, you're saying it might be easier if > everything were encoded as combining (or maybe more aptly > non-combining) codes, regardless of language? > > So, we might encode 'Waffles' as w+upper a f f l e s and let th

Re: [9fans] Woes of New Language Support

2009-07-26 Thread Jack Johnson
If I'm reading you right, you're saying it might be easier if everything were encoded as combining (or maybe more aptly non-combining) codes, regardless of language? So, we might encode 'Waffles' as w+upper a f f l e s and let the renderer (if there is one) handle the presentation of the case shif

Re: [9fans] Woes of New Language Support

2009-07-26 Thread Nathaniel W Filardo
On Sun, Jul 26, 2009 at 09:48:23AM -0400, erik quanstrom wrote: > > to be fair to the unicode people, this decoupling of glyphs and codepoints > > is (i think) the most straightforward way to implement some languages like > > arabic, where the glyphs for characters depend on their position within a

Re: [9fans] Woes of New Language Support

2009-07-26 Thread erik quanstrom
> the real problem isn't in viewing them however, but comes when you > start searching for them: it's easy to search for ë (e-umlaut) for > example, but what if it's described as e+"U+0308 COMBINING DIAERESIS"? > the answer is the UTS#18 Regular Expressions technical standard which > probably contr
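
To make the search problem above concrete: a minimal sketch, assuming Python 3 and its standard `unicodedata` module, of why a naive search misses a decomposed ë and how normalizing both sides to one form (NFC here) works around it. The helper names `nfc` and `contains_normalized` are made up for illustration. This is plain normalization, not the UTS#18 matching rules the message refers to.

```python
import unicodedata

# Two encodings of the same visible character.
precomposed = "\u00eb"   # LATIN SMALL LETTER E WITH DIAERESIS
decomposed = "e\u0308"   # 'e' followed by COMBINING DIAERESIS


def nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


def contains_normalized(haystack: str, needle: str) -> bool:
    """Substring test performed after normalizing both arguments to NFC.

    Hypothetical helper for illustration; a real matcher would also
    have to decide what to do about case, locale and so on.
    """
    return nfc(needle) in nfc(haystack)


if __name__ == "__main__":
    text = "noe\u0308l"  # "noel" with the diaeresis stored as a combining mark
    print(precomposed in text)                     # naive search on raw code points
    print(contains_normalized(text, precomposed))  # search after normalization
```

The naive test compares code points, so it fails even though both strings render identically; the normalized test compares the composed forms instead.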

Re: [9fans] Woes of New Language Support

2009-07-26 Thread erik quanstrom
On Sun Jul 26 10:14:51 EDT 2009, tlaro...@polynum.com wrote: > On Sun, Jul 26, 2009 at 09:48:23AM -0400, erik quanstrom wrote: > > > > my opinion (not that i'm entitled to one here) is > > that the unicode guys screwed up. unicode is not > > consistent. explain why there are two code points sigma. > > 03c3 greek small letter sigma > > 03c2 greek small letter final sigm

Re: [9fans] Woes of New Language Support

2009-07-26 Thread tlaronde
On Sun, Jul 26, 2009 at 09:48:23AM -0400, erik quanstrom wrote: > > my opinion (not that i'm entitled to one here) is > that the unicode guys screwed up. unicode is not > consistent. explain why there are two code points sigma. > 03c3 greek small letter sigma > 03c2 greek small letter final s
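
For readers who want to see the two sigmas side by side: a small sketch, assuming Python 3 and its standard `unicodedata` module, that looks up the two code points quoted above and shows one way software can treat them as the same letter (Unicode case folding). The helper name `same_letter` is hypothetical. Normalization does not unify the two; only case folding does.

```python
import unicodedata

SIGMA = "\u03c3"        # 03c3 greek small letter sigma
FINAL_SIGMA = "\u03c2"  # 03c2 greek small letter final sigma


def same_letter(a: str, b: str) -> bool:
    """Compare two strings after Unicode case folding.

    Hypothetical helper for illustration; case folding maps the final
    sigma onto the ordinary small sigma, so the two compare equal.
    """
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    for ch in (SIGMA, FINAL_SIGMA):
        print(f"{ord(ch):04x} {unicodedata.name(ch).lower()}")
    print(SIGMA == FINAL_SIGMA)             # distinct code points
    print(same_letter(SIGMA, FINAL_SIGMA))  # equal after case folding
```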

Re: [9fans] Woes of New Language Support

2009-07-26 Thread erik quanstrom
> to be fair to the unicode people, this decoupling of glyphs and codepoints > is (i think) the most straightforward way to implement some languages like > arabic, where the glyphs for characters depend on their position within a > word. that is, a letter at the beginning of a word looks different

Re: [9fans] Woes of New Language Support

2009-07-26 Thread Akshat Kumar
Please disregard the question, "kbmap perhaps?" in my last post. I quickly realised that kbmap is only for inputs, while I'm discussing plain old output from every other source. partying too much ak

Re: [9fans] Woes of New Language Support

2009-07-26 Thread Akshat Kumar
> what is the total number of stealth characters like nsa? > if it's not too unreasonable, it might be good enough to steal part of > the operating system or application reserved areas. Any consonant should be able to become a half-consonant, but only when followed by another consonant. In the TTF m

Re: [9fans] Woes of New Language Support

2009-07-26 Thread Salman Aljammaz
erik quanstrom wrote: > yes. this is a problem. unfortunately the unicode guys > took the position that codepoint is divorced from glyphs > unfortunately, this case isn't as bad as it gets. e.g. archaic cyrillic > letters have transliterations like ^^A in unicode. would > three hats on an A be i

Re: [9fans] Woes of New Language Support

2009-07-26 Thread andrey mirtchovski
diacritics (combining characters) are a real mess in Unicode. with so much space in the format why did they have to go this route, i wonder? erik mentioned cyrillic. i did have an old church slavonic bible text i was attempting to display correctly on Plan 9 sometime in 2003-4. top is x11 with cor

Re: [9fans] Woes of New Language Support

2009-07-25 Thread erik quanstrom
> However, in the class of languages for which I am trying to > provide support, certain characters are meant to be produced > by an ordered combination of other characters. For example, > the general sequence in Devanagari script (and this extends > to the other scripts as well) is that > consona

[9fans] Woes of New Language Support

2009-07-25 Thread akumar
I've been trying to add support for Sanskrit derived languages, but just rendering the characters has halted progress. For the currently supported languages, such as English, Russian, Greek, French, even Japanese, the characters are more or less statically mapped to the unicode (looking at my $fon