Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Peter Kirk
On 12/12/2003 09:27, Andrew C. West wrote: ... But why on earth are we talking about mapping grapheme clusters to the PUA ?! I thought we had heard the last of that sort of "heresy" when William softly and suddenly vanished away. Andrew Strictly for internal use only, because Mark made the poi

RE: [OT?] The C standard library and UTF's (was RE: Text Editors and Canonical Equivalence (was Coloured diacritics))

2003-12-12 Thread Kent Karlsson
> Tim Greenwood wrote: > > In my interpretation of the C standard (which I am reading from > > http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf) UTF-8 is not a > > valid wchar_t encoding if your execution character set contains > > characters outside the C0 controls and Basic Latin range, a

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Philippe Verdy
Andrew C. West writes: > But why on earth are we talking about mapping grapheme clusters > to the PUA ?! I thought we had heard the last of that sort of > "heresy" when William softly and suddenly vanished away. I do agree that this is a heresy for the encoding of texts. But not for processing

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread jon
> But why on earth are we talking about mapping grapheme clusters to the PUA ?! It's valid, just don't expect, and hence don't plan for, anyone else following suit.

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Andrew C. West
On Fri, 12 Dec 2003 07:53:13 -0800, Peter Kirk wrote: > > OK. In fact I suspect that the number "that have meaningful semantics > and effective usage" is actually rather small and could be fitted within > the higher PUA planes if one chose to do that. After all, not many > languages use large n

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Philippe Verdy
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: Friday, 12 December 2003 17:28 > To: [EMAIL PROTECTED] > Subject: RE: Text Editors and Canonical Equivalence (was Coloured > diacritics) > > > Quoting Philippe Verdy &

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Doug Ewell
Peter Kirk wrote: >>> And what, I find myself wondering, does "nearly infinite" mean? >> >> It means "finite". > > Except in the original context it should have meant "infinite", as > there is actually an infinite number of potential default grapheme > clusters. How can that be, if there is a fi

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Peter Kirk
On 12/12/2003 07:34, Philippe Verdy wrote: Peter Kirk wrote: On 12/12/2003 04:31, Michael Everson wrote: At 12:17 + 2003-12-12, Arcane Jill wrote: And what, I find myself wondering, does "nearly infinite" mean? It means "finite". Except in the original context it

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Philippe Verdy
Peter Kirk wrote: > On 12/12/2003 04:31, Michael Everson wrote: > > At 12:17 + 2003-12-12, Arcane Jill wrote: > >> And what, I find myself wondering, does "nearly infinite" mean? > > It means "finite". > > Except in the original context it should have meant "infinite", as there > is actually

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Peter Kirk
On 12/12/2003 04:31, Michael Everson wrote: At 12:17 + 2003-12-12, Arcane Jill wrote: And what, I find myself wondering, does "nearly infinite" mean? It means "finite". Except in the original context it should have meant "infinite", as there is actually an infinite number of potential defa

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread jon
Quoting Peter Kirk <[EMAIL PROTECTED]>: [snip me quoting D17a] > > > >"in some way defective" is actually a good way to put it methinks, they > aren't > >illegal, and in some cases you can do things with them that are both > reasonable > >and useful, but in other situations they may be problemat

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Peter Kirk
On 12/12/2003 04:13, [EMAIL PROTECTED] wrote: Thank you. I was supposing that isolated combining marks were considered in some way defective, http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf D17a: Defective combining character sequence: A combining character sequence that does not s

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Peter Kirk
On 12/12/2003 04:29, Philippe Verdy wrote: ... But what you suggest here is exactly what a standard file compressor does. It does not solve any problem in the representation of characters, the compression scheme remains private, and can only be interpreted as text by redecomposing these PUAs (in

[OT?] The C standard library and UTF's (was RE: Text Editors and Canonical Equivalence (was Coloured diacritics))

2003-12-12 Thread Marco Cimarosti
Tim Greenwood wrote: > In my interpretation of the C standard (which I am reading from > http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf) UTF-8 is not a > valid wchar_t encoding if your execution character set contains > characters outside the C0 controls and Basic Latin range, and > UTF-1

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Michael Everson
At 12:17 + 2003-12-12, Arcane Jill wrote: And what, I find myself wondering, does "nearly infinite" mean? It means "finite". -- Michael Everson * * Everson Typography * * http://www.evertype.com

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Philippe Verdy
Peter Kirk wrote: > >... Now how will you implement indexing with these > >private private PUAs which change of semantics across documents? > What is the > >relevant scope for these PUAs? > > > > > The scope would be one instance of a document opened in an application. > As for implementation d

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread jon
> Thank you. I was supposing that isolated combining marks were considered > in some way defective, http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf D17a: Defective combining character sequence: A combining character sequence that does not start with a base character. [Explanatory Note]
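The D17a definition quoted above can be sketched roughly in Python. This is only an approximation under stated assumptions: the helper name `starts_with_combining_mark` is hypothetical, and it treats any code point whose general category begins with "M" as a combining character, which is a simplification of the standard's own definitions.

```python
import unicodedata

def starts_with_combining_mark(s: str) -> bool:
    """Approximate D17a: True if the string begins with a combining mark
    rather than a base character. Assumption: 'combining' is taken to mean
    general category Mn, Mc or Me."""
    return bool(s) and unicodedata.category(s[0]).startswith("M")

isolated_acute = "\u0301"   # COMBINING ACUTE ACCENT on its own
e_with_acute = "e\u0301"    # base letter followed by the same mark

print(starts_with_combining_mark(isolated_acute))
print(starts_with_combining_mark(e_with_acute))
```

As the thread notes, such sequences are not illegal; a check like this only flags them for processes that may handle them specially.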

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Arcane Jill
Friday, December 12, 2003 1:55 AM To: Peter Kirk Cc: [EMAIL PROTECTED] Subject: RE: Text Editors and Canonical Equivalence (was Coloured diacritics) I did not try to count them for the simplest cases, but possible DGCs are nearly infinite:

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Peter Kirk
On 11/12/2003 17:55, Philippe Verdy wrote: Peter Kirk wrote: I am sure that some tricks could be found to simplify the indexing if necessary, e.g. using PUA or non-character code points indexed into a special table to replace DGCs which cannot be represented as a single character. (There are

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-12 Thread Peter Kirk
On 11/12/2003 17:16, Mark Davis wrote: Sure. "a" alone is a valid default grapheme cluster. Combining dieresis alone is a perfectly valid default grapheme cluster. 2 if separate, but one if concatenated (in the right order). This is similar (though not completely) to the case of words: "large fish
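Mark's point that "a" and a lone combining dieresis are each one cluster, yet form a single cluster when concatenated, can be illustrated with a deliberately crude sketch. The function `count_rough_clusters` is hypothetical and is not the default grapheme cluster algorithm from the Unicode annexes: it assumes a cluster starts at the first code point and at every later code point that is not a combining mark, ignoring Hangul, CRLF and all other boundary rules.

```python
import unicodedata

def count_rough_clusters(s: str) -> int:
    """Very rough cluster count. Assumption: a new cluster begins at index 0
    and at every code point whose general category is not Mn/Mc/Me."""
    return sum(
        1
        for i, ch in enumerate(s)
        if i == 0 or not unicodedata.category(ch).startswith("M")
    )

a = "a"
dieresis = "\u0308"  # COMBINING DIAERESIS

# Counted separately the two strings give two clusters in total,
# but their concatenation (in this order) gives only one.
print(count_rough_clusters(a) + count_rough_clusters(dieresis))
print(count_rough_clusters(a + dieresis))
```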

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Doug Ewell
Kenneth Whistler wrote: > It is perfectly conformant with the Unicode Standard to assert > that "Ã" and "Ã" are different > Unicode strings. They *are* different Unicode strings. They > contain different encoded characters, and they have different > lengths. > ... > What canonical equivalence i
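Ken's distinction between "different Unicode strings" and "canonically equivalent strings" can be shown in a few lines of Python. This is a sketch only: the characters used (precomposed é versus e plus U+0301) are borrowed from elsewhere in the thread rather than from his message, and `canonically_equivalent` is a hypothetical helper that compares NFD forms.

```python
import unicodedata

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Different encoded characters, different lengths, so different strings...
print(composed == decomposed)
print(len(composed), len(decomposed))
# ...yet canonically equivalent.
print(canonically_equivalent(composed, decomposed))
```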

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Philippe Verdy
Mark Davis wrote: > You could conceivably restrict your dream programming language to only > 'complete' default grapheme clusters, defined as those where the > addition of previous characters would never change that boundary, but > in practice I don't think your dream language would be particular

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Philippe Verdy
Peter Kirk wrote: > I am sure that some tricks could be found to > simplify the indexing if necessary, e.g. using PUA or non-character code > points indexed into a special table to replace DGCs which cannot be > represented as a single character. (There are plenty of non-characters > available

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Mark Davis
AIL PROTECTED]> Cc: "Kenneth Whistler" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Thu, 2003 Dec 11 14:58 Subject: Re: Text Editors and Canonical Equivalence (was Coloured diacritics) > On 11/12/2003 10:16, Mark Davis wrote: > > >>Mark, don't

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Peter Kirk
On 11/12/2003 10:16, Mark Davis wrote: Mark, don't patronise me. I'm not talking about levels of enlightenment. I'm not talking about levels in the sense you just used when you mentioned "higher-level issues". I'm talking about the well-known concept of levels or layers of programming and of commu

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Peter Kirk
On 11/12/2003 10:02, [EMAIL PROTECTED] wrote: Beginners, even young children, can be taught simple programming and string handling without knowing anything about bits and bytes, certainly without having to know whether the e acute they just typed is stored as one byte or two. Just as people can

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Peter Kirk
On 11/12/2003 09:05, Michael (michka) Kaplan wrote: From: "Peter Kirk" <[EMAIL PROTECTED]> Here I disagree. As an application programmer writing for example some kind of linguistic application, it is totally irrelevant to me how much actual storage a string takes. Such things should be hidden

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Philippe Verdy
[EMAIL PROTECTED] wrote: > Beginners, even young children, can get the concept of characters > being mapped to numbers. Certainly those young children that will > thrive on programming will have a fascination with this process in > and of itself (it's just like the kids-in-treehuts type cryptog

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Benjamin Peterson
On Thu, 11 Dec 2003 09:05:10 -0800, "Michael (michka) Kaplan" <[EMAIL PROTECTED]> said: > I think you are mostly mistaken here. All of the programmers I know (i.e. > script kiddies need not apply? ) call APIs. The bulk of those APIs > deal with APIs that have no notion of any of this. They take L

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Mark Davis
> Mark, don't patronise me. I'm not talking about levels of enlightenment. > I'm not talking about levels in the sense you just used when you > mentioned "higher-level issues". I'm talking about the well-known > concept of levels or layers of programming and of communication protocols. My apologi

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread jon
> Beginners, even young children, can be taught simple programming and > string handling without knowing anything about bits and bytes, certainly > without having to know whether the e acute they just typed is stored as > one byte or two. Just as people can and do learn to drive cars without >

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Michael \(michka\) Kaplan
From: "Peter Kirk" <[EMAIL PROTECTED]> > Here I disagree. As an application programmer writing for example some > kind of linguistic application, it is totally irrelevant to me how much > actual storage a string takes. Such things should be hidden away from me > by several levels of system softwar

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Arcane Jill
I think Marco here has the definitive answer. I've thought about this a lot, and it seems to me that he's right. A consequence of this appears to be that it DOESN'T MATTER whether or not a text editor normalises C or C++ source code, into either NFC or NFD. It shouldn't make the slightest bit

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Peter Kirk
On 11/12/2003 07:40, Mark Davis wrote: Peter, here is your original remark. Ken has gracefully filled the gap in explaining the higher-level issues, but let's return to that for a minute. No, surely not. If the wcslen() function is fully Unicode conformant, it should give the same output whatev

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Peter Kirk
On 11/12/2003 05:43, Philippe Verdy wrote: Thanks for the clarification. We are again talking at different levels. I am still looking from the point of view of an application programmer interested in a string as an abstract entity (an object or an abstract data type) with a meaning or interpret

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Tim Greenwood
In my interpretation of the C standard (which I am reading from http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf) UTF-8 is not a valid wchar_t encoding if your execution character set contains characters outside the C0 controls and Basic Latin range, and UTF-16 is not a valid wchar_t encodi

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Mark Davis
;[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Thu, 2003 Dec 11 04:29 Subject: Re: Text Editors and Canonical Equivalence (was Coloured diacritics) > On 10/12/2003 18:42, Kenneth Whistler wrote: > > > ... > > > >>And even then the word "interpretation"

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Philippe Verdy
> Thanks for the clarification. We are again talking at different levels. > I am still looking from the point of view of an application programmer > interested in a string as an abstract entity (an object or an abstract > data type) with a meaning or interpretation, but with no interest in the

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Peter Kirk
On 10/12/2003 18:42, Kenneth Whistler wrote: ... And even then the word "interpretation" needs to be clearly defined, see below. "Interpretation" has been *deliberately* left undefined. It falls back to its general English usage, because attempting a technical definition of "interpretation"

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-11 Thread Philippe Verdy
Christopher John Fynn wrote: > Peter Kirk wrote: > >Consider the following: > > (1) {U+00E9} > > (2) e{U+0301} > > (3) e > class="black-text">{U+0301} > > (4) e > class="red-text">{U+0301} > > > > I would expect (1), (2) and (3) to be rendered identically, and (4) to > > differ only in the colour

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-10 Thread Kenneth Whistler
Peter Kirk continued: > >Once again, people are falling afoul of the subtle distinctions > >that the Unicode conformance clauses are attempting to make. > > > > > In that case the distinctions are too subtle and need to be clarified. > C9 states that "no process can assume that another process

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-10 Thread Christopher John Fynn
Peter Kirk wrote: >Consider the following: > (1) {U+00E9} > (2) e{U+0301} > (3) e class="black-text">{U+0301} > (4) e{U+0301} > I would expect (1), (2) and (3) to be rendered identically, and (4) to > differ only in the colour of the accent, just as it would be (apart from > (1) if U+0301 were

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-10 Thread Peter Kirk
On 10/12/2003 13:36, Kenneth Whistler wrote: Peter Kirk averred: Agreed. C9 clearly specifies that a process cannot assume that another process will give a correct answer to the question "is this string normalised?", because that is to "assume that another process will make a distinction be

RE: Coloured diacritics

2003-12-10 Thread Philippe Verdy
Anto'nio Martins-Tuva'lkin writes: > On 2003.12.09, 11:25, Peter Kirk <[EMAIL PROTECTED]> wrote: > > > Philippe, you have now stated this (several times). But just a day > > earlier you yourself stated that the rule forbidding combining marks > > at the start of a string would never be relaxed bec

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-10 Thread Kenneth Whistler
Peter Kirk averred: > Agreed. C9 clearly specifies that a process cannot assume that another > process will give a correct answer to the question "is this string > normalised?", because that is to "assume that another process will make > a distinction between two different, but canonical-equiva

Re: Coloured diacritics

2003-12-10 Thread Anto'nio Martins-Tuva'lkin
On 2003.12.09, 11:25, Peter Kirk <[EMAIL PROTECTED]> wrote: > Philippe, you have now stated this (several times). But just a day > earlier you yourself stated that the rule forbidding combining marks > at the start of a string would never be relaxed because it is > fundamental to the XML containme

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-10 Thread jon
> > I've seen text/cpp and text/java, but really there are no such > > types. I've also > > seen text/x-source-code which is at least legal, if of little value to > > interoperability. > > > > The correct MIME type for C and C++ source files is text/plain. > > This is where I disagree: Brin

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread D. Starner
> Just imagine what would be created with your assumption with this source: > const wchar_t c = L'?'; > where ? is a combining character. The programmer would get bit. At best, there's no reason to assume that every compiler accepts UTF-8, besides the fact that you can't assume that the co

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-09 Thread Peter Kirk
On 09/12/2003 10:41, [EMAIL PROTECTED] wrote: Peter Kirk scripsit: ... (otherwise a normalizer would be impossible; it wouldn't know whether to normalize or not!) ... Not so. Normalisation is idempotent Quite right. I should have said that normalization *checking* would be imposs

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Philippe Verdy
[EMAIL PROTECTED] writes: > > > You might as well say that C code is not plain text because it too is > > > subject to special canons of interpretation. > > > > C, C++ and Java source files are not plain text as well (they > > have their own > > C, C++ and Java source files are plain text. > >

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-09 Thread jcowan
Peter Kirk scripsit: > >... (otherwise a normalizer > >would be impossible; it wouldn't know whether to normalize or not!) ... > > > Not so. Normalisation is idempotent Quite right. I should have said that normalization *checking* would be impossible. -- Only do what only you can do.
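The two notions in this exchange, idempotent normalisation and normalisation checking, can be sketched with Python's standard `unicodedata` module. The helper `is_nfc` is a hypothetical name; it checks by normalising and comparing, which is one simple way to do it and not necessarily how any particular implementation works.

```python
import unicodedata

def is_nfc(s: str) -> bool:
    """Normalisation check: a string is in NFC if normalising leaves it unchanged."""
    return unicodedata.normalize("NFC", s) == s

decomposed = "e\u0301"
once = unicodedata.normalize("NFC", decomposed)
twice = unicodedata.normalize("NFC", once)

# Idempotence: normalising an already normalised string changes nothing,
# so a normaliser never needs to ask whether its input is normalised.
print(once == twice)

# A checker, by contrast, must distinguish canonically equivalent strings.
print(is_nfc(decomposed), is_nfc(once))
```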

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-09 Thread Peter Kirk
On 09/12/2003 10:16, [EMAIL PROTECTED] wrote: Peter Kirk scripsit: No, surely not. If the wcslen() function is fully Unicode conformant, it should give the same output whatever the canonically equivalent form of its input. Not so. Remember, the conformance requirement is not that a pro

Re: Overload (was Re: Text Editors and Canonical Equivalence (was Coloured diacritics))

2003-12-09 Thread Peter Kirk
On 09/12/2003 10:01, Mark Davis wrote: No, surely not. If the wcslen() function is fully Unicode conformant, it should give the same output whatever the canonically equivalent form of its input. That more or less implies that it should normalise its input. No, that is not a requirement of Uni

Re: plain text (was RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread jcowan
Peter Constable scripsit: > Perhaps we need some new terminology here. It might be helpful to > describe an XML file as a "plain-text-markup file" (PTM, for acronym > lovers), but reserve the term "plain text file" for files that contain > text with no markup. Note that the terms being defined are

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Peter Kirk
On 09/12/2003 06:36, [EMAIL PROTECTED] wrote: Perhaps so does yours. It isn't clear whether the CSS for .red-text would have to over-ride the default behaviour whereby an inline element like is rendered by stacking it to the left or right (depending on text directionality) of the previous inli

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-09 Thread jcowan
Peter Kirk scripsit: > No, surely not. If the wcslen() function is fully Unicode conformant, it > should give the same output whatever the canonically equivalent form of > its input. Not so. Remember, the conformance requirement is not that a process can't distinguish between canonically equiv

Overload (was Re: Text Editors and Canonical Equivalence (was Coloured diacritics))

2003-12-09 Thread Mark Davis
MAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Tue, 2003 Dec 09 09:12 Subject: Re: Text Editors and Canonical Equivalence (was Coloured diacritics) > On 09/12/2003 07:00, Arcane Jill wrote: > > > > > Hmm. Now here's some C++ source code (syntax colored as Philippe >

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-09 Thread Peter Kirk
On 09/12/2003 07:00, Arcane Jill wrote: Hmm. Now here's some C++ source code (syntax colored as Philippe suggests, to imply that the text editor understands C++ at least well enough to color it) int n = wcslen(L"café"); (That's int n = wcslen(L"café"); for those without HTML email) The L
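The effect Jill describes for `wcslen(L"café")` can be imitated in Python, where `len()` on a `str` counts code points. This is an analogy only, assuming a platform where each `wchar_t` holds one code point; `code_point_length` is a hypothetical helper, not part of any C library.

```python
import unicodedata
from typing import Optional

def code_point_length(s: str, form: Optional[str] = None) -> int:
    """Count code points, optionally after normalising to the given form."""
    if form is not None:
        s = unicodedata.normalize(form, s)
    return len(s)

cafe_composed = "caf\u00e9"     # é as a single code point
cafe_decomposed = "cafe\u0301"  # e followed by COMBINING ACUTE ACCENT

# The raw lengths differ depending on which form the editor saved.
print(code_point_length(cafe_composed), code_point_length(cafe_decomposed))
# Normalising first makes the answer independent of the stored form.
print(code_point_length(cafe_composed, "NFC"), code_point_length(cafe_decomposed, "NFC"))
```

If an editor silently converts between the two forms, the value of `n` in the quoted source would change, which is the concern raised in this message.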

plain text (was RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf > Of [EMAIL PROTECTED] > XML files most certainly are plain text XML *can* be interpreted as plain text, or it can be interpreted as something *other* than plain text (i.e. XML). This ambiguity exists for any other plain-text-based ma

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Peter Constable
From: Philippe Verdy [mailto:[EMAIL PROTECTED] >> I see no particular value in this. The font rendering of base >> diacritic should be exactly the same as that for >> basediacritic provided the font >> characteristics are the same or do not affect metrics. > >This is wrong here: there's no guaran

Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-09 Thread Doug Ewell
Arcane Jill wrote: > The intention of canonical equivalence is that the glyphs should > display the same - otherwise we'd need precomposed versions of, well, > everything. The intention of canonical equivalence is that *all* operations that involve "interpreting" the text treat two canonically eq

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread jon
> > You might as well say that C code is not plain text because it too is > > subject to special canons of interpretation. > > C, C++ and Java source files are not plain text as well (they have their own C, C++ and Java source files are plain text. > "text/*" MIME type, which is NOT "text/plain"

Re: Transcoding Tamil in the presence of markup (was Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup))

2003-12-09 Thread Mark Davis
AIL PROTECTED]> Sent: Tue, 2003 Dec 09 00:30 Subject: RE: Transcoding Tamil in the presence of markup (was Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)) > From: [EMAIL PROTECTED] on behalf of Kenneth Whistler > > >> Unicode doesn't prevent styling

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-09 Thread Arcane Jill
December 09, 2003 2:04 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup) I would not like to use any Unicode plain-text editor that implicitly normalizes the text without asking me, to work on programming sour

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Philippe Verdy
> You might as well say that C code is not plain text because it too is > subject to special canons of interpretation. C, C++ and Java source files are not plain text as well (they have their own "text/*" MIME type, which is NOT "text/plain" notably because of the rules associated with end-of-line

Re: Transcoding Tamil in the presence of markup (was Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup))

2003-12-09 Thread Jungshik Shin
On Mon, 8 Dec 2003, Peter Jacobi wrote: > It would be most interesting, if someone can point out a wordprocessor > or even a rendering library (shouldn't Pango be the solution to > everything?), > which enables styling of individual Tamil letters. I think Pango's attributed string ( http://de

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread jon
> Your alternative suggestion using svg seemed to require the user to > handle the details of glyph positioning with specified horizontal > advances, which is surely a very strange requirement. Or maybe I have > misunderstood what was going on here. Perhaps so does yours. It isn't clear whether

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Peter Kirk
On 09/12/2003 05:13, [EMAIL PROTECTED] wrote: So, let's get this clear. Within an XML or HTML document, if I want an e with a red acute accent on it, it is quite permissible to write: e{U+0301} where {U+0301} is replaced by the actual Unicode character, and "red-text" is defined in the stylesh

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Philippe Verdy
[EMAIL PROTECTED] writes: > Philippe Verdy scripsit: > > XML files are definitely NOT plain text (if this was the case, > > then it would be forbidden to interpret "<" as a special markup > > character instead of the standard Unicode base character with > > its associated glyph)... > > You migh

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Peter Jacobi
Hi Peter, All, Peter Kirk <[EMAIL PROTECTED]> wrote: > [...] > [About é being correct HTML} > [...] > If this is correct, then the Tamil problem which Peter J is concerned > about has gone away completely, or at least it is reduced to a tricky > rendering issue. Jungshik and Martin already vot

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread jcowan
Philippe Verdy scripsit: > XML files are definitely NOT plain text (if this was the case, then it would > be forbidden to interpret "<" as a special markup character instead of the > standard Unicode base character with its associated glyph)... You might as well say that C code is not plain text

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Philippe Verdy
> -Original Message- > From: Peter Kirk [mailto:[EMAIL PROTECTED] > Sent: Tuesday, 9 December 2003 13:17 > To: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED] > Subject: Re: Coloured diacritics (Was: Transcoding Tamil in the presence > of markup) > > >

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Philippe Verdy
[EMAIL PROTECTED] writes: > What is not allowed, and this makes XML technically non-conformant to the > Unicode Standard Where did you see that XML files need to be conformant to the Unicode standard? XML files are definitely NOT plain text (if this was the case, then it would be forbidden to int

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread jon
> So, let's get this clear. Within an XML or HTML document, if I want an e > with a red acute accent on it, it is quite permissible to write: > > e{U+0301} > > where {U+0301} is replaced by the actual Unicode character, and > "red-text" is defined in the stylesheet. So it is not a problem that

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread jcowan
Philippe Verdy scripsit: > When in doubt, don't perform any normalization of XML _files_ as they are > NOT plain text: you need an XML parser to do it safely only in relevant > sections of this file. All you could do safely is to possibly reencode XML > files (for example from UTF-8 to UTF-16 encod

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread jon
> Anyone, please, is it or is it not true that XML forbids, or will forbid > in future versions, combining characters immediately after markup? XML does not forbid it, it does recommend you avoid it. Charmod defines "include-normalization" and "full-normalization" which go beyond Unicode normal

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread jcowan
Peter Kirk scripsit: > Anyone, please, is it or is it not true that XML forbids, or will forbid > in future versions, combining characters immediately after markup? XML 1.0 is silent on the subject. The W3C Character Model (which is not official yet) says that "content developers SHOULD avoid c

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Peter Kirk
On 09/12/2003 03:41, Philippe Verdy wrote: Peter Kirk writes: Philippe, you have now stated this (several times). But just a day earlier you yourself stated that the rule forbidding combining marks at the start of a string would never be relaxed because it is fundamental to the XML containme

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Philippe Verdy
Peter Kirk writes: > Philippe, you have now stated this (several times). But just a day > earlier you yourself stated that the rule forbidding combining marks at > the start of a string would never be relaxed because it is fundamental > to the XML containment model. You don't usually contradict

Re: Transcoding Tamil in the presence of markup (was Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup))

2003-12-09 Thread Peter Kirk
On 08/12/2003 16:17, Kenneth Whistler wrote: ... Having an 'invisible consonant' to call for rendering of the vowel sign in isolation (and without the dotted circle), would also help the limited number of cases where the styled single character is needed - but in a rather hackish way. That i

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-09 Thread Peter Kirk
On 08/12/2003 15:51, Philippe Verdy wrote: ... Peter Kirk writes: Agreed. But now we are told that the latter is illegal XML because a combining mark is not permitted (by XML, not by Unicode) after . It is not forbidden by XML. It's just that handling a XML file (which is not plain-text)

RE: Transcoding Tamil in the presence of markup (was Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup))

2003-12-09 Thread Peter Constable
From: [EMAIL PROTECTED] on behalf of Kenneth Whistler >> Unicode doesn't prevent styling, of course. But having 'logical' order >> instead of 'visual' makes it a hard task for the application and the >> renderer. >> This is witnessed by the thin-spread support for this. > >Yes... Ken conceded th

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-08 Thread Chris Jacobs
- Original Message - From: "Christopher John Fynn" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Monday, December 08, 2003 6:03 PM Subject: Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup) > Andrew West wro

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-08 Thread Philippe Verdy
Peter Constable writes: > > A very tentative suggestion for some glue: a character which can take > > combining marks but whose function is to throw those marks back on to > > the preceding base character, preceding any markup. > > I see no particular value in this. The font rendering of base > di

RE: Transcoding Tamil in the presence of markup (was Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup))

2003-12-08 Thread Kenneth Whistler
Peter Jacobi said: > Unicode doesn't prevent styling, of course. But having 'logical' order > instead of 'visual' makes it a hard task for the application and the > renderer. > This is witnessed by the thin-spread support for this. Yes, but having visual order instead of logical order makes *othe

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-08 Thread Philippe Verdy
-Original Message- From: Philippe Verdy [mailto:[EMAIL PROTECTED] Sent: Tuesday, 9 December 2003 00:11 To: Peter Kirk Cc: [EMAIL PROTECTED] Subject: RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup) Peter Kirk writes: > Agreed. But no

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-08 Thread Philippe Verdy
Peter Kirk writes: > Agreed. But now we are told that the latter is illegal XML because a > combining mark is not permitted (by XML, not by Unicode) after . It is not forbidden by XML. It's just that handling an XML file (which is not plain-text) as if it was a Unicode plain-text when performing n

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-08 Thread Mete Kural
Being able to color diacritics and other characters in rendering would be great. We are trying to develop some tools to research the Quran and one of the tools is a sophisticated search engine that can search for substrings and display the search results while emphasizing the searched substrings

RE: Transcoding Tamil in the presence of markup (was Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup))

2003-12-08 Thread Peter Jacobi
Dear Peter Constable, Peter Kirk, All, "Peter Constable" <[EMAIL PROTECTED]> wrote: > SIL's Graphite definitely *will* permit exactly what you want to do > (assuming the font is properly designed). [...] Thanks for this clarification. Having tried SIL WorldPad with Tamil Graphite font, and not

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-08 Thread Peter Kirk
On 08/12/2003 11:35, Peter Constable wrote: ... I see no particular value in this. The font rendering of base diacritic should be exactly the same as that for basediacritic provided the font characteristics are the same or do not affect metrics. Agreed. But now we are told that the latter is i

RE: Transcoding Tamil in the presence of markup (was Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup))

2003-12-08 Thread Philippe Verdy
Peter Jacobi > To re-iterate - in the original post, the string in question did > consist of side by side characters, not ligated in any font known > to me. And the legacy Tamil encodings have for obvious reasons no > problem to style any single character. This specific case is not the one of "side

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-08 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf > Of Peter Kirk > And what if you want to colour just the dot on i? Or just the crossbar > on a t? Use Illustrator or Photoshop or Freehand or whatever your favourite graphics application is. > A very tentative suggestion for some

Re: Transcoding Tamil in the presence of markup (was Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup))

2003-12-08 Thread Peter Kirk
On 08/12/2003 10:16, Peter Jacobi wrote: ... So, to promote Unicode usage, in a community, which partly sees ISCII unification as a conspiracy against the Dravidian languages, it would be very helpful to demonstrate, that everything that can be done with the legacy encodings, can also be done usin

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-08 Thread Peter Kirk
On 08/12/2003 10:57, Jungshik Shin wrote: ... You're another 'victim'(?!) of the multi-level representability of the Korean script. Although I consistently used syllables, letters (Jamos: complex/compound vs simple/basic), it may not have been clear to you. ... Peter, can you just open up TUS 4

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-08 Thread Jungshik Shin
On Mon, 8 Dec 2003, Peter Kirk wrote: > On 08/12/2003 08:37, Doug Ewell wrote: > > >Peter Kirk wrote: > >>I may have missed or misunderstood the details, but it has been > >>clearly stated here in the last few days that (a) there are more > >>than 11,000 redundant Korean characters in the BMP, a

Transcoding Tamil in the presence of markup (was Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup))

2003-12-08 Thread Peter Jacobi
Dear All, I find it rather disappointing, that the question of coloring the horizontal line of 't' attracts more attention, than the original question. To re-iterate - in the original post, the string in question did consist of side by side characters, not ligated in any font known to me. And

RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-08 Thread Philippe Verdy
Christopher John Fynn wrote: > Andrew West wrote: > > > ... and similar stroke-by-stroke incremental diagrams showing > > how to write CJK ideographs are even more common in (Chinese, > > Japanese, etc.) pedagogical texts intended for both native > > children and for foreigners. I've also seen

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-08 Thread Peter Kirk
On 08/12/2003 08:37, Doug Ewell wrote: Peter Kirk wrote: I may have missed or misunderstood the details, but it has been clearly stated here in the last few days that (a) there are more than 11,000 redundant Korean characters in the BMP, and (b) many precomposed Korean characters lack canonic

Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

2003-12-08 Thread Christopher John Fynn
Andrew West wrote: > ... and similar stroke-by-stroke incremental diagrams showing how to write CJK > ideographs are even more common in (Chinese, Japanese, etc.) pedagogical texts > intended for both native children and for foreigners. I've also seen such > diagrams in Tibetan pedagogical texts,
