Christopher John Fynn wrote:
> Peter Kirk wrote:
> >Consider the following:
> > (1) {U+00E9}
> > (2) e{U+0301}
> > (3) e<span class="black-text">{U+0301}</span>
> > (4) e<span class="red-text">{U+0301}</span>
> >
> > I would expect (1), (2) and (3) to be rendered identically, and (4) to
> > differ only in the colour
Peter Kirk wrote:
>Consider the following:
> (1) {U+00E9}
> (2) e{U+0301}
> (3) e<span class="black-text">{U+0301}</span>
> (4) e<span class="red-text">{U+0301}</span>
> I would expect (1), (2) and (3) to be rendered identically, and (4) to
> differ only in the colour of the accent, just as it would be (apart from
> (1) if U+0301 were
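
As a hedged illustration of why (1) and (2) above are expected to render identically: the two sequences are canonically equivalent, which can be checked with Python's standard unicodedata module. This is only a sketch. The helper name canonically_equivalent is made up for the example, and the check says nothing about what a renderer does when markup sits between the base and the mark, as in (3) and (4).

```python
import unicodedata

precomposed = "\u00e9"   # (1) U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # (2) e followed by U+0301 COMBINING ACUTE ACCENT


def canonically_equivalent(a, b):
    """Return True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The code point sequences differ, but they are canonically equivalent.
print(precomposed == decomposed)                          # False
print(canonically_equivalent(precomposed, decomposed))    # True

# Composing (2) with NFC yields exactly the single code point of (1).
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```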
> > I've seen text/cpp and text/java, but really there are no such
> > types. I've also
> > seen text/x-source-code which is at least legal, if of little value to
> > interoperability.
> >
> > The correct MIME type for C and C++ source files is text/plain.
>
> This is where I disagree:
Brin
> Just imagine what would be created with your assumption with this source:
> const wchar_t c = L'?';
> where ? is a combining character.
The programmer would get bit. At best, there's no reason to assume that
every compiler accepts UTF-8, besides the fact that you can't assume that
the co
[EMAIL PROTECTED] writes:
> > > You might as well say that C code is not plain text because it too is
> > > subject to special canons of interpretation.
> >
> > C, C++ and Java source files are not plain text as well (they
> > have their own
>
> C, C++ and Java source files are plain text.
>
>
Peter Constable scripsit:
> Perhaps we need some new terminology here. It might be helpful to
> describe an XML file as a "plain-text-markup file" (PTM, for acronym
> lovers), but reserve the term "plain text file" for files that contain
> text with no markup. Note that the terms being defined are
On 09/12/2003 06:36, [EMAIL PROTECTED] wrote:
Perhaps so does yours. It isn't clear whether the CSS for .red-text would have
to over-ride the default behaviour whereby an inline element like <span> is
rendered by stacking it to the left or right (depending on text directionality)
of the previous inli
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of [EMAIL PROTECTED]
> XML files most certainly are plain text
XML *can* be interpreted as plain text, or it can be interpreted as
something *other* than plain text (i.e. XML). This ambiguity exists for
any other plain-text-based ma
From: Philippe Verdy [mailto:[EMAIL PROTECTED]
>> I see no particular value in this. The font rendering of base
>> diacritic should be exactly the same as that for
>> basediacritic provided the font
>> characteristics are the same or do not affect metrics.
>
>This is wrong here: there's no guaran
> > You might as well say that C code is not plain text because it too is
> > subject to special canons of interpretation.
>
> C, C++ and Java source files are not plain text as well (they have their own
C, C++ and Java source files are plain text.
> "text/*" MIME type, which is NOT "text/plain"
[EMAIL PROTECTED]>
Sent: Tue, 2003 Dec 09 00:30
Subject: RE: Transcoding Tamil in the presence of markup (was Re: Coloured
diacritics (Was: Transcoding Tamil in the presence of markup))
> From: [EMAIL PROTECTED] on behalf of Kenneth Whistler
>
> >> Unicode doesn't prevent styling
> You might as well say that C code is not plain text because it too is
> subject to special canons of interpretation.
C, C++ and Java source files are not plain text as well (they have their own
"text/*" MIME type, which is NOT "text/plain" notably because of the rules
associated with end-of-line
On Mon, 8 Dec 2003, Peter Jacobi wrote:
> It would be most interesting, if someone can point out a wordprocessor
> or even a rendering library (shouldn't Pango be the solution to
> everything?),
> which enables styling of individual Tamil letters.
I think Pango's attributed
string (
http://de
> Your alternative suggestion using svg seemed to require the user to
> handle the details of glyph positioning with specified horizontal
> advances, which is surely a very strange requirement. Or maybe I have
> misunderstood what was going on here.
Perhaps so does yours. It isn't clear whether
On 09/12/2003 05:13, [EMAIL PROTECTED] wrote:
So, let's get this clear. Within an XML or HTML document, if I want an e
with a red acute accent on it, it is quite permissible to write:
e<span class="red-text">{U+0301}</span>
where {U+0301} is replaced by the actual Unicode character, and
"red-text" is defined in the stylesh
[EMAIL PROTECTED] writes:
> Philippe Verdy scripsit:
> > XML files are definitely NOT plain text (if this was the case,
> > then it would be forbidden to interpret "<" as a special markup
> > character instead of the standard Unicode base character with
> > its associated glyph)...
>
> You migh
Hi Peter, All,
Peter Kirk <[EMAIL PROTECTED]> wrote:
> [...]
> [About é being correct HTML]
> [...]
> If this is correct, then the Tamil problem which Peter J is concerned
> about has gone away completely, or at least it is reduced to a tricky
> rendering issue.
Jungshik and Martin already vot
Philippe Verdy scripsit:
> XML files are definitely NOT plain text (if this was the case, then it would
> be forbidden to interpret "<" as a special markup character instead of the
> standard Unicode base character with its associated glyph)...
You might as well say that C code is not plain text
> -Original Message-
> From: Peter Kirk [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 9 December 2003 13:17
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Coloured diacritics (Was: Transcoding Tamil in the presence
> of markup)
>
>
>
[EMAIL PROTECTED] writes:
> What is not allowed, and this makes XML technically non-conformant to the
> Unicode Standard
Where did you see that XML files need to be conformant to the Unicode
standard?
XML files are definitely NOT plain text (if this was the case, then it would
be forbidden to int
> So, let's get this clear. Within an XML or HTML document, if I want an e
> with a red acute accent on it, it is quite permissible to write:
>
> e<span class="red-text">{U+0301}</span>
>
> where {U+0301} is replaced by the actual Unicode character, and
> "red-text" is defined in the stylesheet. So it is not a problem that
Philippe Verdy scripsit:
> When in doubt, don't perform any normalization of XML _files_ as they are
> NOT plain text: you need a XML parser to do it safely only in relevant
> sections of this file. All you could do safely is to possibly reencode XML
> files (for example from UTF-8 to UTF-16 encod
> Anyone, please, is it or is it not true that XML forbids, or will forbid
> in future versions, combining characters immediately after markup?
XML does not forbid it, it does recommend you avoid it.
Charmod defines "include-normalization" and "full-normalization" which go
beyond Unicode normal
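
The recommendation to avoid combining characters immediately after markup can be checked mechanically. The following is a minimal sketch, not an implementation of Charmod: it assumes that "combining character" can be approximated by the Unicode general categories Mn, Mc and Me, the function names are hypothetical, and the sample document (with its span element and "red-text" class) is made up for illustration.

```python
import unicodedata
import xml.etree.ElementTree as ET


def starts_with_mark(s):
    """True if s is non-empty and its first character is a combining mark.

    Assumption: general category M* (Mn, Mc, Me) stands in for
    'combining character'; the W3C definition may be broader.
    """
    return bool(s) and unicodedata.category(s[0]).startswith("M")


def elements_starting_with_mark(xml_text):
    """Return the tags of elements whose text content begins with a mark."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter() if starts_with_mark(el.text)]


# An e whose acute accent is wrapped in its own inline element.
doc = '<p>e<span class="red-text">\u0301</span></p>'
print(elements_starting_with_mark(doc))   # ['span']
```

The document parses without error, which matches the statement that XML does not forbid the construct; the check only flags it.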
Peter Kirk scripsit:
> Anyone, please, is it or is it not true that XML forbids, or will forbid
> in future versions, combining characters immediately after markup?
XML 1.0 is silent on the subject.
The W3C Character Model (which is not official yet) says that
"content developers SHOULD avoid c
On 09/12/2003 03:41, Philippe Verdy wrote:
Peter Kirk writes:
Philippe, you have now stated this (several times). But just a day
earlier you yourself stated that the rule forbidding combining marks at
the start of a string would never be relaxed because it is fundamental
to the XML containme
Peter Kirk writes:
> Philippe, you have now stated this (several times). But just a day
> earlier you yourself stated that the rule forbidding combining marks at
> the start of a string would never be relaxed because it is fundamental
> to the XML containment model. You don't usually contradict
On 08/12/2003 16:17, Kenneth Whistler wrote:
...
Having an 'invisible consonant' to call for rendering of the vowel sign
in isolation (and without the dotted circle), would also help the limited
number of cases where the styled single character is needed - but in
a rather hackish way.
That i
On 08/12/2003 15:51, Philippe Verdy wrote:
...
Peter Kirk writes:
Agreed. But now we are told that the latter is illegal XML because a
combining mark is not permitted (by XML, not by Unicode) after .
It is not forbidden by XML. It's just that handling a XML file (which is not
plain-text)
From: [EMAIL PROTECTED] on behalf of Kenneth Whistler
>> Unicode doesn't prevent styling, of course. But having 'logical' order
>> instead of 'visual' makes it a hard task for the application and the
>> renderer.
>> This is witnessed by the thin-spread support for this.
>
>Yes...
Ken conceded th
- Original Message -
From: "Christopher John Fynn" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Monday, December 08, 2003 6:03 PM
Subject: Re: Coloured diacritics (Was: Transcoding Tamil in the presence of
markup)
> Andrew West wro
Peter Constable writes:
> > A very tentative suggestion for some glue: a character which can take
> > combining marks but whose function is to throw those marks back on to
> > the preceding base character, preceding any markup.
>
> I see no particular value in this. The font rendering of base
> di
Peter Jacobi said:
> Unicode doesn't prevent styling, of course. But having 'logical' order
> instead of 'visual' makes it a hard task for the application and the
> renderer.
> This is witnessed by the thin-spread support for this.
Yes, but having visual order instead of logical order makes
*othe
-Original Message-
From: Philippe Verdy [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 9 December 2003 00:11
To: Peter Kirk
Cc: [EMAIL PROTECTED]
Subject: RE: Coloured diacritics (Was: Transcoding Tamil in the presence of
markup)
Peter Kirk writes:
> Agreed. But no
Peter Kirk writes:
> Agreed. But now we are told that the latter is illegal XML because a
> combining mark is not permitted (by XML, not by Unicode) after .
It is not forbidden by XML. It's just that handling a XML file (which is not
plain-text) as if it was a Unicode plain-text when performing n
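
A small sketch of the hazard being described here, assuming a tool that applies NFC to the raw bytes of an XML file without parsing it first. The function name normalize_raw and the sample markup are hypothetical. The relevant fact is that U+226F NOT GREATER-THAN has the canonical decomposition U+003E U+0338, so a ">" that closes a tag can be fused with a following combining solidus.

```python
import unicodedata


def normalize_raw(xml_text):
    """Naively NFC-normalize a whole XML file as if it were plain text."""
    return unicodedata.normalize("NFC", xml_text)


# Element content that begins with U+0338 COMBINING LONG SOLIDUS OVERLAY.
raw = "<b>\u0338</b>"
damaged = normalize_raw(raw)

# The '>' that closed the start tag has been composed into U+226F,
# so the start tag is no longer terminated.
print(ascii(raw))       # '<b>\u0338</b>'
print(ascii(damaged))   # '<b\u226f</b>'
```

Normalizing only the character data returned by an XML parser, rather than the serialized file, avoids touching the markup delimiters.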
Being able to color diacritics and other characters in rendering would be great. We
are trying to develop some tools to research the Quran and one of the tools is a
sophisticated search engine that can search for substrings and display the search
results while emphasizing the searched substrings
Dear Peter Constable, Peter Kirk, All,
"Peter Constable" <[EMAIL PROTECTED]> wrote:
> SIL's Graphite definitely *will* permit exactly what you want to do
> (assuming the font is properly designed). [...]
Thanks for this clarification. Having tried SIL WorldPad with Tamil
Graphite font, and not
On 08/12/2003 11:35, Peter Constable wrote:
...
I see no particular value in this. The font rendering of base
diacritic should be exactly the same as that for
basediacritic provided the font
characteristics are the same or do not affect metrics.
Agreed. But now we are told that the latter is i
Peter Jacobi
> To re-iterate - in the original post, the string in question did
> consist of side by side characters, not ligated in any font known
> to me. And the legacy Tamil encodings have for obvious reasons no
> problem to style any single character.
This specific case is not the one of "side
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Peter Kirk
> And what if you want to colour just the dot on i? Or just the crossbar
> on a t?
Use Illustrator or Photoshop or Freehand or whatever your favourite
graphics application is.
> A very tentative suggestion for some
On 08/12/2003 10:16, Peter Jacobi wrote:
...
So, to promote Unicode usage, in a community, which partly sees
ISCII unification as a conspiracy against the Dravidian languages,
it would be very helpful to demonstrate, that everything that can
be done with the legacy encodings, can also be done usin
On 08/12/2003 10:57, Jungshik Shin wrote:
...
You're another 'victim'(?!) of the multi-level representability of the
Korean script. Although I consistently used syllables, letters (Jamos:
complex/compund vs simple/basic), it may not have been clear to you.
...
Peter, can you just open up TUS 4
On Mon, 8 Dec 2003, Peter Kirk wrote:
> On 08/12/2003 08:37, Doug Ewell wrote:
>
> >Peter Kirk wrote:
> >>I may have missed or misunderstood the details, but it has been
> >>clearly stated here in the last few days that (a) there are more
> >>than 11,000 redundant Korean characters in the BMP, a
Dear All,
I find it rather disappointing, that the question of coloring
the horizontal line of 't' attracts more attention, than the
original question.
To re-iterate - in the original post, the string in question did
consist of side by side characters, not ligated in any font known
to me. And
Christopher John Fynn wrote:
> Andrew West wrote:
>
> > ... and similar stroke-by-stroke incremental diagrams showing
> > how to write CJK ideographs are even more common in (Chinese,
> > Japanese, etc.) pedagogical texts intended for both native
> > children and for foreigners. I've also seen
On 08/12/2003 08:37, Doug Ewell wrote:
Peter Kirk wrote:
I may have missed or misunderstood the details, but it has been
clearly stated here in the last few days that (a) there are more
than 11,000 redundant Korean characters in the BMP, and (b) many
precomposed Korean characters lack canonic
Andrew West wrote:
> ... and similar stroke-by-stroke incremental diagrams showing how to write
> CJK ideographs are even more common in (Chinese, Japanese, etc.)
> pedagogical texts intended for both native children and for foreigners.
> I've also seen such diagrams in Tibetan pedagogical texts,
Peter Kirk wrote:
> I may have missed or misunderstood the details, but it has been
> clearly stated here in the last few days that (a) there are more
> than 11,000 redundant Korean characters in the BMP, and (b) many
> precomposed Korean characters lack canonical or even compatibility
> decompos
Andrew C. West <[EMAIL PROTECTED]>
> ... and similar stroke-by-stroke incremental diagrams showing how to
> write CJK ideographs are even more common in (Chinese, Japanese,
> etc.) pedagogical texts intended for both native children and for
> foreigners. I've also seen such diagrams in Tibetan ped
On 07/12/2003 17:40, Doug Ewell wrote:
Peter Kirk wrote:
Well, this is W3C's problem. They seem to have backed themselves into
a corner which they need to get out of but have no easy way of doing
so.
Only if this issue of applying style to individual combining marks is
considered a suffi
Of course, display of coloured diacritics isn't plain text.
--
Michael Everson * * Everson Typography * * http://www.evertype.com
On Sun, 7 Dec 2003 17:40:25 -0800, "Doug Ewell" wrote:
> There are plenty of things one can do with writing that aren't supported
> by computer encodings, and aren't really expected to be. The idea of a
> black "i" with a red dot was mentioned. Here's another: the
> piece-by-piece "exploded diagr
>Doug Ewell [mailto:[EMAIL PROTECTED] writes:
>> Peter Kirk wrote:
>> > Unicode is of course very familiar with this kind of situation e.g.
>> > with character name errors, combining class errors, 11000+ redundant
>> > Korean characters without decompositions, etc etc.
>>
>> "Without decompositio
Peter Kirk wrote:
> Well, this is W3C's problem. They seem to have backed themselves into
> a corner which they need to get out of but have no easy way of doing
> so.
Only if this issue of applying style to individual combining marks is
considered a sufficiently important text operation do they
Peter Kirk wrote:
> On 07/12/2003 15:40, Philippe Verdy wrote:
> > Peter Kirk wrote:
> > > Of course there is an even simpler way to provide the glue I
> > > was talking about. W3C simply needs to relax the rule forbidding
> > > combining marks at the start of a string (and interpret the one
> >
> Of course there is an even simpler way to provide the glue I was talking
> about. W3C simply needs to relax the rule forbidding combining marks at
> the start of a string (and interpret the one precomposed character with
> ">" as base as if it were decomposed, as I suggested before), and,
> r
On 07/12/2003 15:40, Philippe Verdy wrote:
Of course there is an even simpler way to provide the glue I was talking
about. W3C simply needs to relax the rule forbidding combining marks at
the start of a string (and interpret the one precomposed character with
">" as base as if it were decompose
On 07/12/2003 12:10, Philippe Verdy wrote:
The glue seems good in appearance but much too complex to implement in
Unicode. I do think that specific occurrences of complex styles must be
handled with a stylesheet, where a composite style is applied to any given
grapheme cluster as a whole.
...
Peter Kirk writes:
> A very tentative suggestion for some glue: a character which can take
> combining marks but whose function is to throw those marks back on to
> the preceding base character, preceding any markup. This would have to
> be a zero width base character, not a format character or
On 07/12/2003 02:40, Philippe Verdy wrote:
...
Just one example, suppose that you want to color the circumflex above
a lowercase i or above a uppercase A: the base letters have distinct
widths (meaning that the diacritic has a different horizontal position),
distinct height (meaning that the diac
John Hudson writes:
> At 03:53 PM 12/6/2003, Philippe Verdy wrote:
>
> >Still this is an interesting problem: some texts for example want to
> >exhibit some diacritics added to a base letter with a distinct color,
> >notably in linguistic texts related to grammar or orthography.
> >
> >So for exam
I wrote:
The way to do this is to decompose bases and marks at the glyph level if
they are not already decomposed at the character level...
I meant to say *one* way to do this...
I didn't mean to imply that it was the only way, or necessarily the best
way. It would be interesting and useful to