On 29/10/2003 15:07, John Cowan wrote:
Not necessarily. A process may check its input for normalization and
reject it if it is not normalized, and XML consumers are encouraged
(not required) to do so.
This looks to me like a clear breach of C9, at least of the derived
principle
no process ca
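
A minimal sketch of the "check and reject" behaviour described in the quoted text, assuming NFC is the form being checked. The function name `check_nfc` is hypothetical and is not taken from any XML specification or library; only Python's standard `unicodedata` module is used.

```python
import unicodedata


def check_nfc(text: str) -> str:
    """Return text unchanged if it is already in NFC, else raise ValueError.

    The text is compared with its own NFC normalization; it is never
    silently rewritten, only accepted or rejected.
    """
    if unicodedata.normalize("NFC", text) != text:
        raise ValueError("input is not NFC-normalized")
    return text


composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

check_nfc(composed)      # accepted
try:
    check_nfc(decomposed)
except ValueError as err:
    print("rejected:", err)
```

Whether rejecting such input is compatible with conformance clause C9 is exactly the point under dispute in this thread; the sketch only shows what the check itself looks like.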
From: "Jim Allan" <[EMAIL PROTECTED]>
> It seems to me that Cedilla/undercomma folding would be a useful
> addition to "Character Foldings" at http://www.unicode.org/reports/tr30.
Excellent idea, however it has to be tailored by language:
For example, Turkish and French (which almost always and co
Peter Kirk scripsit:
> [A process] must
> interpret a non-normalised variant in the same way as the normalised
> form; and it cannot assume that the process presenting the data makes a
> distinction between the normalised and non-normalised form and does not
> reorder the data into an arbitrar
From: "John Hudson" <[EMAIL PROTECTED]>
> All of these fonts already include the newer Romanian S/s and T/t
> commaaccent characters and correct accent forms for the Latvian diacritics
> (although the Arial comma accent is a bit too much like an unattached
> cedilla).
I meant for Windows 9x/ME users
On 29/10/2003 14:14, John Cowan wrote:
Peter Kirk scripsit:
Is this actually a conformance requirement? I thought I understood the
following: A rendering engine which fails to render canonical
equivalents identically, or fails to render certain orders sensibly, is
not doing what the Unicode
Peter Kirk scripsit:
> Is this actually a conformance requirement? I thought I understood the
> following: A rendering engine which fails to render canonical
> equivalents identically, or fails to render certain orders sensibly, is
> not doing what the Unicode standard tells it that it must do.
Language Analysis Systems, Inc. Unicode list reader scripsit:
> It suggests that for many fonts,
>
> U+0067 LATIN SMALL LETTER G + U+0327 COMBINING CEDILLA
>
> and
>
> U+0067 LATIN SMALL LETTER G + U+0312 COMBINING TURNED COMMA ABOVE
>
> would have exactly the same rendering. Some applicatio
From: "Peter Kirk" <[EMAIL PROTECTED]>
> On 29/10/2003 10:46, John Hudson wrote:
>
> > While we're about it, we could propose a spacing, non-breaking ELIDED
> > CHARACTER for use in ketiv/qere where combining marks need to be
> > applied to empty space within a word.
>
> How would this differ from
On 29/10/2003 11:53, John Cowan wrote:
... A
rendering engine is *not* entitled to misbehave if it receives
cedilla> and try to place the dot between the "a" glyph and the cedilla;
this is a direct consequence of the conformance requirement that processes
not distinguish (unless they have speci
At 12:33 PM 10/29/2003, Philippe Verdy wrote:
Even today, it is quite hard to find any Romanian or Latvian web page using
the new Unicode characters with a comma-below: even governmental sites use
the characters coded with the cedilla, and they support that this comma
below is rendered approximate
Rich Gilliam wrote:
It suggests that
for many fonts,
U+0067 LATIN SMALL LETTER G + U+0327 COMBINING CEDILLA
and
U+0067 LATIN SMALL LETTER G + U+0312 COMBINING TURNED COMMA ABOVE
would have exactly the same rendering. Some applications would need to
know this and treat U+0067 U+0327 the same as
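
A short sketch of the distinction being drawn here, using Python's standard `unicodedata` module. Normalization already treats U+0123 and U+0067 U+0327 as the same text, but it does not relate either of them to U+0067 U+0312, so an application that wants that match has to add its own folding. The helper `fold_latvian_g` is hypothetical and handles only the simplest case (the mark immediately after the g).

```python
import unicodedata

G_CEDILLA = "\u0123"        # LATIN SMALL LETTER G WITH CEDILLA
G_PLUS_CEDILLA = "g\u0327"  # g + COMBINING CEDILLA
G_PLUS_TURNED = "g\u0312"   # g + COMBINING TURNED COMMA ABOVE


def nfd(text: str) -> str:
    """Canonical decomposition of text."""
    return unicodedata.normalize("NFD", text)


def fold_latvian_g(text: str) -> str:
    """Hypothetical application-level folding: map g + U+0312 to g + U+0327.

    This is a sketch only; it ignores sequences with other marks between
    the base letter and the turned comma.
    """
    return nfd(text).replace("g\u0312", "g\u0327")


# Canonically equivalent: normalization alone makes these compare equal.
print(nfd(G_CEDILLA) == nfd(G_PLUS_CEDILLA))

# Not canonically equivalent: normalization leaves them different.
print(nfd(G_PLUS_TURNED) == nfd(G_PLUS_CEDILLA))

# With the extra folding step they compare equal.
print(fold_latvian_g(G_PLUS_TURNED) == fold_latvian_g(G_CEDILLA))
```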
On 29/10/2003 10:46, John Hudson wrote:
At 10:26 AM 10/29/2003, Philippe Verdy wrote:
The problem I see here is that ZWJ is not intended to create ligatures
between diacritics, only between clusters that would otherwise still
be a single combining sequence.
Normally CGJ would have fitted better
On 29/10/2003 10:32, Philippe Verdy wrote:
From: "Toyin Ryan" <[EMAIL PROTECTED]>
To: "Philippe Verdy" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, October 29, 2003 6:11 PM
Subject: Re: Hacek - Typing from a keyboard... Help
Thank you Philippe. The characters you detailed below
On Oct 27, 2003, at 12:30 PM, Sue and Maurice Bauhahn wrote:
Nothing of the sort is available on Macintosh (partly because Mac
applications that support ATSUI appear to be even more rare;-().
Unicode support in applications is widespread on Mac OS X. In
particular, all the built-in applications sh
On 29/10/2003 10:26, Philippe Verdy wrote:
...
The problem I see here is that ZWJ is not intended to create ligatures
between diacritics, only between clusters that would otherwise still be a
single combining sequence.
Normally CGJ would have fitted better there, but this conflicts with the
inten
- Original Message -
From: "John Hudson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "'Jim Allan'" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, October 29, 2003 6:15 PM
Subject: RE: Merging combining classes, was: New contribution N2676
> At 04:04 AM 10/29/2003, Kent Ka
Jim Allan scripsit:
> << For example, it is crucial that the combining class of the cedilla be
> lower than the combining class of the dot below, although their exact
> values of 202 and 220 are not important for implementation. >>
>
> This is not explained, but obviously the reason why it is "
From: "Toyin Ryan" <[EMAIL PROTECTED]>
> Thank you Philippe. The characters you detailed below only seem to work in
> Word.
> They don't work in DBArtisan or netscape messenger or outlook.
There are a lot of applications that can use and render these characters. At
least all those applications that
From: "Jim Allan" <[EMAIL PROTECTED]>
> Kent Karlsson posted:
>
> > COMBINING COMMA BELOW is not "attached", even though cedilla is.
> > A turned comma above is not _attached_ above...
>
> Correct. COMBINING COMMA BELOW belongs to combining class 220.
>
> However by Unicode specifications both it a
>However by Unicode specifications both it and an attached lower cedilla
>on _g_ may be rendered by unattached turned comma above which interacts
>with characters not in their respective combining classes. And this
new
>turned comma above of necessity would always be applied before normal
>uppe
At 10:26 AM 10/29/2003, Philippe Verdy wrote:
The problem I see here is that ZWJ is not intended to create ligatures
between diacritics, only between clusters that would otherwise still be a
single combining sequence.
Normally CGJ would have fitted better there, but this conflicts with the
intent
At 10:26 AM 10/29/2003, Philippe Verdy wrote:
In the sil.org proposal, the medial meteg is missing, but not the right and
left meteg, as they are encoded within the same class and their order is
preserved when attached to a vowel.
It is not missing, per se. It was presumed that the medial meteg wo
From: "Toyin Ryan" <[EMAIL PROTECTED]>
To: "Philippe Verdy" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, October 29, 2003 6:11 PM
Subject: Re: Hacek - Typing from a keyboard... Help
> Thank you Philippe. The characters you detailed below only seem to work in
> Word.
> They don't wo
From: "John Hudson" <[EMAIL PROTECTED]>
> At 08:17 AM 10/29/2003, Peter Kirk wrote:
>
> >Normally meteg is positioned below and to the left of any other low
> >centred mark. Less frequently it is positioned to the right of a low
> >centred mark. But it is always to the left of a low right mark i.e.
> Normally meteg is positioned below and to the left of any other low
> centred mark. Less frequently it is positioned to the right of a low
> centred mark. But it is always to the left of a low right mark i.e.
> yetiv or dehi. It can also be centred within a hataf vowel. In
> http://www.qaya.org/a
From: "Toyin Ryan" <[EMAIL PROTECTED]>
> Thank you Philippe. The characters you detailed below only seem to work in
> Word.
> They don't work in DBArtisan or netscape messenger or outlook. Do you have
> any more ideas?
As I mentioned before, it looks like DBArtisan is not a Unicode application
and y
At 04:04 AM 10/29/2003, Kent Karlsson wrote:
The Latvian "cedillas" are really commas below, and are best encoded so.
Still for lowercase g (not for uppercase) the comma below is _rendered_
as a turned comma above.
The 'not for uppercase' rule depends on the design of the uppercase letter.
Typica
Hello Tony,
Number one question: have you verified that DBArtisan 7 actually has
support for Unicode? I find nothing at all about Unicode support on QBS
Software web site, nor in the Embarcadero white paper, when I look at the
features of their products.
Rick
> However, I want to t
Rick McGowan wrote:
"Caron" [...] is *NOT* in current use at all in English.
It is widely used in the typography community, for better or for worse.
Eric.
Thank you Philippe. The characters you detailed below only seem to work in Word.
They don't work in DBArtisan or netscape messenger or outlook. Do you have
any more ideas?
Philippe Verdy wrote:
> The Unicode English name of the "hacek" character is "caron" (U+030C), and
> the "straight line above"
Philippe wrote...
> The Unicode English name of the "hacek" character is "caron" (U+030C)
Just for the record: Actually, in English, we still call it a hacek.
"Caron" is a term apparently invented in an ISO character encoding
committee, and is *NOT* in current use at all in English. We call
Philippe Verdy wrote:
the input:determine strategy will work fine for UTF-8 or SCSU, provided that
the leading BOM is explicitly encoded. ...
With "determine" I do not mean to restrict to checking for a BOM. There are several ways to
determine the input charset, depending on the protocol and docum
Like German "heute" (="today") where the "eu" sounds like the "oy" in Spanish "hoy"?
hui=hoy=heu(te)... Neat!
markus
Michael Everson wrote:
At 23:07 +0100 2003-10-27, Philippe Verdy wrote:
The historic French word "hui" is now completely obsoleted, and commonly
found only in the single expressio
[EMAIL PROTECTED] wrote:
is it possible to design a program that takes the value of the osmanya script
and compare it with the somali latin script. then afterwards, displaying the
equivalent.
Generally, yes - this is called script transliteration. You could try this online at
http://oss.softwar
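
A very small sketch of table-driven script transliteration, assuming a one-to-one letter mapping. The four letter pairs below are an illustrative subset chosen for this example, not a complete or authoritative Osmanya/Latin table, and the names `OSMANYA_TO_LATIN`, `LATIN_TO_OSMANYA`, and `transliterate` are hypothetical. Characters are looked up by their Unicode names through the standard `unicodedata` module.

```python
import unicodedata

# Illustrative subset only: Unicode character name suffix -> Latin letter.
_PAIRS = {"BA": "b", "TA": "t", "A": "a", "I": "i"}

OSMANYA_TO_LATIN = {
    unicodedata.lookup("OSMANYA LETTER " + name): latin
    for name, latin in _PAIRS.items()
}
LATIN_TO_OSMANYA = {latin: osm for osm, latin in OSMANYA_TO_LATIN.items()}


def transliterate(text: str, table: dict) -> str:
    """Replace each character found in table; pass everything else through."""
    return "".join(table.get(ch, ch) for ch in text)


osmanya = transliterate("bati", LATIN_TO_OSMANYA)
print([unicodedata.name(ch) for ch in osmanya])
print(transliterate(osmanya, OSMANYA_TO_LATIN))
```

A real converter would need more than a per-character table, since Somali Latin orthography writes some sounds with digraphs and doubled vowels; that is one reason to prefer an existing transliteration service over a hand-built mapping.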
Philippe Verdy wrote:
> The idea that "if a text (without BOM) looks like valid UTF-8, then it is
> UTF-8; else it uses another legacy encoding" does not work in practice
> and also leads to too many false positives.
Can you point to actual data/cases? I don't mean theoretical, I can make up
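
For reference, the heuristic under discussion can be sketched as follows. This is only an illustration of the "valid UTF-8, else legacy" rule, not an implementation from any of the posters; the function name `guess_decode` and the choice of windows-1252 as the fallback are assumptions.

```python
def guess_decode(data: bytes, fallback: str = "windows-1252"):
    """Decode data as UTF-8 if it is well-formed, else with a legacy charset.

    Returns a (text, charset_label) pair. The "utf-8-sig" codec accepts
    UTF-8 with or without a leading BOM and strips the BOM if present.
    """
    try:
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Not well-formed UTF-8: fall back to the assumed legacy encoding.
        return data.decode(fallback, errors="replace"), fallback


print(guess_decode("caf\u00e9".encode("utf-8")))
print(guess_decode(b"caf\xe9"))
```

The sketch makes no claim about how often the rule misidentifies real data, which is the question being asked in this message.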
Kent Karlsson posted:
COMBINING COMMA BELOW is not "attached", even though cedilla is.
A turned comma above is not _attached_ above...
Correct. COMBINING COMMA BELOW belongs to combining class 220.
However by Unicode specifications both it and an attached lower cedilla
on _g_ may be rendered by u
wrote:
> is it possible to design a program that takes the value of the osmanya
> script and compare it with the somali latin script. then afterwards,
> displaying the equivalent.
>
> by this i mean, if you tell the program that the letter A is a given
> code in osmanya script as well as in soma
Peter Kirk wrote:
Rather, it defines that they do not. But since this is not true on any
reasonable intuitive definition of "interact typographically" (as we
have seen with Hebrew vowel points), this statement makes sense only as
a counterintuitive definition of "interact typographically".
Exactl
Please tell me if it is possible to design a program that takes the value of
the osmanya script and compare it with the somali latin script. then
afterwards, displaying the equivalent.
by this i mean, if you tell the program that the letter A is a given code in
osmanya script as well as in s
Is it possible to design a program that takes the value of the osmanya script
and compare it with the somali latin script. then afterwards, displaying the
equivalent.
by this i mean, if you tell the program that the letter A is a given code in
osmanya script as well as in somali latin script
Thank you, Philippe. I include the full text of your posting plus the
attachment for the benefit of those on the Unicode Hebrew list who have
missed out on this. Some of the issues here have already been discussed
on that list. Also I wonder if you have seen
http://scripts.sil.org/cms/sites/nrs
> << A similar situation can be seen in the Latvian letter U+0123 LATIN
> SMALL LETTER G WITH CEDILLA. In good Latvian typography, this character
> is always shown with a rotated comma over the g, rather than a cedilla
> below the g, because of the typographical design and layout issues
From: "Toyin Ryan" <[EMAIL PROTECTED]>
> I am trying to type the 'hacek' diacritic mark above 'c' and 'e' and
> also a straight line (not a tilde) above characters too.
>
> The hacek is a diacritic mark used in the Czech and Lithuanian
> languages. It looks like an upside-down circumflex or a poin
The Unicode English name of the "hacek" character is "caron" (U+030C), and
the "straight line above" is named "macron" (U+0304).
Your word processor is probably inserting a decomposed caron or macron after
the base letter it modifies.
Did you try with the precomposed characters?
C with Hacek:
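
As an illustration of the precomposed versus decomposed distinction (a sketch using Python's standard `unicodedata` module, not the character list from the original message), NFC normalization turns a base letter followed by a combining caron into the single precomposed character where one exists:

```python
import unicodedata

for base in ("c", "e"):
    decomposed = base + "\u030c"   # base letter + COMBINING CARON
    precomposed = unicodedata.normalize("NFC", decomposed)
    print(
        "U+%04X" % ord(precomposed),
        unicodedata.name(precomposed),
        "(from %d code points)" % len(decomposed),
    )
```

An application that only handles precomposed characters may display the one-code-point form correctly while failing on the two-code-point form, which is consistent with the behaviour described above.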
- Original Message -
From: "Markus Scherer" <[EMAIL PROTECTED]>
To: "unicode" <[EMAIL PROTECTED]>
Sent: Tuesday, October 28, 2003 11:35 PM
Subject: Re: unicode on Linux
> You should use Unicode internally - UTF-16 when you use ICU or most other
libraries and software.
>
> Externally, th
On 28/10/2003 20:01, Jim Allan wrote:
...
From _The Unicode Standard 4.0_, 3.11 at
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf:
<< If combining characters have different combining classes--for
example, when one nonspacing mark is above a base character form and
another is below it--
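
The combining classes referred to in this passage can be inspected with Python's standard `unicodedata` module. The sketch below uses the cedilla and dot below (classes 202 and 220) as its example pair; the variable names are illustrative. Because the two marks have different, non-zero classes, canonical ordering puts them in one fixed order whichever way they were typed.

```python
import unicodedata

CEDILLA = "\u0327"     # COMBINING CEDILLA
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW

print(unicodedata.combining(CEDILLA))    # 202
print(unicodedata.combining(DOT_BELOW))  # 220

typed_one_way = "a" + DOT_BELOW + CEDILLA
typed_other_way = "a" + CEDILLA + DOT_BELOW

# NFD applies canonical ordering: marks are sorted by combining class,
# so the cedilla (202) ends up before the dot below (220) in both cases.
nfd_one = unicodedata.normalize("NFD", typed_one_way)
nfd_other = unicodedata.normalize("NFD", typed_other_way)
print(nfd_one == nfd_other)
```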
Hello subscribers
I am trying to type the 'hacek' diacritic mark above 'c' and 'e' and
also a straight line (not a tilde) above characters too.
The hacek is a diacritic mark used in the Czech and Lithuanian
languages. It looks like an upside-down circumflex or a pointed
breve-essentially a small "