do not plan to leave it online once Michael
has his content corrected. In the long run, it really is unhelpful to
have alternate sources for data. Inevitably, the mirrors get out of sync
as the owners move on to other interests, and inevitably someone points
to the copy, not the source.
Peter
P
n block, then that will validate that it was a useful thing to
do; but if there are *not*, then the unification-camp has little cause
for concern about existence of distinctly-encoded data.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
Michael has simply said he has never
seen such a font. If you want to support the claim, you cannot simply
question how thoroughly Michael has searched -- it's not his
responsibility. If you know of such fonts, then please identify them.
Peter
Peter Constable
Globalization Infrastruct
Could everyone please exercise good editorial practice on their
postings? It's ridiculous to have to scroll to the third screen-full of
text to find where the poster's comments begin.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
).
IMO, the structure of data is effectively determined by how processes
will interpret the data. A process won't see 6 columns one of which
contains " ". It will see seven columns one of which contains
" ".
He's said the file has been fixed (though I don
he most important thing is stability, but it
makes sense that the first and second columns be the symbolic code and
the numeric code, especially if this is *the* plain-text version and
normative reference.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
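A rough sketch of the six-versus-seven-columns point from this message: a process that splits on the delimiter only counts fields, so a stray delimiter inside a value silently adds a column. The semicolon delimiter, the `#` comment convention, the function name and the sample rows are all assumptions made up for illustration, not taken from the actual file.

```python
def find_malformed_rows(lines, delimiter=";"):
    """Return (line_number, field_count) for rows whose field count
    differs from that of the first data row."""
    rows = [
        (number, line.rstrip("\n").split(delimiter))
        for number, line in enumerate(lines, start=1)
        if line.strip() and not line.startswith("#")  # skip blanks and comments
    ]
    if not rows:
        return []
    expected = len(rows[0][1])
    return [(number, len(fields)) for number, fields in rows if len(fields) != expected]


# Hypothetical data: the third line has a stray delimiter inside a name.
sample = [
    "# hypothetical rows, not from any real file",
    "Aaaa;001;Example One;Exemple un;Ex_One;2004-01-01",
    "Bbbb;002;Example; Two;Exemple deux;Ex_Two;2004-01-01",
]
malformed = find_malformed_rows(sample)
print(malformed)
```

A check like this is cheap to run before publishing a file that is meant to be a normative reference.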
in the newest one.
It is not fixed in the file that's on the site now. If this is the
normative file, I'd suggest you fix it as soon as possible.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
Or substitute whatever text describes *why* these are being shown here.
You've got to say *something* about them, else it's completely unclear
whether the reader is supposed to care about them or not, and what
they're supposed to be used for.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
ode" can be
solved. Calling the other thing "Property Value Alias" would solve the
problem, but it really ought to be defined somewhere; and since it's not
mentioned in the standard, then its status must be informative, and
that should be indicated.
Peter
Peter Constabl
a.
It's not a big issue, but I don't understand why the dates don't match:
was "Arab" added on January 9 or May 1? So, they're not entirely
consistent.
Also, it appears you have not fixed a serious error in the plain-text
file: it is not well-structured. Some rows have
didn't ask you to do
anything yesterday; I just ask that it be done carefully. And not to
think that bad data files can be relegated to "cosmetics", which is what
you seemed to be saying.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
name that is
structured in a way that allows it to be used in higher-level identifier
protocols, but in the context of ISO 15924, I would not call it the
"ID".)
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
> Even with a separate Phoenician script, it might be a good idea
> to provide variation sequences
Hmmm, gives me an idea: For those people that want to unify, would it
help if all of the Phoenician characters were considered as variation
sequences of Hebrew characters, but for convenience we used
courage you to maintain *one*
master source from which all others are derived.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
> One person wrote, regarding Qaak for Klingon:
>
> > It's a shame you didn't pick something that could be pronounced in
> > tlhIngan Hol, perhaps Qaap for pIqaD.
Identifiers are identifiers, not words.
Peter
Peter Constable
Globalization Infrastructure and
not sure how well this works, but at least the
name can, I think, be improved upon: given that definition, perhaps “analytic
syllabary” (as opposed to a “fusional” syllabary) would be a
better label.
Peter Constable
my point: there *is* a
predetermined representation for numbers (not the one in my example),
and any cultural formatting is done on the local system.
Peter Constable
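One way to picture the split between a predetermined representation and local formatting, as a sketch only: the value travels as a plain decimal string, and the receiving system decides how to display it. The function name and the choice of separators are hypothetical; a real system would take them from its own locale data.

```python
from decimal import Decimal


def format_for_display(canonical: str, group_sep: str, decimal_sep: str) -> str:
    """Format a canonical decimal string using locally chosen separators."""
    value = Decimal(canonical)
    # Start from a fixed, well-known layout, then swap in the local separators.
    fixed_layout = f"{value:,.2f}"
    return fixed_layout.translate(str.maketrans({",": group_sep, ".": decimal_sep}))


interchange_value = "1234567.5"  # what is actually exchanged
display_a = format_for_display(interchange_value, ",", ".")
display_b = format_for_display(interchange_value, ".", ",")
print(display_a, display_b)
```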
ated, "locale" and "language" are conceptually
two different things. As for participating in the discussion, I am not
trying to keep anyone out.
> a very common behaviour of the computer people here in Europa, and a
> behaviour I am very angry against (hence the sarcar
n is that
> there are probably sub-varieties of boustrophedon
Sure: in the alternate lines, is the orientation of equivalent
characters the same? mirrored? rotated? other?
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
f course when we consider only the legal texts where all months shall be in
> full letters, all quantities spelled twice, one with numbers and the other
> with letters...
I can only say this quite misconstrues anything I have said.
Peter Constable
> I fully agree. "Featural" is a description orthogonal to considerations
> like "alphabet" or "syllabary" or "printed in green ink" for that
> matter. I was just running off with talking about other orthographies
> which could be described as featural, whatever else they are (note that
> VS and L
> -Original Message-
> From: Addison Phillips [wM] [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 13, 2004 10:16 AM

[snip]

> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] Behalf Of Peter Cons
> Peter Constable wrote:
> >I was already after the first paragraph going to mention another writing
> >system, and I'm even more strongly reminded of it by this second
> >paragraph: Sign Writing...
> And there's also Visible Speech, by Alexander Melville B
> You speak as if date or number formats had nothing to do with language. I very
> much disagree. If I have message that says: "The date of the last version of
> this document was 2003年3月20日", nobody in their right mind would say
> that that is
> cor
rectangle (whatever its orientation), Ken is saying that
that's the level at which Unicode needs to address directionality
issues; anything else is the problem of higher-level processes or
protocols.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
at to guess what the
correct interpretation should be. But I'm not sure I'd want to build a
system for processing business transactions on such assumptions.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
" tag in the translation memory can be used to set
the processing mode ("locale") of the software. More often than not,
though, I expect that what would be happening is that the "language"
element of the locale is being determined, and then corresponding
content is being r
ttributes where a language
tag is specified. And that it's not helpful in getting people to
understand what is or isn't good to do for someone providing some degree
of leadership in the area to use the terms "language" and "locale"
interchangeably.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
> Perhaps a term could be devised that encompasses block layout (rather than
> linear layout) scripts such as Hangul and small Khitan (and even Chinese zither
> notation ?).
And I assume you mean, not the Han ideographs, yes? Would probably be
useful.
Peter
Peter Constable
Globalizatio
going on internal to proprietary software, then there
are no rules. This is only about public interchange.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
angul. It just requires an additional category of like
"alphasyllabary", which Peter Daniels simply refuses to accept.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
or determine that mode.)
> I fully agree that under the latter interpretation, it is very
> important to distinguish between a language ID and a locale ID.
I am glad we at least agree on that :-)
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
rejects the notion of alphasyllabary,
apparently because he is unable to step beyond the taxonomy devised by
his teacher, Gelb.)
I don't think I would have applied "featural" in the case of Ethiopic or
CAS, though. If they are featural, then so is every abugida.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
in
SIL's PUA usage, but have not yet had opportunity to do so. The
n-descender was not among the things that were added to SIL's PUA usage,
though.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
de of software processes, and a "locale"
ID is used in APIs to set or determine that mode.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
k against him in that way.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
a default collation would do what they want whether it interleaved
Phoenician and Hebrew or not.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
(Check your facts first, PK :-).
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
sonable.
Other bullets have a breaking class of AL, so that seems appropriate for the Thai fongman.
I have no info regarding the Khmer counterpart.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
Of course, if ever there was a subject line that permitted the topic to
wander howsoever far from where it started, the one on this thread is
it. :-)
Peter
I think one's track record in making judgments on boundary cases is
established only after having successfully dealt with boundary cases --
and enough to establish a level of confidence. Of things already in
Unicode, what have been boundary cases between unification and
de-unification?
The unifie
the same sort
weights, and since many searching algorithms use these weights to
determine equality, corresponding characters would match.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
this: it was U+7FFF.
Peter Constable
nly four bytes are needed. There are non-UTF-8s --
beasts that kind of look like UTF-8 but aren't -- in which sequences of
varying length represent the same character and sequences of more than
four bytes appear, but they are not UTF-8; those byte sequences are
considered illegal in UTF-8.
Peter Constable
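A small sketch of the point about sequence length, using Python's built-in strict UTF-8 decoder. The helper name is made up for illustration; the byte strings are a four-byte encoding of a supplementary-plane character, a two-byte "overlong" form of the slash, and a five-byte sequence of the kind described above.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The highest code point, U+10FFFF, still needs only four bytes.
longest = "\U0010FFFF".encode("utf-8")

overlong_slash = b"\xc0\xaf"          # a second, longer spelling of "/"
five_byte = b"\xf8\x88\x80\x80\x80"   # a five-byte sequence

print(len(longest), is_valid_utf8(longest))
print(is_valid_utf8(overlong_slash), is_valid_utf8(five_byte))
```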
he latter, it's just making
saying so to make sure you get what you intended and didn't just
mis-type a value.
Peter Constable
> [PA] True. Just stating it is a common practice. People will not be
> unsettled by a plain text unification.
*Some* people. We've already heard from some who will be unsettled.
Peter Constable
ities of users -- how to encode that data. All it can do is
include characters in an encoding standard, giving people options for
how to encode that data. I'm sure eventually there will be someone
somewhere who will encode it using Hangul characters, but that's their
choice.
Peter Constable
multiplied a hundredfold.
The same could be said of Devanagari or Arabic text published in Roman transcription.
That does not mean that we do not encode Devanagari or Arabic, or that encoding those
scripts prevents the same people from continuing to publish in Roman transcription.
Peter Constable
nt won't show up is
> insufficient justification, especially when the repercussions in the
> scholarly communities who actually use this stuff could be disruptive.
Have you not heard that yours is not the only scholarly community? To
speak as though there is only one, or that all have the same needs as
yours, seems a bit arrogant.
> >I don't think anybody is looking for that many distinctions to be made.
>
> I certainly hope not.
Then I hope we can all agree not to revisit that red herring again.
Peter Constable
our concerns regarding IE. I do not work on the product,
but will try to relay them to the team that works on it. Their
priorities *have* been elsewhere.
Peter Constable
p or using a scripting language that can
talk to the .Net framework and you know the framework will be installed
on the target system, you can use the System.Text.UTF8Encoding class.
Peter Constable
he character encoding level, not in markup. I have no
opinion on what or how many the new distinct things should be.
> * Separately encode Phoenician, Old Hebrew, Samaritan, Archaic Greek, Old
> Aramaic, Official Aramaic, Hatran, Nisan, Armazic, Elymaic, Palmyrene,
> Mandaic, Jewish Aramaic, Nab
> OK, maybe not such a good example. So let's go back to Suetterlin.
Haven't we had enough of these thought experiments?
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
sure nobody is going to come along later and
say, "We've discovered we need to distinguish two orderings for qamats
qatan and athnah" (or tipha, tevir, munah, mahapakh, merkha, merkha
kefula, darga or yerah ben yomo).
(Of course, if they do, they can always insert CGJ.)
Peter
P
contained explicit references to other planes of
> Unicode-ISO/IEC 10646.
But no characters.
> and I can't understand why the largest software company...
I have no intention of attempting to make you understand.
Peter Constable
sense?
Yes, I understand. It's not a big deal either way, IMO.
Peter Constable
r what, and
how many historic written variations this set should encompass, I don't
know.
Peter Constable
qamats >; it does not allow you to distinguish
differently-ordered sequences of qamats qatan and any other combining
marks with a non-zero class.
Peter Constable
-plane characters in that product -- all the infrastructure
needed is there.
Peter Constable
> The formatted look good Peter but how many users will be able to format and
> bump the size?
I think lots of users know how to format text. And the increase in size
wasn't necessary to provide correct rendering; it just made for a
clearer screen shot.
Peter
Peter Constable
Gl
> It looks better on Mac than Windows OS.
It can look perfectly good on either system.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
cause the
> accent occupy two cursor space. I still think with all these observations
> something must be done.
In actual practice, none of these are
problems. I can copy and paste portions of your text with no problem: áÌÃÃ.
Peter
Peter Constable
Globa
category (Mn), it is implicitly already allowed in
combining sequences.
The term "COMBINING" in the name is more than a bit of a clue...
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
> > I'm not really all that interested in the justifications per se.
> Proper justification of a proposal is always important...
Don't worry; what Michael is expressing here is his personal opinion and
interests, not the policy of either UTC or WG2.
Peter
Peter Cons
e just as unsupported as
a custom encoding would be.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
recommend that you look
into the resources at http://www.microsoft.com/typography/creators.htm, and the
OpenType discussion list.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
or Canadian Syllabics or Ethiopic if this
> helps
> processing the corresponding languages.
IMO, the only people it would help are people that can earn their living
developing such new standards.
Peter Constable
cation in both forms
is such an argument.
Also, I think one could easily conclude from the samples that, if
Canaanite/Phoenician is unified with something, it should be unified
with Samaritan script rather than the square Hebrew script.
Peter Constable
here isn't anything available that does.
Peter Constable
< qamats qatan > are). This is probably more useful.
I would probably leave the value at 220. That is what all of the Hebrew
vowel points should have been, IMO. Though getting one right doesn't
make a huge difference -- people are still going to be using CGJ to
preserve particular sequences in the cases this will most likely be
needed.
Peter Constable
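The effect of combining classes and CGJ on mark order can be sketched with Python's `unicodedata` module. This is only an illustration under stated assumptions: it uses the existing qamats, U+05B8, whose class in the Unicode data is 18, together with U+0591 (named ETNAHTA in the standard, the accent called athnah in this thread), whose class is 220. It says nothing about what value qamats qatan should get.

```python
import unicodedata

ALEF = "\u05D0"
QAMATS = "\u05B8"    # combining class 18
ETNAHTA = "\u0591"   # combining class 220
CGJ = "\u034F"       # COMBINING GRAPHEME JOINER, combining class 0

classes = [unicodedata.combining(ch) for ch in (QAMATS, ETNAHTA, CGJ)]

# Without CGJ, normalization reorders the marks by combining class,
# so the two input orders become indistinguishable.
accent_first = ALEF + ETNAHTA + QAMATS
vowel_first = ALEF + QAMATS + ETNAHTA
same_after_nfd = (
    unicodedata.normalize("NFD", accent_first)
    == unicodedata.normalize("NFD", vowel_first)
)

# With CGJ between the marks, the class-0 character blocks reordering,
# so the accent-first order survives normalization.
protected = ALEF + ETNAHTA + CGJ + QAMATS
order_preserved = unicodedata.normalize("NFD", protected) == protected

print(classes, same_after_nfd, order_preserved)
```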
Item 2 is probably true. But
is it enough to refer to square Hebrew as "the modern form" of
Phoenician (Old Canaanite, whatever you want to call it)?
Peter Constable
> So how come the majority of Polish people living abroad - let's say
> 40 millions against 40 million living in Poland - is not able of
> using their native characters - also called 'ogonki' - in their e-mails?
I'm not aware of any reason why they cannot.
Peter Constable
> "The existing composites were included only out of necessity so that new
> Unicode implementations could interoperate with existing implementations
> using legacy industry-standard encodings." - Peter Constable
>
> Are we saying we have exhausted such necessity?
Y
tions
using legacy industry-standard encodings. Apart from the backward
compatibility issue, these composites go against Unicode's design
principles and are not needed.
No new composite values will be added.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
t is
> of no particular utility.
It provides improvement for very rare possibilities, which is indeed
marginal and only a minor drop in the larger bucket.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
ritan being the other.
Ah, so the next protracted debate is going to be whether Samaritan
should also be encoded using the existing square Hebrew characters.
Since it would appear that the argument for unification of PH with
Hebrew could also argue for unification of PH with Samaritan, or of all
three.
Peter Constable
the examples, I certainly agree that a superset
does not imply a distinct script.
Peter Constable
What are the directional properties of Phoenician? Is it RTL only, or
was it ever written with a different directionality?
Peter Constable
is no obvious
way to add the accents, but even if there were, I suspect those same
people still wouldn't recognize it as accented Hebrew with archaic
glyphs.
So, while Michael's argument was flawed in the way he expressed it, I
think your counter-argument also is flawed.
Peter Constable
t readily interpret text in their language when
written with a written variety (and distinct-script candidate) B, then B
is distinct from A. It *is*, IMO, a valid consideration, but it alone
isn't a sufficient criterion. Note, for instance, that one could apply
that argument to try to justify a Latin cipher.
Peter Constable
ng is not in using MSKLC
but rather in not knowing about OpenType. I recommend the resources
available at http://www.microsoft.com/typography/creators.htm.
Peter Constable
an handle that, even if it is
represented in the encoding as a sequence < g, b >
This is no different from e.g. the "ch" and "ll" digraphs in Spanish.
The Spanish alphabet has A B C CH D ... K L LL M ... Software is able
to support this even though the CH and LL are encoded as sequences, < c,
h > and < l, l >.
Peter Constable
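How software can treat CH and LL as letters while they stay encoded as plain sequences can be sketched as a sort key. This is a toy illustration, not how any particular collation library does it: the alphabet list, the greedy tokenizer and the sample words are all assumptions, and accented vowels are ignored for brevity.

```python
# Traditional Spanish alphabet order, with the digraphs as single letters.
ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: index for index, letter in enumerate(ALPHABET)}


def tokenize(word):
    """Split a word into alphabet letters, preferring the two-character digraphs."""
    word = word.lower()
    tokens = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in ("ch", "ll"):
            tokens.append(word[i:i + 2])
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def sort_key(word):
    """Map each letter to its alphabet position; unknown characters sort last."""
    return [RANK.get(token, len(ALPHABET)) for token in tokenize(word)]


words = ["cuna", "chico", "dado", "llama", "luz", "coche"]
by_code_point = sorted(words)
by_alphabet = sorted(words, key=sort_key)
print(by_code_point)
print(by_alphabet)
```

The text itself never changes: "chico" is still stored as c, h, i, c, o. Only the comparison step knows that CH counts as one letter.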
Unicode is *designed* to use combining sequences in this way. The fact
that you see any precomposed combinations, even the e-dot, does not imply a
precedent as these are the exceptions.
Peter Constable
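To illustrate with Python's `unicodedata` module, assuming the "e-dot" mentioned is U+0117 (e with dot above): the combining sequence and the precomposed character are canonically equivalent, while a base letter that has no precomposed form simply stays a sequence. The choice of q as the second base letter is mine, for illustration only.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE

e_sequence = "e" + DOT_ABOVE
q_sequence = "q" + DOT_ABOVE

# e + dot above has a precomposed exception, so NFC folds it to one code point.
e_composed = unicodedata.normalize("NFC", e_sequence)

# q + dot above has no precomposed form; the sequence is the representation.
q_composed = unicodedata.normalize("NFC", q_sequence)

# Going the other way, the precomposed character decomposes to the sequence.
e_decomposed = unicodedata.normalize("NFD", "\u0117")

print(len(e_composed), len(q_composed), e_decomposed == e_sequence)
```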
sage *as encoded information* is pertinent
here.
If you don't have info about usage in that sense, just say so. It will
help us know where things stand.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
unification of encoding. That isn't an opinion we
can debate, that is an undisputed fact: it *is* what a significant group
of users are doing with their data, as some of the representatives of
that group have told us (and there's no reason to believe they're
lying).
Peter Constable
a typeface distinction, not a script distinction.
Peter Constable
atin -- so we just need to encode the
turned capitals? Or is there more to it I'm not thinking about?
Peter Constable
ncern, discussing it here does nothing. You need
to go to http://www.unicode.org/reporting.html and submit your comments.
> I posted my comment to the UTC administrative report form.
Great. Thanks very much for doing that.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
structurally the same. Thus, I'm not
sure the scripts of India are the best comparison in this case.
Unless there are behaviours in Phoenician that distinguish it from
Hebrew.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
y
assumptions.)
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
es to make the distinction, so that "zh" would become an
> identifier for a family of Han-written languages, rather than a language
> identifier, and so a legacy code.
In ISO 639-3, zh will be considered a macro-language identifier. But zhs
and zht would not be good ideas, and will not be considered for ISO 639
or for RFC 3066.
Peter Constable
e issues is found in
> supplementary libraries in ICU which support locale aliases. (Yes I use the
> term Locale because this is the term that Java gives to this
> identification,
NO. That is the term Java (and other things) give to a *different*
identification. There are languages, there are cultures/locales. The two
are not the same.
Peter Constable
same
> combining class as the character it applies to, so that the two will
> always remain together. Thus there would potentially be the need for a
> considerable set of VSs. But I don't think this is really necessary.
I think that would be better than having general VSs used with combining
marks.
Peter Constable
> A time may come when they decide they
> want their own language, Walloon. At that time they will no doubt ask
> for appropriate ISO etc codes.
There's nothing futuristic about that: "wln"
(http://www.loc.gov/standards/iso639-2/englangn.html#uvwxyz)
Peter Constable
> *anywhere* that
> says they won't.
I'm working on it. The ISO 639/RA-JAC has acknowledged the need for
stability. Getting into the normative text of the standards takes a
little time.
Peter Constable
oftware
> implementations, because it avoids some caveats that come from other unstable
> standards such as ISO 3166 and ISO 639.
ISO 639 is not unstable. It is an open code set that is being added to
over time, but I don't think that should be referred to as unstable --
that term suggests oth
or *language*
identification, not *locale* identification.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
ou can honestly say
that OpenI18N isn't tied to a particular family of platforms. Or, at
least, I can say that when I last looked at the OpenI18N site, it sure
looked like it was tied to a particular family of platforms.
Peter Constable
s that encoding bridge when we come to it.
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
ed it to that small but perhaps
growing collection:
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=UnicodeCharacterStories
Peter Constable