Egyptian Demotic

2014-01-22 Thread Stephan Stiller
Hi all, Is Egyptian Demotic on somebody's roadmap for Unicode? (Egyptian Demotic is what's on the middle third of the Rosetta Stone.) Stephan

Re: Representation of neutral tone in pinyin and bopomofo

2013-11-23 Thread Stephan Stiller
[CR:] in now exotic styles where the letters /ĉ, ŝ, ẑ, ŋ/ were used as well Interesting. ẑ, ĉ, ŝ (but not ŋ) have been part of most pinyin descriptions at the end of dictionaries; ẑ, ĉ, ŝ are still listed in Xiàndài Hànyǔ Cídiǎn's 6th edition. But de facto no one uses them, and I'd regard them

Re: Representation of neutral tone in pinyin and bopomofo

2013-11-22 Thread Stephan Stiller
Hi Eric, [We met at the UTC meeting.] I. Is it correct that: in bopomofo, the neutral (or light) tone is represented by U+02D9 ˙ DOT ABOVE, and in the text representation, that character follows the bopomofo characters of the syllable (just like all the other characters for tones) 1. Give

Re: letters that "complete the rectangle" in Indic scripts

2013-09-20 Thread Stephan Stiller
On 9/19/2013 2:35 AM, Stephan Stiller wrote: As far as I am aware, a proper 'null consonant' has only arisen when it actually represents a glottal stop. There's ㅇ in hangeul ("Hangul"; Korean). Hebrew ע was supposedly first pharyngeal [ʕ], though it's nowadays sta

Re: letters that "complete the rectangle" in Indic scripts

2013-09-19 Thread Stephan Stiller
As far as I am aware, a proper 'null consonant' has only arisen when it actually represents a glottal stop. There's ㅇ in hangeul ("Hangul"; Korean). Hebrew ע was supposedly first pharyngeal [ʕ], though it's nowadays standardly a glottal stop [ʔ] or null ∅ (and you don't even need a hiatus

Re: Code point vs. scalar value

2013-09-18 Thread Stephan Stiller
Instead of selectively agreeing with Philippe's writing, it would be good to tell us why Glossary claims that surrogate code points are "[r]eserved for use by UTF-16" and why there are similar statements in the Unicode book if [AF:] [o]nce you add the UTF-prefix, you are, by force, speaking of

Re: Code point vs. scalar value

2013-09-18 Thread Stephan Stiller
On 9/18/2013 2:42 AM, Philippe Verdy wrote: There are "scalar values" used in so many other unrelated domains [...] There is no risk for confusion with vectors or complex numbers or reals or whatnot. On 9/18/2013 8:34 AM, Asmus Freytag wrote: I concur. Codepoint is the accepted way of referrin

Re: Code point vs. scalar value

2013-09-18 Thread Stephan Stiller
On 9/18/2013 12:02 AM, Stephan Stiller wrote: That still doesn't mean surrogates are "used by UTF-16" => 'That still doesn't mean surrogate_code point_s are "used by UTF-16"'

Re: Code point vs. scalar value

2013-09-18 Thread Stephan Stiller
On 9/17/2013 10:54 PM, Asmus Freytag wrote: On 9/17/2013 8:40 PM, Philippe Verdy wrote: In what way does UTF-16 "use" surrogate code /points/? An encoding form is a mapping. Let's look at this mapping: * One _inputs_ scalar values (not surrogate code points). In fact the input i

Re: Code point vs. scalar value

2013-09-17 Thread Stephan Stiller
In what way does UTF-16 "use" surrogate code /points/? An encoding form is a mapping. Let's look at this mapping: * One _inputs_ scalar values (not surrogate code points). * The encoding form will _output_ a short sequence of encoding form–specific code units. (Various voices on this list h
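The mapping view argued for in this post (scalar values in, code units out) can be made concrete with a minimal Python sketch. The function name is our own, hypothetical illustration, not an API from the Unicode Standard; it follows the standard UTF-16 bit distribution. Surrogate code points are rejected on the input side, and surrogate values appear only on the output side, as code units.

```python
from __future__ import annotations


def utf16_encode_scalar(cp: int) -> list[int]:
    """Map one Unicode scalar value to its UTF-16 code unit(s).

    The input domain is scalar values only: 0..0x10FFFF minus the
    surrogate code points U+D800..U+DFFF, which are rejected.
    """
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp < 0x10000:
        return [cp]                      # BMP scalar value: one code unit
    offset = cp - 0x10000                # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high-surrogate code unit
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low-surrogate code unit
    return [high, low]


print([hex(u) for u in utf16_encode_scalar(0x1F600)])  # ['0xd83d', '0xde00']
```

Calling `utf16_encode_scalar(0xD800)` raises `ValueError`, which mirrors the point that a surrogate code point is not a valid input to the encoding form.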

Re: letters that "complete the rectangle" in Indic scripts

2013-09-17 Thread Stephan Stiller
I have been told that Devanagari contains letters (or a letter) that were invented merely to complete the rectangular C-V table; not sure to what extent they (or it) were used subsequently. In which reference is this mentioned? I was referring to oral communication (above I wrote "I have been to

Re: Code point vs. scalar value

2013-09-17 Thread Stephan Stiller
On 9/17/2013 5:27 PM, Asmus Freytag wrote: On 9/17/2013 2:55 PM, Stephan Stiller wrote: [AF:] It is the wording in your posts that adds to the confusion. My fundamental point is, has been, and continues to be that whenever people use the more general word "code point" instead o

letters that "complete the rectangle" in Indic scripts

2013-09-17 Thread Stephan Stiller
I have been told that Devanagari contains letters (or a letter) that were invented merely to complete the rectangular C-V table; not sure to what extent they (or it) were used subsequently. Wiki http://en.wikipedia.org/wiki/Devanagari tells me about the letter ॡ (signifying "ḹ", I assume th

Re: Code point vs. scalar value

2013-09-17 Thread Stephan Stiller
[AF:] It is the wording in your posts that adds to the confusion. My fundamental point is, has been, and continues to be that whenever people use the more general word "code point" instead of the more appropriate "scalar value", that will "add to the confusion". If you make the presupposition

Re: Origin of Ellipsis

2013-09-16 Thread Stephan Stiller
On 9/16/2013 7:48 AM, Stephan Stiller wrote: or count code points corresponding to code units because, well, you can match them up = "or count code points corresponding to UTF-16 code units"; those happen to be BMP code points. Twitter has been claiming since /at least/ April

Re: Origin of Ellipsis

2013-09-16 Thread Stephan Stiller
units because, well, you can match them up. The latter interpretation seemed to derive from terminological imprecision at first, but my concern and suspicion turned out to be spot-on with what Twitter did historically. On 9/16/2013 7:19 AM, Philippe Verdy wrote: 2013/9/16 Stephan Stiller <

Re: Origin of Ellipsis

2013-09-16 Thread Stephan Stiller
① Twitter - [...] ② Sina Weibo - [...] About a year ago I blogged about it http://schappo.blogspot.co.uk/2012/10/weibo-character-count.html And your post on Twitter is this one: http://schappo.blogspot.co.uk/2012/10/twitter-character-count.html Stephan

Re: Origin of Ellipsis

2013-09-16 Thread Stephan Stiller
Twitter - Until recently, characters outside the BMP resulted in a Counter decrement of 2 and BMP characters gave a decrement of 1. Not sure when the change happened but now both BMP & non BMP characters result in a decrement of 1 Yes!! How might that have happened? ;-) And the date line of Tw
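The counter behaviour André reports (a decrement of 2 for characters outside the BMP) is what one would get by counting UTF-16 code units instead of code points. Here is a minimal Python sketch of the two counting policies. It illustrates the distinction only and is not Twitter's actual implementation; the function names are hypothetical.

```python
def utf16_length(text: str) -> int:
    """Length in UTF-16 code units: non-BMP characters count as 2."""
    return len(text.encode("utf-16-le")) // 2   # 2 bytes per code unit, no BOM


def code_point_length(text: str) -> int:
    """Length in code points: every character counts as 1."""
    return len(text)   # Python str is indexed by code point


tweet = "hi \U0001F600"   # 'h', 'i', space, U+1F600 (outside the BMP)
print(utf16_length(tweet), code_point_length(tweet))   # 5 4
```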

Re: Origin of Ellipsis (was: RE: Empty set)

2013-09-15 Thread Stephan Stiller
Doug wrote me: You're not confusing "code point" with "code unit," are you? Thanks for the note. I think what you say is that I thought (or meant to write) "by first representing the sequence of scalar values in an encoding form and then counting [code points typecast from] code _units_". I t

Re: Origin of Ellipsis (was: RE: Empty set)

2013-09-15 Thread Stephan Stiller
Stephan Stiller wrote: From the link it isn't entirely clear whether they (a) count scalar values of NFC or (b) count code points of NFC. Are they not the same thing, except for surrogates? Conceptually no, but numerically yes – you are right in that regard, and I wasn't pre

Re: Origin of Ellipsis (was: RE: Empty set)

2013-09-15 Thread Stephan Stiller
On 9/15/2013 3:07 PM, Phillips, Addison wrote: Not if the limit is counted in characters and not in bytes. Twitter, for example, counts code points in the NFC representation of a tweet. "character", "code point" – these are confusing words :-) From the link it isn't entirely clear whether they
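Under the policy Addison describes (count code points of the NFC form), canonically equivalent inputs receive the same count. The following sketch uses only Python's standard `unicodedata` module and is an illustration, not Twitter's code. For well-formed text (no lone surrogates), the code point count and the scalar value count are numerically identical.

```python
import unicodedata


def nfc_code_point_count(text: str) -> int:
    """Count code points after normalizing to NFC."""
    return len(unicodedata.normalize("NFC", text))


decomposed = "Cafe\u0301"    # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "Caf\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
print(len(decomposed), len(precomposed))                                      # 5 4
print(nfc_code_point_count(decomposed), nfc_code_point_count(precomposed))    # 4 4
```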

Re: Origin of Ellipsis

2013-09-15 Thread Stephan Stiller
On 9/15/2013 1:04 PM, Doug Ewell wrote: André Schappo wrote: U+2026 is useful for microblogs when one is looking to save characters Not if the microblog is in UTF-8, as almost all are. That's an astute observation, but André was talking about input limits https://dev.twitter.com/docs/coun
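Doug's and André's points concern different units. U+2026 saves characters when a limit is counted in code points, but it saves nothing when the limit is counted in UTF-8 bytes. This is a short check using only the Python standard library; the helper name is ours.

```python
from __future__ import annotations


def sizes(text: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


print(sizes("\u2026"))   # (1, 3): HORIZONTAL ELLIPSIS is one code point, 3 bytes in UTF-8
print(sizes("..."))      # (3, 3): three FULL STOPs, 1 byte each
```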

Re: Origin of Ellipsis and double spacing after a sentence.

2013-09-15 Thread Stephan Stiller
On 9/14/2013 6:24 AM, Michael Everson wrote: It facilitates comment by those who are reviewing the text. If you add proofreaders' marks to an especially difficult manuscript, maybe. I've barely seen annotated papers with comments that would not have fit into the margins, and there's still the b

Re: Origin of Ellipsis

2013-09-14 Thread Stephan Stiller
On 9/14/2013 3:42 AM, Michael Everson wrote: On 14 Sep 2013, at 02:30, Stephan Stiller wrote: This means that this dot will then need to be followed by two spaces when it is used as a sentence-ending period. This tradition is no longer current in the US. Though it's obvious there are

Re: Origin of Ellipsis

2013-09-14 Thread Stephan Stiller
[ME:] Books never used it. The tradition in typing was developed to assist typesetters to navigate the typewritten text they were setting. The typesetters never put two spaces after a full stop. I'm looking at what looks like a US edition/printing (1902) of the US-American novel Moby-Dick:

Re: Origin of Ellipsis

2013-09-14 Thread Stephan Stiller
You've quoted the sentence out of its context (note the "then" word which indicates this context). I do not support this practice. Philippe, "within my message you quote here" isn't exactly precise about context, is it :-) I think there's a misunderstanding. My annoyance isn't in principle wi

Re: Origin of Ellipsis

2013-09-13 Thread Stephan Stiller
This tradition is persistant. Persistent where? This is already replied within my message you quote here. Lots of people Lots of people who Same remark. So there are "many" contributors, on the English Wikipedia. What does "many" mean? I doubt double spacing of sentences is

Re: Origin of Ellipsis

2013-09-13 Thread Stephan Stiller
This tradition is persistant. Persistent where? Lots of people Lots of people who and how many? Go to a bookstore or library, pick 100 items randomly, and report. If you want to make a case that it's majority or significant usage in personal correspondence or outside of professional printing

Re: Origin of Ellipsis

2013-09-13 Thread Stephan Stiller
:-) Lots of people still do this. I did until a year or two ago. I also use non-standard punctuation, but I tend to know what majority practice is, and when I deviate it's intentional. I don't know about you, but nearly everyone who tells me that "you should use two spaces" ("should"? says who

Re: Origin of Ellipsis

2013-09-13 Thread Stephan Stiller
Hi Philippe, This means that this dot will then need to be followed by two spaces when it is used as a sentence-ending period. This tradition is no longer current in the US. Though it's obvious there are still plenty of middle and high school–level teachers and college-level writing instructor

Re: Origin of Ellipsis (was: RE: Empty set)

2013-09-13 Thread Stephan Stiller
Exactly my thoughts: In fonts commonly used for word processing and desktop publishing, HORIZONTAL ELLIPSIS is usually not that well designed. To me the dots appear too close in plenty of fonts. But I think that the most common cause of the appearance of HORIZONTAL ELLIPSIS is that Microsoft O

Re: Empty set

2013-09-13 Thread Stephan Stiller
[PV:] But then the existing ellipsis is not a good candidate because it has the incorrect metrics where it should use the sinographic metrics. [...] But the encoded ELLIPSIS does not fit correctly there. But I think Chinese fonts take care of that. Stephan

Re: Empty set

2013-09-13 Thread Stephan Stiller
Once you've increased the width of these interword spaces to their maximum, all the characters (and these increased spaces) should be justified using interletter spacing, and this extra interletter spacing should be applied as well between the dots of the el

Re: Empty set

2013-09-13 Thread Stephan Stiller
Once you've increased the width of these interword spaces to their maximum, all the characters (and these increased spaces) should be justified using interletter spacing, and this extra interletter spacing should be applied as well between the dots of the ellipsis (showing that they are effecti

Re: Empty set

2013-09-13 Thread Stephan Stiller
I did not speak about inter-word spacing (this can't affect the rendering of ellipsis itself) but about inter-letter spacing. But the context I provided was that some people ask for ". . .[ .]", as ugly as it is :-) And, again, the precise "ideal" spacing is a matter of typographic design; you c

Re: Empty set

2013-09-13 Thread Stephan Stiller
I've never seen it in math proper, is what I meant, but ... The { [ ( ) ] } hierarchy is used in chemical nomenclature. It is specified by IUPAC (International Union of Pure and Applied Chemistry). For example: acetone (/R/)-/O/-{2-[4-(α,α,α-trifluoro-/p/-tolyloxy)phenoxy]propionyl}oxime ...

Re: Empty set

2013-09-13 Thread Stephan Stiller
Hi Philippe, i.e. "(...)." at end of a truncated sentence or ". (...)" at start of the next truncated sentence Well, for citations in German I've generally seen "[...]", and for English I've seen both "[...]" and "...", but not "(...)". I included them in my sentence ("paren

Re: Empty set

2013-09-12 Thread Stephan Stiller
Hi Philippe, I disagree. For me your "spaced-out ellipsis" (". . .") is not an ellipsis but are horizontal rulers (typically used in tables or input forms) to facilitate the reading of tabular data. I disagree with CMOS prescription in this case, just as you do, but the prescription exists, na

Re: Empty set

2013-09-12 Thread Stephan Stiller
The situation with {} is very similar to the situation with 0̸ for the empty set and with \ for set subtraction. Knuth's version of TeX was designed for typesetting his books, and he (probably) did not encounter situations where the meaning of these symbols is ambiguous. When AMS was design

Re: Empty set

2013-09-12 Thread Stephan Stiller
The notation { } is quite correct. It just isn’t an atomic symbol for the empty set but an expression consisting of the two characters “{” and “}”, with a list (here, an empty list) of elements between them. Reminds me of typographically composite stuff that has its own scalar value ("code point

Re: Why blackletter letters?

2013-09-12 Thread Stephan Stiller
I confess I usually type a Danish Ø for convenience when I'm using this, though for publication I would tend to substitute the proper ∅. Whenever I saw the empty set symbol in printed math literature in Germany, it closely resembled Ø; I don't think I ever saw a stru

Re: Empty set

2013-09-12 Thread Stephan Stiller
Regarding the empty set, the page http://jeff560.tripod.com/set.html rather convincingly attributes the symbol to André Weil, who says that it was inspired by the Norwegian letter “Ø”. Well, if one looks at earlier editions of the "Éléments", the symbol is clearly not printed as circ

Re: Why blackletter letters?

2013-09-12 Thread Stephan Stiller
Talking about which ... I confess I usually type a Danish Ø for convenience when I'm using this, though for publication I would tend to substitute the proper ∅. Whenever I saw the empty set symbol in printed math literature in Germany, it closely resembled Ø; I don't think I ever saw a struck-

Re: Why blackletter letters?

2013-09-11 Thread Stephan Stiller
On 9/11/2013 5:56 AM, Gerrit Ansmann wrote: That’s correct, but that did not seem to stop people from using a long s in Antiqua from time to time. There are a lot of post-1901 Antiqua display fonts that contain a long s as well as examples from normal text. This very rarely happens even today:

Re: Why blackletter letters?

2013-09-11 Thread Stephan Stiller
Hi Gerrit, I have been aiming at creating a blackletter font (http://unifraktur.sourceforge.net/maguntia.html) Cool! • The four “required” ligatures ch, ck, ſt and tz, which were never separated in typesetting. These can be realised in the very same way as antiqua ligatures. Your page draws

Re: Can a single text document use multiple character encodings?

2013-08-28 Thread Stephan Stiller
To appease the nit pickers: I totally didn't know there's nitpickers on this list, like, those that reply to and pick on each other. Interesting!

Re: Can a single text document use multiple character encodings?

2013-08-28 Thread Stephan Stiller
On 8/28/2013 3:35 PM, Asmus Freytag wrote: The original question was about combining UTF-8 and UTF-16 in the same document. /Not quite./ Hint: The original question is in the original email.

Re: What to backup after corruption of code units?

2013-08-28 Thread Stephan Stiller
What I meant to write, with corrections: And if the meaning "to go back [in a string]" is established [...] "back up" seems to me the one expression that people dealing in code point conversion and string access would use in this context. then "to back up" in that particular meani

Re: What to backup after corruption of code units?

2013-08-28 Thread Stephan Stiller
confusion isn't exactly rampant I guess so. But while we're splitting hairs: There simply are two meanings for the word "backup", which in and of itself is nothing unusual, especially where one of them is the ordinary sense of the term (not really a technical term). In the IT domain, the "to s

Re: Can a single text document use multiple character encodings?

2013-08-28 Thread Stephan Stiller
For Web formats (HTML, etc.), the answer is "no". The obvious follow-up to the list: It'd be interesting to know where the answer is "yes". People will occasionally mention ISO/IEC 2022, which can be thought of as a meta-encoding or encoding template or encoding constructor, but in the normal
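Python's standard codecs include `iso2022_jp`, a stateful encoding in the ISO/IEC 2022 family. It makes the "meta-encoding" view concrete: within a single byte stream, escape sequences switch the active character set. The specific escape sequences shown are those of ISO-2022-JP. Other ISO 2022 profiles use different designations, so treat this as one illustrative instance only.

```python
text = "a\u3042b"    # 'a', U+3042 HIRAGANA LETTER A, 'b'
encoded = text.encode("iso2022_jp")

# The byte stream switches character sets in mid-stream:
# ESC $ B designates JIS X 0208 and ESC ( B switches back to ASCII.
print(encoded)
print(encoded.decode("iso2022_jp") == text)   # True
```

In the normal sense this is still one encoding. The switching is internal to it, which is different from mixing, say, UTF-8 and UTF-16 in one file.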

Re: What to backup after corruption of code units?

2013-08-27 Thread Stephan Stiller
All good replies It means the program needs to go back (a.k.a. "back up") but I'd say "backtracking" would make for better wording in TUS. Stephan
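This sketch assumes UTF-8, since the excerpt does not say which encoding form is meant. Under that assumption, "backing up" is cheap because UTF-8 is self-synchronizing: a decoder that lands in the middle of a character can step back to the lead byte. The function name is hypothetical and does not come from TUS.

```python
def back_up_to_boundary(buf: bytes, i: int) -> int:
    """Back up from index i to the first byte of the UTF-8 sequence containing it.

    UTF-8 continuation bytes match the bit pattern 10xxxxxx, and a
    well-formed sequence has at most 3 of them, so at most 3 steps back
    are needed. If the result is still a continuation byte, the data
    around i is ill-formed (for example, after corruption).
    """
    start = i
    while start > 0 and i - start < 3 and (buf[start] & 0xC0) == 0x80:
        start -= 1
    return start


data = "a\U0001F600".encode("utf-8")   # b'a\xf0\x9f\x98\x80'
print(back_up_to_boundary(data, 4))    # 1: the lead byte 0xF0
```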

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-05 Thread Stephan Stiller
On 8/5/2013 11:26 AM, Whistler, Ken wrote: Inclusion of the precomposed characters now seen in the U+1FXX block was part of the price of the merger. What was included was precisely the repertoire requested by Greece, and no attempt was made to further rationalize forms including macrons for An

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-04 Thread Stephan Stiller
Please bear in mind that polytonic vowels ARE used in the language called Modern Greek. /Because/ of the Ancient/Attic heritage living on via Katharevousa or the occasional person persisting in polytonic orthography. In any case, modern writing has traditionally not used macrons (and certain n

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-04 Thread Stephan Stiller
Most of the polytonic precomposed vowels are in the auxiliary exemplars for Modern Greek. I don't know – probably because of the Katharevousa legacy and the fact that Ancient Greek lives on in literary idioms, for which you ordinarily don't use a macron for reasons of orthographic convention.

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-04 Thread Stephan Stiller
On 8/4/2013 2:59 PM, Richard Wordingham wrote: The CLDR does not yet support Ancient Greek! [...] Vowels with plain COMBINING BREVE and COMBINING MACRON don't make it to the list of auxiliary exemplar characters for Modern Greek. This is a non sequitur; why would they for Modern Greek (Dimotiki),

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-04 Thread Stephan Stiller
[from RW:] /For metrical purposes/, we don't know whether the syllable is open or closed until we know what comes next. [emphasis added] About that you are right, and it was an oversight on both our parts. But the dictionary also contains πράσσω with ᾱ in an annotation, and the weight of the

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-03 Thread Stephan Stiller
[0)] Where there is no diacritic on the vowel, then macrons are used for alpha, iota and upsilon in the headwords. 1) Vowel lengths are not shown where one is expected to know it, e.g. in the prefixes of verbs 2) Vowel lengths are not shown in closed syllables - strictly speaking, breve and

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-03 Thread Stephan Stiller
/[One consequence of the string policy is that ]/we can no longer encode new precomposed characters for grapheme clusters that are already encoded in any existing standard form/[.]/ And you've truncated the end of my sentence Well, I have not, unless you really want to count that l

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-03 Thread Stephan Stiller
I've seen information concerning this we can no longer encode new precomposed characters for grapheme clusters that are already encoded in any existing standard form many times, though I'm not in a position to verify all of your content. I'm also not proposing to add precomposed {ᾱ,ῑ,ῡ}-with-dia

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-02 Thread Stephan Stiller
The practice in Scott and Liddell is to reserve ᾱ, ῑ and ῡ for a note after the dictionary entry. I'm looking at Liddell-Scott-/Jones/ here and at old pdf's of Liddell & Scott [only] by Google, and I cannot easily confirm your statement. Perhaps it holds for

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-02 Thread Stephan Stiller
Characters restricted to dictionaries are generally not well supported. And modern textbooks in a modern world :-) The practice in Scott and Liddell is to reserve ᾱ, ῑ and ῡ for a note after the dictionary entry. Liddell & Scott is old, just like Lewis & Short. We've moved on since then, and

polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-02 Thread Stephan Stiller
Hi, If one wants to indicate vowel length for the length-ambiguous vowels α, ι, υ in Ancient Greek, one writes ᾱ, ῑ, ῡ. Is there a reason why there are no diacritic-precomposed characters? I guess it's because macron usage is rare in orthographic practice, even though vowel length here is
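Because there are no precomposed forms, ᾱ, ῑ, ῡ with an additional diacritic have to be represented as combining sequences. The short check below uses only Python's standard `unicodedata` module. It shows that NFC composes α + U+0304 to the precomposed U+1FB1, while a following acute stays a separate combining mark.

```python
import unicodedata

alpha_macron = unicodedata.normalize("NFC", "\u03b1\u0304")              # α + COMBINING MACRON
alpha_macron_acute = unicodedata.normalize("NFC", "\u03b1\u0304\u0301")  # ... + COMBINING ACUTE ACCENT

print([f"U+{ord(c):04X}" for c in alpha_macron])         # ['U+1FB1']
print([f"U+{ord(c):04X}" for c in alpha_macron_acute])   # ['U+1FB1', 'U+0301']
print(unicodedata.name(alpha_macron))                    # GREEK SMALL LETTER ALPHA WITH MACRON
```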

Re: symbols/codepoints for necessity and possibility in modal logic

2013-08-02 Thread Stephan Stiller
There are a number of "box" characters in the vicinity of U+27FB You mean U+25FB. U+25A1 [I think: maybe] [and] For diamond [...] U+25CA [I think: no] Have you read my previous discussion and looked at UTR 25 (p. 20 and also "Ideal Sizes" on p. 19)? U+25AB [and] U+25FD Definitely too

Re: Unicode code page and ☃.net

2013-07-30 Thread Stephan Stiller
On 7/30/2013 3:27 PM, Asmus Freytag wrote: architectures that depended on swapping character sets (code pages) in mid stream I thought systems were usually married to a particular code page. I'm wondering where (historically) you'd actually change to a different code page mid-stream. Steph

Re: Unicode code page and ☃.net

2013-07-29 Thread Stephan Stiller
I have a question regarding the supported Unicode code page. There are no Unicode code pages. I guess there is the question of what exactly a codepage is when you consider complicated encodings, esp stateful ones. But I always think of Unicode as one giant abstract codepage, and Unicode cha

Re: symbols/codepoints for necessity and possibility in modal logic

2013-07-19 Thread Stephan Stiller
Hi Jörg, Thanks for the info! U+25C7 WHITE DIAMOND is the best choice I'm with you in that for now I'll go with ⟨◻ (U+25FB), ◇ (U+25C7)⟩ as the pair of choice, pending further decisions; see also what I'm writing further down. Or objections from experts stating that the symbol properties
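To compare ◻ (U+25FB) and ◇ (U+25C7) with the alternatives raised elsewhere in these posts (U+25A1, U+25AB, U+25FD, U+25CA, and U+22C4 DIAMOND OPERATOR), it can help to list their formal character names side by side. This is a small Python sketch. The list contains only the code points mentioned in the thread and is not a recommendation; names alone don't settle the choice, and the chart annotations and UTR #25 carry more weight.

```python
import unicodedata

# Candidate characters for the modal box and diamond mentioned in this thread.
CANDIDATES = [0x25A1, 0x25AB, 0x25FB, 0x25FD, 0x25C7, 0x25CA, 0x22C4]


def describe(cp: int) -> str:
    """Format a code point as 'U+XXXX <char> <Unicode name>'."""
    ch = chr(cp)
    return f"U+{cp:04X} {ch} {unicodedata.name(ch)}"


for cp in CANDIDATES:
    print(describe(cp))
```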

Re: symbols/codepoints for necessity and possibility in modal logic

2013-07-19 Thread Stephan Stiller
Why not contact the relevant publishers and find out what they are using? "Why not contact the relevant governments and find out what they're using in order to solve /_*all*_/ encoding issues for /_*all*_/ languages and writing systems within a day?" :-) Publishers use metal type (or various

Re: symbols/codepoints for necessity and possibility in modal logic

2013-07-19 Thread Stephan Stiller
What is wrong with using DIAMOND OPERATOR? "wrong" is strong wording and goes beyond what I suggested or implied, but it's not clear to a user of Unicode that it's the right fit either. There are a couple of indicators factoring in: * The charts mention modal logic in conjunction with ◻ (U+

symbols/codepoints for necessity and possibility in modal logic

2013-07-18 Thread Stephan Stiller
Hi all, Modal logic uses a "box" and a "diamond" (this is how they're informally called) as operators (accepting one formula and returning another) to denote necessity and possibility, resp. Older texts might use the letters L and M (resp). Which Unicode codepoints do modal box and diamond co

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Stephan Stiller
I suppose you can't go wrong with what your own passport says On second thought ... * disallowed: Ä↛A , Ö↛O , Ü↛U (as are: Å↛A , Ø↛O) ... I have a Turkish friend for whom it is Ö→O, not OE. This calls into question the general applicability of these rules. A few years ago he also told m

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Stephan Stiller
See http://www.icao.int/publications/Documents/9303_p1_v1_cons_fr.pdf , especially Appendice 8 (p IV-50). The English version is available as http://www.icao.int/publications/Documents/9303_p1_v1_cons_en.pdf , especially Appendix 8 (p IV-47). I suppose you can't go wrong with what your own pa

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Stephan Stiller
Hey Jonathan, The official transliteration for Hebrew to the Latin script is obsolete What is the latest recommended scheme? and the situation in this country is a mess Let me guess: it has to do with the number of spelling variants in names of /aliyah/ immigrants? I've always been wondering

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Stephan Stiller
My impression is that US customs officials are either quite knowledgeable or quite tolerant on such issues (or a mixture of both). The same applies to customs officials in other countries I have traveled to, and other people at airports and such. Thanks. (And, I don't have the knowledge to ag

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Stephan Stiller
Hi Jonathan, I definitely appreciate the partial datapoints from your links, but Google is your friend by itself doesn't lead us closer to a real answer, and in this case I think that there are at least some good answers, and in any case some answers will be better than others. This remind

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Stephan Stiller
Hi Richard, I know of standards for transcribing foreign alphabets (by /target/ locale – Are they relevant here? If so, which?) [...] This may well depend on both source and target locale! How often will locale have to be broken down on a non-local basis? Different newspapers in the same city

writing in an alphabet with fewer letters: letter replacements

2013-07-04 Thread Stephan Stiller
Hi folks, For languages whose alphabets aren't too far apart (I'm thinking mostly of the set of Latin-derived alphabets), what is a good place for finding out how letter replacements for letters that are missing in a different country/locale are done? For example, how will an Icelander norma

Re: Arabic quoting characters

2013-06-15 Thread Stephan Stiller
It’s somewhat implicit, but still relatively clear: [...] You are right about the information re Arabic round double quotation marks being essentially within TUS; thanks. I also had in mind Roozbeh's statement about Persian and Greek (and I won't check right now), so there remains the questi

Re: Arabic quoting characters

2013-06-14 Thread Stephan Stiller
On Fri, Jun 14, 2013 at 10:45 AM, Michael Fayez <michaelfa...@hotmail.com> wrote: I noticed that double small parentheses that are used in professional printing in Arabic presses are not encoded in Unicode. [...] So does Unicode Consorti

Re: interaction of Arabic ligatures with vowel marks

2013-06-12 Thread Stephan Stiller
Thank you, خالد and Richard. there is only one Indic mark I can think of for which the issue of component association arises, and that is the nukta That is good to know, given the complexity of the Indic scripts. Other thoughts: * One could simply break up Arabic ligatures in need of harakat

interaction of Arabic ligatures with vowel marks

2013-06-11 Thread Stephan Stiller
Hi, How is the placement of vowel marks around ligatures handled in Arabic text? Does anyone have good pointers on this topic? My guess is that this does not come up often (just like the topic of pointing for handwritten Hebrew), as vowel marks are mostly not added in ordinary text. Nonethele

Re: Hanzi trad-simp folding and z-variants

2013-06-09 Thread Stephan Stiller
The way the Cheung-Bauer list was compiled, it's certainly hard to see how most of the characters would be widely known. I'd need to look at C&B again for accurate numbers, but to some extent it's simply because some syllable-morphemes are listed with many different attested possibilities. So o

Re: Hanzi trad-simp folding and z-variants

2013-06-09 Thread Stephan Stiller
For me "non-standardized' means there is not one recognized standard, this does not mean that things are completely unstable, nor that there are no traditions of what character is used for what word that have been passed down for many generations. /As I stated/, for a decent number of syllab

Re: Hanzi trad-simp folding and z-variants

2013-06-09 Thread Stephan Stiller
Familiarity with a writing system makes the "non-obvious" parts comprehensible, as can context. The work is a thorough listing of usage instances that the authors could encounter in the wild. My informants can't recall ever having seen many of these characters. They wouldn't use them, and that

Re: Hanzi trad-simp folding and z-variants

2013-06-08 Thread Stephan Stiller
So we both agree that Unihan is not designed to tell people how to convert between traditional and simplified characters. Yep. Though there's some confusion as to what other questions are being discussed here. I think I misused the expression "folding" at some point. But the original query explicitly as

Re: Hanzi trad-simp folding and z-variants

2013-06-08 Thread Stephan Stiller
better word choice: lexical variation -> orthographic variation (in my prev email)

Re: Hanzi trad-simp folding and z-variants

2013-06-08 Thread Stephan Stiller
I. Which and where? Section 3.7.1 Simplified and Traditional Chinese Variants talks about converting between Simplified and Traditional Chinese. You wrote this http://www.unicode.org/reports/tr38/ does a good summary of the possibilities. in response to my inquiry about

Re: Hanzi trad-simp folding and z-variants

2013-06-08 Thread Stephan Stiller
As far as general folding is concerned, performing conversion (whether it's word-based or not and even if it's locale-tailored) and then a strict search will let you miss out on the z-variation you find in the wild (because of true variation or of misspellings), and a more generous inclusion of

Re: Hanzi trad-simp folding and z-variants

2013-06-08 Thread Stephan Stiller
The situation also tends to be complex once one steps outside of Putonghua. Given that there is a lack of standardization there (and a lack of tables laying out variant spellings), I don't think anything other than radical, hand-tuned folding to cover all possibilities is sensible to

Re: Hanzi trad-simp folding and z-variants

2013-06-08 Thread Stephan Stiller
http://www.unicode.org/reports/tr38/ does a good summary of the possibilities. Which and where? Trying to "fold" from one locale to another, which is what folding from traditional to simplified would be, is not a good idea; best practice is to bear in mind the locale being used, and do infor

Re: Hanzi trad-simp folding and z-variants

2013-06-07 Thread Stephan Stiller
simplified [is] better thought of as abbreviated Part of this is a terminological argument. The historical situation is indeed more complicated than many people know, but the truth is also that irrespective of eg people's past or present usage in handwriting there have (in the past and esp in

Re: Hanzi trad-simp folding and z-variants

2013-06-07 Thread Stephan Stiller
Hi John, This is one of those questions that I've been wondering about as well ... my guess would be "yes that should work (and dealing with z-variants is something you'll likely need to do anyways)", but there *must* be some published algorithm out there that specifically addresses the issue

Re: Suggestion for new dingbats/symbols

2013-05-30 Thread Stephan Stiller
Excellent question and points from Albrecht Dreiheller. [AD:] So the _receptive vocabulary_ might be pretty big for many people. [...] So the _productive vocabulary_ of symbols will always be very, very small. I was thinking a similar thing, and I'm inclined to agree. But I know of paralle

Re: Suggestion for new dingbats/symbols

2013-05-28 Thread Stephan Stiller
The Noun Project seem determined to create a pictogram for every noun, and many short phrases: See http://blog.thenounproject.com/ Huh. What are the constraints on the symbols; eg: what resolution can the symbols be (so that we don't simply use detailed high-res pictures)? Are there any o

Re: SignWriting

2013-04-22 Thread Stephan Stiller
what the western world knows as „calligraphie“, e.g., in Germany elementary school kids get graded for the prettiness of their handwriting. I've only ever encountered the word "Kalligraphie" (now preferred: "Kalligrafie") in the meaning of "artistic writing" in Germany. If the word is also

Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-22 Thread Stephan Stiller
[Charlie Ruland:] The Unicode Consortium is prepared to encode all characters that can be shown to be in actual use. Are you sure there is a precedent for what is essentially markup for a system of (alpha)numerical IDs? Stephan

Re: SignWriting

2013-04-22 Thread Stephan Stiller
SignWriting has both a normative form, to be generated by computer programs, and a handwriting form allowing more freedom. It has been developed using signs that are not so complicated to reproduce in a meaningful way. Could you provide a link with signwritten sentences in the *lat

Re: SignWriting

2013-04-21 Thread Stephan Stiller
SignWriting has both a normative form, to be generated by computer programs, and a handwriting form allowing more freedom. It has been developed using signs that are not so complicated to reproduce in a meaningful way. Could you provide a link with signwritten sentences in the /latest/ versio

Re: SignWriting

2013-04-21 Thread Stephan Stiller
SignWriting is also difficult to write. Not necessarily more than those that learn writing Chinese. Learning how to write Chinese is difficult. It "only" takes like 6.5 years of schooling, and when students go abroad for college, they quickly forget how to write many characters. In fact,

Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-21 Thread Stephan Stiller
In India you could have telegrams containing such sentences delivered in any of the major Indian regional languages. This was a good idea in the days of the low-bandwidth telegraph And it was a domain-restricted application. Stephan

SignWriting (was: Encoding localizable sentences)

2013-04-21 Thread Stephan Stiller
sign-writing SignWriting is also difficult to write. naturally evolved I will be very curious to see the result after a bit of evolution (I hope there will be some), with a system that can actually be written easily by hand (or at least input quickly with the right input method) and that i
