Re: Why do binary files contain text but text files don't contain binary?

2020-02-21 Thread Ken Whistler via Unicode
On 2/21/2020 7:53 AM, Costello, Roger L. via Unicode wrote: Text files may indeed contain binary (i.e., bytes that are not interpretable as characters). Namely, text files may contain newlines, tabs, and some other invisible things. Question: "characters" are defined as only the visible thi

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Ken Whistler via Unicode
Well, no, in this case "strange" means strange, as Ken Lunde notes. I'm just pointing to his list, because it pulls together quite a few Han characters that *also* have dubious cases for encoding. Or you could turn the argument around, I suppose, and note that just because the hieroglyph for "

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Ken Whistler via Unicode
You want "dubious"?! You should see the hundreds of strange characters already encoded in the CJK *Unified* Ideographs blocks, as recently documented in great detail by Ken Lunde: https://www.unicode.org/L2/L2020/20059-unihan-kstrange-update.pdf Compared to many of those, a hieroglyph of a m

Re: Combining Marks and Variation Selectors

2020-02-02 Thread Ken Whistler via Unicode
Richard, What it comes down to is avoidance of conundrums involving canonical reordering for normalization. The effect of variation selectors is defined in terms of an immediate adjacency. If you allowed variation selectors to be defined for combining marks of ccc!=0, then normalization of se
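
The adjacency problem described above can be illustrated with a minimal Python sketch. The sequence used here is hypothetical and chosen only for illustration (no such variation sequence is defined): a variation selector is placed directly after a combining mark with ccc=220, and canonical reordering then leaves it adjacent to a different mark.

```python
import unicodedata

# Hypothetical sequence, for illustration only: base letter, acute (ccc=230),
# dot below (ccc=220), then VARIATION SELECTOR-1 meant to apply to the dot below.
ACUTE, DOT_BELOW, VS1 = "\u0301", "\u0323", "\uFE00"
original = "a" + ACUTE + DOT_BELOW + VS1

# Canonical reordering swaps the two marks (220 sorts before 230), but the
# variation selector has ccc=0, so it does not move.
normalized = unicodedata.normalize("NFD", original)


def preceding(text):
    """Return the character immediately before the variation selector."""
    return text[text.index(VS1) - 1]


print([f"U+{ord(c):04X}" for c in original])
print([f"U+{ord(c):04X}" for c in normalized])
print(preceding(original) == DOT_BELOW, preceding(normalized) == ACUTE)
```

After normalization the selector follows the acute rather than the dot below, so an adjacency-based definition would no longer pick out the same mark.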

Re: Adding Experimental Control Characters for Tai Tham

2020-01-29 Thread Ken Whistler via Unicode
Richard, Given that those particular two variation selectors have already given very specific semantics for emoji sequences, and would now be expected to occur *only* in emoji sequences: https://www.unicode.org/reports/tr51/#def_text_presentation_selector usurping them to do something unrela

Re: Not accepted by UTC but in ISO ballot?

2019-12-27 Thread Ken Whistler via Unicode
Shriramana, That category is used to track character(s) in process that may have been approved by WG2 but are not yet in ballot, or are in contention, and may have just been dropped from ballot, but which still have sufficient visibility to be tracked. The process is a bit rough around the e

Re: Not accepted by UTC but in ISO ballot?

2019-12-26 Thread Ken Whistler via Unicode
Shriramana, On 12/20/2019 6:29 PM, Shriramana Sharma via Unicode wrote: I was looking at the pipeline for something else, and for the first time I see a character category: “not accepted by the UTC but in ISO ballot” and two characters in it. Those two characters changed status as of December 4,

Re: HEAVY EQUALS SIGN

2019-12-20 Thread Ken Whistler via Unicode
On 12/20/2019 7:17 AM, wjgo_10...@btinternet.com via Unicode wrote: It is indeed interesting that the Notice of Non-Approval itself uses italics for emphasis in two places. That text, at the present time, cannot be expressed in Unicode plain text with the emphasis that the Notice of Non-Appro

Re: New Public Review on QID emoji

2019-10-30 Thread Ken Whistler via Unicode
On 10/30/2019 10:41 AM, wjgo_10...@btinternet.com via Unicode wrote: At present I have a question to which I cannot find the answer. Is the QID emoji format, if approved by the Unicode Technical Committee going to be sent to the ISO/IEC 10646 committee for consideration by that committee?

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-12 Thread Ken Whistler via Unicode
On 10/12/2019 3:15 AM, Fred Brennan via Unicode wrote: There seems to be no conscionable reason for such a long delay after the approval. If that's just how things are done, fine, I certainly can't change the whole system. But imagine if you had to wait two years to even have a chance of using

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-11 Thread Ken Whistler via Unicode
Sorry about the typo there. I meant "the published Version 13.0 next March" --Ken On 10/11/2019 10:17 AM, Ken Whistler wrote: then eventually in the published Version 13.0 next month:

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-11 Thread Ken Whistler via Unicode
Short answer is no. The characters in the pipeline section labeled "Characters Accepted for Version 13.0" are what will be in the beta review for 13.0 (look for that sometime next month), and then eventually in the published Version 13.0 next month: https://www.unicode.org/alloc/Pipeline.htm

Re: On the lack of a SQUARE TB glyph

2019-09-27 Thread Ken Whistler via Unicode
Fred, 2 hours and 33 minutes from now (today). But you don't need to try to synch a proposal like this to a particular script ad hoc meeting. That group meets roughly once a month, and any new proposal coming in right now wouldn't be on the Unicode 13.0 train, even if the UTC immediately agre

Re: On the lack of a SQUARE TB glyph

2019-09-26 Thread Ken Whistler via Unicode
On 9/26/2019 4:21 AM, Fred Brennan via Unicode wrote: There is a clear demand for a SQUARE TB. In the font SMotoya Sinkai W55 W3, which is ©2008 株式会社 モトヤ, the glyph is unencoded and accessed via the Discretionary Ligatures (`dlig`) OpenType feature. It has name `T_B.dlig`. Aye, there's the ru

Re: PUA (BMP) planned characters HTML tables

2019-08-14 Thread Ken Whistler via Unicode
On 8/14/2019 4:32 PM, James Kass via Unicode wrote: If a character gets deprecated, can its decomposition type be changed from canonical to compatibility? Simple answer: No. --Ken

Re: New website

2019-07-22 Thread Ken Whistler via Unicode
Your helpful suggestions will be passed along to the people working on the new site. In the meantime, please note that the link to the "Unicode Technical Site" has been added to the left column of quick links in the page bottom banner, so it is easily available now from any page on the new sit

Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Ken Whistler via Unicode
See the entry for "Magar Akkha" on: http://linguistics.berkeley.edu/sei/scripts-not-encoded.html Anshuman Pandey did preliminary research on this in 2011. http://www.unicode.org/L2/L2011/11144-magar-akkha.pdf It would be premature to assign an ISO 15924 script code, pending the research to de

Access to the Unicode technical site (was: Re: Unicode's got a new logo?)

2019-07-18 Thread Ken Whistler via Unicode
On 7/18/2019 11:50 AM, Steffen Nurpmeso via Unicode wrote: I also decided to enter /L2 directly from now on. For folks wishing to access the UTC document register, Unicode Consortium standards, and so forth, all of those links will be permanently stable. They are not impacted by the rollout

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-18 Thread Ken Whistler via Unicode
On 7/17/2019 4:54 PM, Philippe Verdy via Unicode wrote: then the Unicode version (age) used for Hieroglyphs should also be assigned to Hieratic. It is already. In fact the ligatures system for the "cursive" Egyptian Hieratic is so complex (and may also have its own variants showing its progr

Re: Unicode "no-op" Character?

2019-07-03 Thread Ken Whistler via Unicode
On 7/3/2019 10:47 AM, Sławomir Osipiuk via Unicode wrote: Is my idea impossible, useless, or contradictory? Not at all. What you are proposing is in the realm of higher-level protocols. You could develop such a protocol, and then write processes that honored it, or try to convince others to

Re: acute-macron hybrid?

2019-04-30 Thread Ken Whistler via Unicode
On 4/30/2019 12:45 AM, Julian Bradfield via Unicode wrote: What is its appropriate Unicode representation? A macron. --Ken

Re: Variation Sequences (and L2-11/059)

2019-03-13 Thread Ken Whistler via Unicode
On 3/13/2019 2:42 AM, Janusz S. Bień via Unicode wrote: Hi! On Mon, Jul 16 2018 at 7:07 +02, Janusz S. Bień via Unicode wrote: FAQ (http://unicode.org/faq/vs.html) states: For historic scripts, the variation sequence provides a useful tool, because it can show mistaken or nonce gl

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Ken Whistler via Unicode
Egmont, On 2/9/2019 11:48 AM, Egmont Koblinger via Unicode wrote: Are there any (non-CJK) scripts for which crossword puzzles don't exist? There are crossword puzzles for Hindi (in the Devanagari script). Just do an image search for "Hindi crossword puzzle". But the conventions for these br

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Ken Whistler via Unicode
Richard, On 2/1/2019 1:30 PM, Richard Wordingham via Unicode wrote: Language tagging is already available in Unicode, via the tag characters in the deprecated plane. Recte: 1. Plane 14 is not a "deprecated plane". 2. The tag characters in Tag Character block (U+E0000..U+E007F) are not depr

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Ken Whistler via Unicode
On 1/31/2019 1:41 AM, Egmont Koblinger via Unicode wrote: I mean, for example we can introduce control characters that specify the language. That is a complete non-starter for the Unicode Standard. And if the terminal implementation introduces such as one-off hacks, they will fail completel

Re: A last missing link for interoperable representation

2019-01-08 Thread Ken Whistler via Unicode
James, On 1/8/2019 1:11 PM, James Kass via Unicode wrote: But we're still using typewriter kludges to represent stress in Latin script because there is no Unicode plain text solution. O.k., that one needs a response. We are still using kludges to represent stress in the Latin script because

Re: The encoding of the Welsh flag

2018-11-21 Thread Ken Whistler via Unicode
Michael, On 11/21/2018 9:38 AM, Michael Everson via Unicode wrote: What really annoys me about this is that there is no flag for Northern Ireland. The folks at CLDR did not think to ask either the UK or the Irish representatives to SC2 about this. Neither CLDR-TC nor SC2 has any jurisdiction

Re: The encoding of the Welsh flag

2018-11-21 Thread Ken Whistler via Unicode
On 11/21/2018 8:00 AM, William_J_G Overington via Unicode wrote: Yet the interoperability does not derive from an International Standard. The interoperability that enabled your mail to be delivered to me derives in part from the MIME standard (RFC 2045 et seq.) which is not an International

Re: The encoding of the Welsh flag

2018-11-20 Thread Ken Whistler via Unicode
On 11/20/2018 12:57 PM, William_J_G Overington via Unicode wrote: quote A Unicode Technical Standard (UTS) is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS. end quote My questions are as follows please. Is that encoding for the Wels

Re: UCA unnecessary collation weight 0000

2018-11-02 Thread Ken Whistler via Unicode
On 11/2/2018 10:02 AM, Philippe Verdy via Unicode wrote: I was replying not about the notational representation of the DUCET data table (using [....] unnecessarily) but about the text of UTR#10 itself. Which remains highly confusive, and contains completely unnecessary steps, and just compli

Re: second attempt

2018-10-31 Thread Ken Whistler via Unicode
On 10/31/2018 11:27 AM, Asmus Freytag via Unicode wrote: but we don't have an agreement that reproducing all variations in manuscripts is in scope. In fact, I would say that in the UTC, at least, we have an agreement that that clearly is out of scope! Trying to represent all aspects of text

Re: A sign/abbreviation for "magister"

2018-10-30 Thread Ken Whistler via Unicode
On 10/30/2018 2:32 PM, James Kass via Unicode wrote: but we can't seem to agree on how to encode its abbreviation. For what it's worth, "mgr" seems to be the usual abbreviation in Polish for it. --Ken

Re: A sign/abbreviation for "magister"

2018-10-29 Thread Ken Whistler via Unicode
On 10/29/2018 8:06 PM, James Kass via Unicode wrote: could be typed on old-style mechanical typewriters. Quintessential plain-text, that. Nope. Typewriters were regularly used for underscoring and for strikethrough, both of which are *styling* of text, and not plain text. The mere fact tha

Re: Dealing with Georgian capitalization in programming languages

2018-10-09 Thread Ken Whistler via Unicode
Martin, On 10/9/2018 12:47 AM, Martin J. Dürst via Unicode wrote: - Using the 'capitalize' method to (try to) get the titlecase property of a MTAVRULI character. (There's no other way currently in Ruby to get the titlecase property.) There may be others. If you have some ideas, I'd apprecia

Re: Dealing with Georgian capitalization in programming languages

2018-10-02 Thread Ken Whistler via Unicode
On 10/2/2018 12:45 AM, Martin J. Dürst via Unicode wrote: capitalize: uppercase (or title-case) the first character of the string, lowercase the rest When I say "cause problems", I mean producing mixed-case output. I originally thought that 'capitalize' would be fine. It is fine for lowerc

Re: UCD in XML or in CSV?

2018-08-31 Thread Ken Whistler via Unicode
On 8/31/2018 1:36 AM, Manuel Strehl via Unicode wrote: For codepoints.net I use that data to stuff everything in a MySQL database. Well, for some sense of "everything", anyway. ;-) People having this discussion should keep in mind a few significant points. First, the UCD proper isn't "ever

Re: Private Use areas

2018-08-21 Thread Ken Whistler via Unicode
On 8/21/2018 7:56 AM, Adam Borowski via Unicode wrote: On Mon, Aug 20, 2018 at 05:17:21PM -0700, Ken Whistler via Unicode wrote: On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote: Is there a block of RTL PUA also? No. Perhaps there should be? This is a periodic suggestion that

Re: Private Use areas

2018-08-20 Thread Ken Whistler via Unicode
On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote: Is there a block of RTL PUA also? No. --Ken

Re: Tales from the Archives

2018-08-20 Thread Ken Whistler via Unicode
Steffen noted: On 8/20/2018 3:22 PM, Steffen Nurpmeso via Unicode wrote: It was just that i have read on one of the mailing-lists i am subscribed to a cite of a Unicode statement that i have never read of anything on the Unicode mailing-list. It is very awkward, but i_again_ cannot find what

Re: Tales from the Archives

2018-08-20 Thread Ken Whistler via Unicode
Steffen, Are you looking for the Unicode list email archives? https://www.unicode.org/mail-arch/ Those contain list content going back all the way to 1994. --Ken On 8/20/2018 6:08 AM, Steffen Nurpmeso via Unicode wrote: I have the impression that many things which have been posted here some

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-19 Thread Ken Whistler via Unicode
On 7/19/2018 12:38 AM, Shai Berger via Unicode wrote: If I cannot trust that people I communicate with make the same choices I make, plain text cannot be used. Here is a counterexample. The following is a chunk of plain text output from the bidi reference implementation: Trace: Entering br

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Ken Whistler via Unicode
On 7/18/2018 6:43 AM, philip chastney via Unicode wrote: there are also contexts where "Hello World!" can be read as the function "Hello", applied to the factorial value of "World" even though such a move wouldn't necessarily remove all ambiguity, the easiest solution is to declare that formal

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-16 Thread Ken Whistler via Unicode
On 7/16/2018 3:51 PM, Shai Berger via Unicode wrote: And I should add, in response to the other points raised in this thread, from the same page in the core standard: "If the same plain text sequence is given to disparate rendering processes, there is no expectation that rendered text in each i

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-29 Thread Ken Whistler via Unicode
On 5/29/2018 12:49 AM, Richard Wordingham via Unicode wrote: How would one know that they are misapplied? And what if the author of the text has broken your rules? Are such texts never to be transcribed to pukka Unicode? Applying Tamil -ii (0BC0, Script=Tamil) to the Latin letter a (0061,

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-28 Thread Ken Whistler via Unicode
On 5/28/2018 9:44 PM, Asmus Freytag via Unicode wrote: One of the general principles is that combining marks inherit the property of their base character. Normally, "inherited" should be the only property value for combining marks. There have been some deviations from this over the years,

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-28 Thread Ken Whistler via Unicode
On 5/28/2018 9:23 PM, Martin J. Dürst via Unicode wrote: Hello Sundar, On 2018/05/28 04:27, SundaraRaman R via Unicode wrote: Hi, In languages like Ruby or Java (https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isAlphabetic(int)), functions to check if a character is alp
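
For readers who want to inspect the character under discussion, here is a minimal Python sketch. Note the assumption: Python's `unicodedata` module does not expose the Unicode Alphabetic property, and `str.isalpha()` is defined in terms of the letter general categories, so it is not the same test as the Java method linked above. The sketch shows only the general category and canonical combining class of U+0BCD.

```python
import unicodedata


def describe(ch):
    """Return (name, general category, canonical combining class) for ch."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


pulli = "\u0BCD"  # TAMIL SIGN VIRAMA
print(describe(pulli))

# str.isalpha() looks at general category (letters only), so a nonspacing
# mark such as the pulli is reported as non-alphabetic by this test.
print(pulli.isalpha())
```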

Re: Major vendors changing U+1F52B PISTOL 🔫 depiction from firearm to squirt gun

2018-05-23 Thread Ken Whistler via Unicode
On 5/23/2018 8:53 AM, Abe Voelker via Unicode wrote: As a user I find it troublesome because previous messages I've sent using this character on these platforms may now be interpreted differently due to the changed representation. That aspect has me wondering if this change is in line with Uni

Re: preliminary proposal: New Unicode characters for Arabic music half-flat and half-sharp symbols

2018-05-15 Thread Ken Whistler via Unicode
On 5/15/2018 2:46 PM, Markus Scherer via Unicode wrote: I am proposing the addition of 2 new characters to the Musical Symbols table: - the half-flat sign (lowers a note by a quarter tone) - the half-sharp sign (raises a note by a quarter tone) In an actual proposal, I would

Re: Is the Editor's Draft public?

2018-04-20 Thread Ken Whistler via Unicode
Henri, There is no formal concept of a public "Editor's Draft" for the Unicode core specification. This is mostly the result of the tools used for editing the core specification, which is still structured more like a book than the usual online internet specification. Currently the Unicode ed

Re: Fwd: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode

2018-04-02 Thread Ken Whistler via Unicode
On 4/2/2018 7:02 PM, Philippe Verdy via Unicode wrote: We're missing the definition of "ymojis", a safer alternatives of "umojis" (unknown), but that "you" can create yourself for use by yourself Not to mention "əmojis", as in "Uh, Moe! Jeez, why are we still talking about this?!" --Ken

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-09 Thread Ken Whistler via Unicode
On 3/9/2018 9:29 AM, via Unicode wrote: Documented increase such as scientific terms for new elements, flora and fauna, would seem to be not more one or two dozen a year. Indeed. Of the "urgently needed characters" added to the unified CJK ideographs for Unicode 11.0, two were obscure place

Re: Translating the standard

2018-03-09 Thread Ken Whistler via Unicode
On 3/9/2018 6:58 AM, Marcel Schneider via Unicode wrote: As of translating the Core spec as a whole, why did two recent attempts crash even before the maintenance stage, while the 3.1 project succeeded? Essentially because both the Japanese and the Chinese attempts were conceived of as comm

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread Ken Whistler via Unicode
On 3/7/2018 1:12 PM, Philippe Verdy via Unicode wrote: Shouldn't we create a variant of IDS, using combining joiners between Han base glyphs (then possibly augmented by variant selectors if there are significant differences on the simplification of rendered strokes for each component) ? What

Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-05 Thread Ken Whistler via Unicode
On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote: I have a question; if some people try to make a translated version of Unicode And to add to Asmus' response, folks on the list should understand that even with the best of effort, the concept of a "translated version of Unicode" is a nea

CJK Ideograph Encoding Velocity (was: Re: Unicode Emoji 11.0 characters now ready for adoption!)

2018-03-05 Thread Ken Whistler via Unicode
John, I think this may be giving the list a somewhat misleading picture of the actual statistics for encoding of CJK unified ideographs. The "500 characters a year" or "1000 characters a year" limits are administrative limits set by the IRG for national bodies (and others) submitting repertoi

Re: Bidi edge cases in Hangul and Indic

2018-02-22 Thread Ken Whistler via Unicode
David, On 2/22/2018 7:21 PM, David Corbett via Unicode wrote: My confusion stems from Unicode’s online bidi utility. That bidi utility has known defects in it. It is not yet conformant with changes to UBA 6.3, let alone later changes to UBA. And the mapping of memory position to display pos

Re: Bidi edge cases in Hangul and Indic

2018-02-22 Thread Ken Whistler via Unicode
On 2/22/2018 11:39 AM, David Corbett via Unicode wrote: For example, after a right-to-left override, the Hangul string 보기 (“bogi”) becomes 기보 (“gibo”) in visual order. However, its NFD form is reordered by jamo instead of by syllable; that is, it looks like “igob”. Nope. *tilt* The UBA reor
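
The decomposition that the question refers to can be inspected with a short Python sketch. It shows only how the two precomposed syllables of 보기 map to four conjoining jamo under NFD; it does not run the Unicode Bidirectional Algorithm and makes no claim about the resulting display order.

```python
import unicodedata

syllables = "\uBCF4\uAE30"  # 보기 ("bogi"), two precomposed Hangul syllables

# NFD splits each syllable into its leading consonant and vowel jamo.
jamo = unicodedata.normalize("NFD", syllables)

for ch in jamo:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Recomposition under NFC gives back the original two syllables.
roundtrip = unicodedata.normalize("NFC", jamo)
print(len(syllables), len(jamo), roundtrip == syllables)
```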

Re: IDC's versus Egyptian format controls

2018-02-16 Thread Ken Whistler via Unicode
On 2/16/2018 11:00 AM, Asmus Freytag via Unicode wrote: On 2/16/2018 8:00 AM, Richard Wordingham via Unicode wrote: That doesn't square well with, "An implementation *may* render a valid Ideographic Description Sequence either by rendering the individual characters separately or by parsing the

Re: IDC's versus Egyptian format controls

2018-02-16 Thread Ken Whistler via Unicode
On 2/16/2018 8:22 AM, Ken Whistler wrote: The Egyptian quadrat controls, on the other hand, are full-fledged Unicode format controls. One more point of distinction: The (gc=So) IDC's follow a syntax that uses Polish notation order for the descriptive operators (inherited from the int

IDC's versus Egyptian format controls (was: Re: Why so much emoji nonsense?)

2018-02-16 Thread Ken Whistler via Unicode
On 2/16/2018 8:00 AM, Richard Wordingham via Unicode wrote: A more portable solution for ideographs is to render an Ideographic Description Sequences (IDS) as approximations to the characters they describe. The Unicode Standard carefully does not prohibit so doing, and a similar scheme is being

Re: Why so much emoji nonsense?

2018-02-15 Thread Ken Whistler via Unicode
On 2/15/2018 2:24 PM, Philippe Verdy via Unicode wrote: And it's in the mission of Unicode, IMHO, to promote litteracy Um, no. And not even literacy, either. ;-) https://en.wikipedia.org/wiki/Category:Organizations_promoting_literacy --Ken

Re: Why so much emoji nonsense?

2018-02-14 Thread Ken Whistler via Unicode
On 2/14/2018 12:49 PM, Philippe Verdy via Unicode wrote: RCLLTHTWHNLPHBTSWRFRSTNVNTDPPLWRTTXTLKTHS ! [ ... lots to say about the history of writing ... ] And the use (or abuse) of emojis is returning us to the prehistory when people draw animals on walls of caverns: this was a very slow

Re: Why so much emoji nonsense?

2018-02-14 Thread Ken Whistler via Unicode
On 2/14/2018 12:53 AM, Erik Pedersen via Unicode wrote: Unlike text composed of the world’s traditional alphabetic, syllabic, abugida or CJK characters, emoji convey no utilitarian and unambiguous information content. I think this represents a misunderstanding of the function of emoji in wr

Re: Word_Break for Hieroglyphs

2017-12-14 Thread Ken Whistler via Unicode
Gentlemen, On 12/14/2017 6:53 AM, Mark Davis ☕️ via Unicode wrote: Thus I would like people who are both knowledgeable about hieroglyphs /and/ Unicode properties to weigh in. I know that people like Andrew Glass are on this list, who satisfy both criteria. And what constitutes a cluster?

Re: Armenian Mijaket (Armenian colon)

2017-12-05 Thread Ken Whistler via Unicode
Asmus, On 12/5/2017 12:35 PM, Asmus Freytag via Unicode wrote: I don't know the history of this particular "unification" Here are some clues to guide further research on the history. The annotation in question was added to a draft of the NamesList.txt file for Unicode 4.1 on October 7, 2003

Re: implicit weight base for U+2CEA2

2017-09-27 Thread Ken Whistler via Unicode
On 9/27/2017 2:19 PM, Markus Scherer via Unicode wrote: On Wed, Sep 27, 2017 at 1:49 PM, James Tauber via Unicode <unicode@unicode.org> wrote: I recently updated pyuca[1], my pure Python implementation of the Unicode Collation Algorithm to work with 8.0.0, 9.0.0, and 10.0.0
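
As background for the thread title, here is a minimal Python sketch of how a pair of implicit primary weights is computed from a code point. The formulas follow my reading of the implicit-weights section of UTS #10 and are an assumption, not something stated in the message. The function name `implicit_weights` is hypothetical. Which base applies to U+2CEA2 depends on how the code point is classified in the Unicode version in use, so the base is left as a parameter and two candidate values are shown.

```python
def implicit_weights(cp, base):
    """Return the (AAAA, BBBB) implicit primary weights for a code point.

    Assumed formulas: AAAA = base + (cp >> 15), BBBB = (cp & 0x7FFF) | 0x8000.
    """
    aaaa = base + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb


# Two candidate bases, shown side by side rather than choosing one.
for base in (0xFB80, 0xFBC0):
    aaaa, bbbb = implicit_weights(0x2CEA2, base)
    print(f"base {base:04X}: [.{aaaa:04X}.0020.0002][.{bbbb:04X}.0000.0000]")
```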

Re: IBM 1620 invalid character symbol

2017-09-27 Thread Ken Whistler via Unicode
Ken, On 9/27/2017 11:10 AM, Ken Shirriff via Unicode wrote: The IBM type catalog might be of interest. It describes in great detail the character sets of the IBM typewriters and line printers and the custom characters that can be ordered for printer chains and Selectric type balls. Link: htt

Re: IBM 1620 invalid character symbol

2017-09-27 Thread Ken Whistler via Unicode
Asmus, On 9/27/2017 10:02 AM, Asmus Freytag via Unicode wrote: In that context it's worth remembering that there while you could say for most typewriters that "the typewriter is the font", there were noted exceptions. The IBM Selectric, for example, had exchangeable type balls which allowed

Re: IBM 1620 invalid character symbol

2017-09-27 Thread Ken Whistler via Unicode
Leo, On 9/26/2017 9:00 PM, Leo Broukhis via Unicode wrote: The next time I'm at the Mountain View CHM, I'll try to ask. However, assuming it was an overstrike of an X and an I, then where does the "Eris"-like glyph come from? Was there ever an IBM font with a double-semicircular X like )( ?

Re: IBM 1620 invalid character symbol

2017-09-26 Thread Ken Whistler via Unicode
Philippe, Those aren't negative digits, per se. The usage in the manual is with an overline (or macron) to indicate the flag bit. It does occur over a zero, and in explanation in the text of floating point operations, it is also shown over letters (X, M, E) representing digits of the exponent

Re: IBM 1620 invalid character symbol

2017-09-26 Thread Ken Whistler via Unicode
Leo, Yeah, I know. My point was that by examining the physical typewriter keys (the striking head on the typebar, not the images on the keypads), one could see what could be generated *by* overstriking. I think Philippe's suggestion that it was simply an overstrike of "X" with an "I" is proba

Re: IBM 1620 invalid character symbol

2017-09-25 Thread Ken Whistler via Unicode
The 1620 manual accessed from the Wiki page shows the same information but with a different glyph (which looks more like the capital zhe, and is presumably the source of the glyph cited in the Wiki page itself). See: http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CPU_Model_1_Jul65.pd

Re: Rendering variants of U+3127 Bopomofo Letter I

2017-08-24 Thread Ken Whistler via Unicode
Albrecht, See TUS, Section 18.3, Bopomofo, p. 707: http://www.unicode.org/versions/Unicode10.0.0/ch18.pdf#G22553 --Ken On 8/24/2017 12:19 AM, Dreiheller, Albrecht via Unicode wrote: Hello Chinese experts, The Letter I in the Bopomofo alphabet (U+3127) has two rendering variants, a vertic

Re: emoji props in the ucdxml ?

2017-07-05 Thread Ken Whistler via Unicode
Manuel, I suspect that such a link may already be in the works for the /Public/emoji/ data directory. But if you want to make sure your suggestion is reviewed by the UTC, you should submit it via the contact form: http://www.unicode.org/reporting.html --Ken On 7/5/2017 12:37 PM, Manuel Str

Re: emoji props in the ucdxml ?

2017-07-05 Thread Ken Whistler via Unicode
On 7/5/2017 10:01 AM, Daniel Bünzli via Unicode wrote: I know the emoji properties [1] are not formally part of the UCD (not sure exactly why though), Because they are maintained as part of an independent standard now (UTS #51), which is still on track to have a faster turnaround -- and hence

Re: Announcing The Unicode® Standard, Version 10.0

2017-06-21 Thread Ken Whistler via Unicode
I wonder IF 9 times suffice, But IF more are required, I'll tweet ILY, tweet it twice -- Since spelling's been retired. On 6/21/2017 8:37 AM, William_J_G Overington via Unicode wrote: Here is a mnemonic poem, that I wrote on Monday 20 February 2017, now published as U+1F91F is now officially i

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Ken Whistler via Unicode
On 6/1/2017 8:32 PM, Richard Wordingham via Unicode wrote: TUS Section 3 is like the Augean Stables. It is a complete mess as a standards document, That is a matter of editorial taste, I suppose. imputing mental states to computing processes. That, however, is false. The rhetorical turn i

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Ken Whistler via Unicode
On 6/1/2017 6:21 PM, Richard Wordingham via Unicode wrote: By definition D39b, either sequence of bytes, if encountered by a conformant UTF-8 conversion process, would be interpreted as a sequence of 6 maximal subparts of an ill-formed subsequence. ("D39b" is a typo for "D93b".) Sorry about

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Ken Whistler via Unicode
On 6/1/2017 2:39 PM, Richard Wordingham via Unicode wrote: You were implicitly invited to argue that there was no need to handle 5 and 6 byte invalid sequences. Well, working from the *current* specification: FC 80 80 80 80 80 and FF FF FF FF FF FF are equal trash, uninterpretable as *anyth
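
The two byte sequences mentioned above can be fed to a real decoder as a quick check. This is a minimal sketch using CPython's built-in UTF-8 codec. The helper names are hypothetical, and other decoders may choose a different number of replacement characters.

```python
SAMPLES = (
    bytes.fromhex("FC8080808080"),
    bytes.fromhex("FFFFFFFFFFFF"),
)


def is_well_formed(raw):
    """Return True if raw decodes as UTF-8 under strict error handling."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def decode_lenient(raw):
    """Decode raw as UTF-8, substituting U+FFFD for ill-formed input."""
    return raw.decode("utf-8", errors="replace")


for raw in SAMPLES:
    text = decode_lenient(raw)
    print(raw.hex(" ").upper(), is_well_formed(raw), text.count("\uFFFD"))
```

With CPython 3, neither sequence is accepted in strict mode, and lenient decoding produces one U+FFFD per byte in both cases.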

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-26 Thread Ken Whistler via Unicode
On 5/26/2017 10:28 AM, Karl Williamson via Unicode wrote: The link provided about the PRI doesn't lead to the comments. PRI #121 (August, 2008) pre-dated the practice of keeping all the feedback comments together with the PRI itself in a numbered directory with the name "feedback.html". But

Re: Comparing Raw Values of the Age Property

2017-05-23 Thread Ken Whistler via Unicode
Richard On 5/23/2017 1:48 PM, Richard Wordingham via Unicode wrote: The object is to generate code*now* that, up to say Unicode Version 23.0, can work out, from the UCD files DerivedAge.txt and PropertyValueAliases.txt, whether an arbitrary code point was included by some Unicode version ident
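
One practical point behind the subject line is that Age values have to be compared as version numbers, not as strings. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, and the two sample lines merely imitate the `range ; age # comment` layout of DerivedAge.txt and are not copied from the real file.

```python
# Illustrative lines in the style of DerivedAge.txt (not verbatim file content).
SAMPLE = """\
0041..005A    ; 1.1 # sample range
1F91F         ; 10.0 # sample single code point
"""


def parse_age(value):
    """Turn an Age string such as '10.0' into a comparable (major, minor) tuple."""
    major, minor = value.split(".")[:2]
    return int(major), int(minor)


def load_ages(lines):
    """Parse 'first..last ; age' lines into (first, last, age) tuples."""
    table = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        rng, age = (field.strip() for field in data.split(";"))
        first, _, last = rng.partition("..")
        table.append((int(first, 16), int(last or first, 16), parse_age(age)))
    return table


def included_in(cp, version, table):
    """Return True if cp was assigned in or before the given Unicode version."""
    target = parse_age(version)
    return any(lo <= cp <= hi and age <= target for lo, hi, age in table)


table = load_ages(SAMPLE.splitlines())
print("10.0" < "9.0", parse_age("10.0") < parse_age("9.0"))
print(included_in(0x1F91F, "9.0", table), included_in(0x1F91F, "23.0", table))
```

Comparing the raw strings puts "10.0" before "9.0", which is why the sketch converts each value to a tuple of integers first.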

Re: English flag (from Re: How to Add Beams to Notes)

2017-05-03 Thread Ken Whistler via Unicode
On 5/3/2017 3:20 AM, William_J_G Overington via Unicode wrote: Surely a single code point could be found. Single code points are being found for various emoji items on a continuing basis. Why pull up the ladder on encoding some flags each with a single code point? Yes, a single code point for

Traction and Deprecation (was: Re: Unicode Emoji 5.0 characters now final)

2017-03-29 Thread Ken Whistler
On 3/29/2017 1:12 PM, Doug Ewell wrote: Is that common practice in Unicode, that if something doesn't gain significant traction in the comparatively short term, it becomes a candidate for deprecation? If a mechanism was dodgy in the first place and was dubious as a part of plain text, then ye

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Ken Whistler
On 3/29/2017 1:12 PM, Doug Ewell wrote: I would think vendors could make their own business decisions about what flags to support. "Hmm, yeah, definitely Texas, maybe Lombardy, not so sure about Colorado, probably not Guna Yala." I don't see why they had to be essentially told what to support an

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Ken Whistler
On 3/27/2017 1:39 PM, Philippe Verdy wrote: Note also that ISO3166-2 is far from being stable, and this could contradict Unicode encoding stability: it would then be required to ensure this stability by only allowing sequences that are effectively registered in http://www.unicode.org/Public/

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Ken Whistler
On 3/27/2017 12:17 PM, Doug Ewell wrote: announcements at Unicode dot org wrote: — and new regional flags for England, Scotland, and Wales. It's not clear from this text, nor from the table in Section C.1.1 of the draft, what the status is of flag emoji tag sequences other than the three abov

Re: Encoding of old compatibility characters

2017-03-27 Thread Ken Whistler
On 3/27/2017 7:44 AM, Charlotte Buff wrote: Now, one of Unicode’s declared goals is to enable round-trip compatibility with legacy encodings. We’ve accumulated a lot of weird stuff over the years in the pursuit of this goal. So it would be natural to assume that the unencoded characters from t

Re: Combining solidus above for transcription of poetic meter

2017-03-17 Thread Ken Whistler
On 3/17/2017 10:27 AM, Julian Bradfield wrote: If you're working in a situation where you don't have either markup control or the facility to use plain monospaced text, then just use normal breves and acutes. It's not clear to me that laying out aligned text (for which there are many other appli

Re: Stokoe Notation (sign language)

2017-03-07 Thread Ken Whistler
On 3/6/2017 2:48 PM, Simon Cozens wrote: A few years back, there was a set of questions to the UTC (L2/12-133) asking for direction on encoding Stokoe notation. Did these ever get an answer, and is there anything currently happening with Stokoe encoding? The short answer is no. Stokoe notati

Re: Translations of city names

2017-03-02 Thread Ken Whistler
The UN Group of Experts on Geographical Names (UNGEGN) is also relevant: https://unstats.un.org/unsd/geoinfo/ungegn/default.html They keep up a list of searchable geographical names databases in a wide variety of languages: https://unstats.un.org/unsd/geoinfo/ungegn/geonames.html --Ken On

Re: WAP Pictogram Specification as Emoji Source

2017-02-13 Thread Ken Whistler
On 2/13/2017 1:26 PM, Christoph Päper wrote: Ken Whistler: On 2/13/2017 1:39 AM, Christoph Päper wrote: - music/rest – is that what 〽️ or 〰️ means? The first of those is presumably U+303D PART ALTERNATION MARK, and the second is probably the notorious U+3030 WAVY DASH. So not emoji at all

Re: WAP Pictogram Specification as Emoji Source

2017-02-13 Thread Ken Whistler
I can't speak to the missing emoji mappings, but... On 2/13/2017 1:39 AM, Christoph Päper wrote: - music/rest – is that what 〽️ or 〰️ means? The first of those is presumably U+303D PART ALTERNATION MARK, and the second is probably the notorious U+3030 WAVY DASH. So not emoji at all. --Ken
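The identification above can be checked mechanically. A minimal sketch, assuming Python's standard `unicodedata` module; the helper name `describe` is hypothetical. It lists each code point in a string with its character name, which also exposes the U+FE0F variation selector that follows each symbol in the quoted question.

```python
import unicodedata


def describe(text):
    """Return (U+XXXX, character name) for every code point in text."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    # The two symbols from the question, each followed by U+FE0F.
    for code, name in describe("\u303d\ufe0f\u3030\ufe0f"):
        print(code, name)
```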

Re: Indic Syllabic Category of U+11134 CHAKMA MAAYYAA

2017-02-03 Thread Ken Whistler
Richard, On 2/3/2017 2:35 PM, Richard Wordingham wrote: Except that the added annotation "also used distinctly as a gemination mark which can occur with vowels" also applies to U+103A MYANMAR SIGN ASAT. TUS 9.0 Section 16.3 Myanmar calls the base 'double-acting' rather than 'geminate', but it'

Re: On the upcoming LATIN LETTER SMALL CAPITAL Q

2017-01-11 Thread Ken Whistler
This is a character under ballot for Amendment 1 to the 5th edition. It isn't part of the repertoire planned for publication as part of Unicode 10.0 in June. So if you want to have any impact on the subhead used in the charts for A7AF, the correct mechanism now is to get a national body commen

Re: Another UAX #29 bug: property tables need updating

2016-12-22 Thread Ken Whistler
Manish, On 12/22/2016 10:35 AM, Manish Goregaokar wrote: The property table should include all role and gender modifiers as GAZ. Could this be updated? Property values cannot be updated for *published* versions of the standard. What you should do is submit your feedback as part of the pub

Re: Best practices for replacing UTF-8 overlongs

2016-12-20 Thread Ken Whistler
On 12/20/2016 10:33 AM, Markus Scherer wrote: Yes. However, some of the discussion in this thread is due to details that were not spelled out in the PRI. There is basically a 2a and a 2b, while the examples in PRI #121 work the same in both variants. I wasn't intending to argue the case one

Re: Best practices for replacing UTF-8 overlongs

2016-12-20 Thread Ken Whistler
Doug, On 12/19/2016 6:08 PM, Doug Ewell wrote: I thought there was a corrigendum or other, comparatively recent addition to the Standard that spelled out how replacement characters are supposed to be substituted for invalid code unit sequences -- something about detecting maximally long seque
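The substitution practice asked about here can be observed with a short experiment. This is a sketch under assumptions, not a statement of what the Standard requires: it relies on CPython 3's built-in UTF-8 decoder with `errors="replace"`, which appears to emit one U+FFFD per maximal ill-formed subsequence, and the helper name `replace_ill_formed` is hypothetical.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def replace_ill_formed(data):
    """Decode bytes as UTF-8, substituting U+FFFD for ill-formed sequences."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    samples = {
        "truncated 3-byte sequence": b"a\xe2\x82b",
        "overlong encoding of U+0000": b"\xc0\x80",
    }
    for label, raw in samples.items():
        text = replace_ill_formed(raw)
        print(label, "->", text.count(REPLACEMENT), "replacement character(s)")
```

In this decoder the truncated sequence E2 82 collapses to a single U+FFFD, while the overlong pair C0 80 yields two, because C0 can never begin a well-formed sequence and the stray 80 is then rejected on its own.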

Fwd: Re: Should unassigned code points in blocks reserved for combining marks, etc be GCB extended?

2016-12-12 Thread Ken Whistler
Forwarded Message Subject: Re: Should unassigned code points in blocks reserved for combining marks, etc be GCB extended? Date: Mon, 12 Dec 2016 08:26:45 -0800 From: Ken Whistler To: Karl Williamson On 12/12/2016 6:59 AM, Karl Williamson wrote: These are

Re: UAX #9 (Bidirectional algorithm) reference implementations

2016-12-09 Thread Ken Whistler
On 12/8/2016 6:41 PM, Fabian Giesen wrote: 1. BidiReferenceJava supports Unicode 6.3.0, but has not been updated for later versions. We have an updated version of BidiReferenceJava about ready to deploy into the PROGRAMS directory. About the bug you note in BidiReferenceC, I'll investigate

Re: The usage of Z WITH STROKE

2016-11-28 Thread Ken Whistler
On 11/25/2016 10:20 PM, Janusz S. Bień wrote: Now there is a follow-up question: why the character was included in Unicode 1.1.0? Well, it was included in Unicode 1.1 because it was published in Unicode 1.0 already. So that is the proximate reason. That inevitably will raise the question, "
