Re: Is the binaryness/textness of a data format a property?

2020-03-22 Thread Martin J. Dürst via Unicode
On 23/03/2020 03:56, Markus Scherer via Unicode wrote: > On Sat, Mar 21, 2020 at 12:35 PM Doug Ewell via Unicode > wrote: > >> I thought the whole premise of GB18030 was that it was Unicode mapped into >> a GB2312 framework. What characters exist in GB18030 that don't exist in >> Unicode, and hav

Re: Is the binaryness/textness of a data format a property?

2020-03-20 Thread Martin J. Dürst via Unicode
On 20/03/2020 23:41, Adam Borowski via Unicode wrote: > Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF or > U+11000..U+7FFF (or possibly even up to 2³⁶ or 2⁴²), which has its uses > but is not well-formed Unicode. This would definitely no longer be UTF-8! Martin.
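The well-formedness point can be checked with a short Python sketch. It assumes CPython 3's built-in `utf-8` codec and its `surrogatepass` error handler, neither of which is mentioned in the thread; the bytes ED A0 80 are U+D800 written out in the UTF-8 bit layout, and the function names are made up for illustration.

```python
# U+D800 spelled out in the UTF-8 bit layout; this is not well-formed UTF-8.
SURROGATE_BYTES = b"\xed\xa0\x80"


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


def decode_permissively(data: bytes) -> str:
    """Decode while letting surrogates through (an extension, not UTF-8 proper)."""
    return data.decode("utf-8", errors="surrogatepass")
```

A strict decoder rejects the sequence; only the explicitly non-standard handler accepts it, which matches the reply that such data would no longer be UTF-8.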

Call for Papers: G21C Grapholinguistics in the 21st century, Paris June 2020

2020-01-06 Thread Martin J. Dürst via Unicode
Happy New Year to everybody on this list! Except for the Internationalization and Unicode Conference (see https://www.unicodeconference.org/; submission deadline March 6, 2020), this list very rarely sees calls for papers, but this one should definitely be of interest at least to a subset of pe

Re: Grapheme clusters and backspace (was Re: Coding for Emoji: how to modify programs to work with emoji)

2019-10-22 Thread Martin J. Dürst via Unicode
Hello Richard, others, On 2019/10/23 07:32, Richard Wordingham via Unicode wrote: > On Tue, 22 Oct 2019 23:27:27 +0200 > Daniel Bünzli via Unicode wrote: >> Just to make things clear. When you say character in your message, >> you consistently mean scalar value right ? > > Yes. > > I find it h

Re: Unicode website glitches. (was The Most Frequent Emoji)

2019-10-11 Thread Martin J. Dürst via Unicode
. > Mark > > > On Thu, Oct 10, 2019 at 11:50 PM Martin J. Dürst via Unicode < > unicode@unicode.org> wrote: > >> I had a look at the page with the frequencies. Many emoji didn't >> display, but that's my browser's problem. What was worse was t

Fwd: The Most Frequent Emoji

2019-10-10 Thread Martin J. Dürst via Unicode
I had a look at the page with the frequencies. Many emoji didn't display, but that's my browser's problem. What was worse was that the sidebar and the stuff at the bottom was all looking weird. I hope this can be fixed. Regards, Martin. Forwarded Message Subject: The Most F

Re: Manipuri/Meitei customary writing system

2019-10-04 Thread Martin J. Dürst via Unicode
On 2019/10/04 15:35, Martin J. Dürst via Unicode wrote: > Hello Markus, > > On 2019/10/04 01:53, Markus Scherer via Unicode wrote: >> Dear Unicoders, >> >> Is Manipuri/Meitei customarily written in Bangla/Bengali script or >> in Meitei script? >> >>

Re: Manipuri/Meitei customary writing system

2019-10-03 Thread Martin J. Dürst via Unicode
Hello Markus, On 2019/10/04 01:53, Markus Scherer via Unicode wrote: > Dear Unicoders, > > Is Manipuri/Meitei customarily written in Bangla/Bengali script or > in Meitei script? > > I am looking at > https://en.wikipedia.org/wiki/Meitei_language#Writing_systems which seems > to describe writing

Re: Emoji Haggadah

2019-04-16 Thread Martin J. Dürst via Unicode
Hello Mark, others, On 2019/04/16 12:18, Mark E. Shoulson via Unicode wrote: > Yes.  But the sentences aren't just symbolic representations of the > concepts or something.  They are frequently direct > transcriptions—usually by puns—for *English* sentences, so left-to-right > makes sense.  So f

Re: Encoding italic

2019-02-09 Thread Martin J. Dürst via Unicode
On 2019/02/09 19:58, Richard Wordingham via Unicode wrote: > On Fri, 8 Feb 2019 18:08:34 -0800 > Asmus Freytag via Unicode wrote: >> Under the implicit assumptions bandied about here, the VS approach >> thus reveals itself as a true rich-text solution (font switching) >> albeit realized with pseu

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Martin J. Dürst via Unicode
On 2019/01/31 07:02, Richard Wordingham via Unicode wrote: > On Wed, 30 Jan 2019 15:33:38 +0100 > Frédéric Grosshans via Unicode wrote: > >> Le 30/01/2019 à 14:36, Egmont Koblinger via Unicode a écrit : >>> - It doesn't do Arabic shaping. In my recommendation I'm arguing >>> that in this mode, wh

Re: Encoding italic

2019-01-29 Thread Martin J. Dürst via Unicode
On 2019/01/28 05:03, James Kass via Unicode wrote: > > A new beta of BabelPad has been released which enables input, storing, > and display of italics, bold, strikethrough, and underline in plain-text > using the tag characters method described earlier in this thread.  This > enhancement is des

Re: Encoding italic

2019-01-29 Thread Martin J. Dürst via Unicode
On 2019/01/24 23:49, Andrew West via Unicode wrote: > On Thu, 24 Jan 2019 at 13:59, James Kass via Unicode > wrote: > We were told time and time again when emoji were first proposed that > they were required for encoding for interoperability with Japanese > telecoms whose usage had spilled over t

Re: Encoding italic

2019-01-17 Thread Martin J. Dürst via Unicode
On 2019/01/17 17:51, James Kass via Unicode wrote: > > On 2019-01-17 6:27 AM, Martin J. Dürst replied: > > ... > > Based on these data points, and knowing many of the people involved, my > > description would be that decisions about what to encode as characters > > (plain text) and what to de

Re: Encoding italic

2019-01-16 Thread Martin J. Dürst via Unicode
On 2019/01/17 12:38, James Kass via Unicode wrote: > ( http://www.unicode.org/versions/Unicode11.0.0/ch02.pdf ) > > "Plain text must contain enough information to permit the text to be > rendered legibly, and nothing more." > > The argument is that italic information can be stripped yet stil

Re: A last missing link for interoperable representation

2019-01-14 Thread Martin J. Dürst via Unicode
On 2019/01/15 07:58, David Starner via Unicode wrote: > On Mon, Jan 14, 2019 at 2:09 AM Tex via Unicode wrote: >> ·Plain text still has tremendous utility and rich text is not always >> an option. > > Where? Twitter has the option of doing rich text, as does any closed > system. In fact

Re: A last missing link for interoperable representation

2019-01-14 Thread Martin J. Dürst via Unicode
On 2019/01/15 10:48, Mark E. Shoulson via Unicode wrote: > On 1/14/19 4:21 PM, Asmus Freytag via Unicode wrote: >> Short of that, I'm extremely leery of "leading" standardization; that >> is, encoding things that "might" be used. >> > It is certainly true that Unicode should not be (and wasn't

Re: A last missing link for interoperable representation

2019-01-14 Thread Martin J. Dürst via Unicode
Hello James, others, On 2019/01/14 15:24, James Kass via Unicode wrote: > > Martin J. Dürst wrote, > > > I'd say it should be conservative. As the meaning of that word > > (similar to others such as progressive and regressive) may be > > interpreted in various way, here's what I mean by that.

Re: A last missing link for interoperable representation

2019-01-14 Thread Martin J. Dürst via Unicode
Hello James, others, From the examples below, it looks like a feature request for Twitter (and/or Facebook). Blaming the problem on Unicode doesn't seem to be appropriate. Regards, Martin. On 2019/01/14 18:06, James Kass via Unicode wrote: > > Not a twitter user, don't know how popular the

Re: A last missing link for interoperable representation

2019-01-13 Thread Martin J. Dürst via Unicode
On 2019/01/14 01:46, Julian Bradfield via Unicode wrote: > On 2019-01-12, Richard Wordingham via Unicode wrote: >> On Sat, 12 Jan 2019 10:57:26 + (GMT) >> And what happens when you capitalise a word for emphasis or to begin a >> sentence? Is it no longer the same word? > > Indeed. As has be

Re: A last missing link for interoperable representation

2019-01-13 Thread Martin J. Dürst via Unicode
On 2019/01/13 03:50, Asmus Freytag via Unicode wrote: > To reiterate, if you effectively require a span (even if you could simulate > that > differently) you are in the realm of rich text. The one big exception to that > is > bidi, because it is utterly impossible to do bidi text without text ra

Re: A last missing link for interoperable representation

2019-01-13 Thread Martin J. Dürst via Unicode
On 2019/01/13 13:24, James Kass via Unicode wrote: > > Mark E. Shoulson wrote, > > > This discussion has been very interesting, really.  I've heard what I > > thought were very good points and relevant arguments from both/all > > sides, and I confess to not being sure which I actually prefer.

Re: A last missing link for interoperable representation

2019-01-11 Thread Martin J. Dürst via Unicode
On 2019/01/11 16:13, James Kass via Unicode wrote: > Styled Latin text is being simulated with math alphanumerics now, which > means that data is being interchanged and archived.  That's the user > demand illustrated. Almost by definition, styled text isn't plain text, even if it's simulated b

Re: A last missing link for interoperable representation

2019-01-10 Thread Martin J. Dürst via Unicode
On 2019/01/11 10:48, James Kass via Unicode wrote: > Is it true that many of the CJK variants now covered were previously > considered by the Consortium to be merely stylistic variants? What is a stylistic variant or not is quite a bit more complicated for CJK than for scripts such as Latin. In

Re: A sign/abbreviation for "magister"

2018-10-31 Thread Martin J. Dürst via Unicode
On 2018/11/01 03:10, Marcel Schneider via Unicode wrote: > On 31/10/2018 at 17:27, Julian Bradfield via Unicode wrote: >> When one does question the Académie about the fact, this is their >> reply: >> >> Le fait de placer en exposant ces mentions est de convention >> typographique ; il convient do

Re: A sign/abbreviation for "magister"

2018-10-31 Thread Martin J. Dürst via Unicode
On 2018/10/31 03:51, Marcel Schneider via Unicode wrote: > On 30/10/2018 at 18:59, Doug Ewell via Unicode wrote: >> >> Marcel Schneider wrote: >> >>> This use case is different from the use case that led to submit >>> the L2/18-206 proposal, cited by Dr Ewell on 29/10/2018 at 20:29: >> >> I guess t

Re: A sign/abbreviation for "magister"

2018-10-28 Thread Martin J. Dürst via Unicode
On 2018/10/29 05:42, Michael Everson via Unicode wrote: > This is no different from the Irish name McCoy which can be written MᶜCoy where > the raising of the c is actually just decorative, though perhaps it was once > an abbreviation for Mac. In some styles you can see a line or a dot under the > ra

Re: Fallback for Sinhala Consonant Clusters

2018-10-14 Thread Martin J. Dürst via Unicode
Hello Richard, On 2018/10/14 09:02, Richard Wordingham via Unicode wrote: Are there fallback rules for Sinhala consonant clusters? There are fallback rules for Devanagari, but I'm not sure if they read across. The problem I am seeing is that the Pali syllable 'ndhe' න්‍ධෙ Let's label this a

Re: Dealing with Georgian capitalization in programming languages

2018-10-09 Thread Martin J. Dürst via Unicode
Hello Ken, others, On 2018/10/03 06:43, Ken Whistler wrote: But it seems to me that the problem you are citing can be avoided if you simply rethink what your "capitalize" means. It really should be conceived of as first lowercasing the *entire* string, and then titlecasing the *eligible* lett
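The two-step definition quoted above can be sketched in Python (chosen here only for illustration; the thread itself concerns Ruby). The sketch assumes that the only "eligible" letter is the first character of the string, and the function name `capitalize` is hypothetical rather than any library's API.

```python
def capitalize(text: str) -> str:
    """Lowercase the whole string, then titlecase only the first character."""
    lowered = text.lower()
    # str.title() on a single character applies its titlecase mapping;
    # a character that has no such mapping is left unchanged.
    return lowered[:1].title() + lowered[1:]
```

Because the second step uses the titlecase mapping rather than the uppercase one, a script whose letters titlecase to themselves would come out entirely in lowercase; whether that holds for Georgian in a given runtime depends on the Unicode version it ships.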

Re: Dealing with Georgian capitalization in programming languages

2018-10-04 Thread Martin J. Dürst via Unicode
Ken, Markus, Many thanks for your ideas, which I noted at https://bugs.ruby-lang.org/issues/14839. Regards, Martin. On 2018/10/03 06:43, Ken Whistler wrote: On 10/2/2018 12:45 AM, Martin J. Dürst via Unicode wrote: My questions here are: - Has this been considered when Georgian Mtavruli

Dealing with Georgian capitalization in programming languages

2018-10-02 Thread Martin J. Dürst via Unicode
Since the last discussion on Georgian (Mtavruli) on this mailing list, I have been looking into how to implement it in the Programming language Ruby. Ruby has four case-conversion operations for its class String: upcase: convert all characters to upper case downcase: convert all characters to

Re: Shortcuts question

2018-09-16 Thread Martin J. Dürst via Unicode
On 2018/09/16 21:08, Marcel Schneider via Unicode wrote: An additional level of complexity is induced by ergonomics, so that most non-Latin layouts may wish to stick with QWERTY, and even ergonomic layouts in the footprints of August Dvorak rather than Shai Coleman are likely to offer variants

Re: UCD in XML or in CSV? (is: UCD in YAML)

2018-09-07 Thread Martin J. Dürst via Unicode
On 2018/09/08 04:47, Rebecca Bettencourt via Unicode wrote: On Fri, Sep 7, 2018 at 11:20 AM Philippe Verdy via Unicode < unicode@unicode.org> wrote: That version has been announced in the Windows 10 Hub several weeks ago. And it only took them 33 years. :) I used to joke that Notepad would

Re: Diacritic marks in parentheses

2018-07-26 Thread Martin J. Dürst via Unicode
On 2018/07/27 01:27, Markus Scherer via Unicode wrote: I would not expect for Ä+combining () above = Ä᪻ to look right except with specialized fonts. http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%84%5Cu1ABB&s=&uv=0 Even if it worked widely, I think it would be confusing. Yes, for the momen

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-04 Thread Martin J. Dürst via Unicode
Hello Rebecca, On 2018/06/05 12:43, Rebecca T via Unicode wrote: Something I’d love to see is translated keywords; shouldn’t be hard with a line in the cargo.toml for a rudimentary lookup. Again, I’m of the opinion that an imperfect implementation is better than no attempt. I remember reading a

Re: Hyphenation Markup

2018-06-02 Thread Martin J. Dürst via Unicode
Hello Richard, On 2018/06/02 20:37, Richard Wordingham via Unicode wrote: Am 2018-06-02 um 06:44 schrieb Richard Wordingham via Unicode: In Latin text, one can indicate permissible line break opportunities between grapheme clusters by inserting U+00AD SOFT HYPHEN. What low-end schemes, if any

Re: Uppercase ß

2018-05-29 Thread Martin J. Dürst via Unicode
On 2018/05/29 17:15, Hans Åberg via Unicode wrote: On 29 May 2018, at 07:30, Asmus Freytag via Unicode wrote: An uppercase exists and it has formally been ruled as acceptable way to write this letter (mostly an issue for ALL CAPS as ß does not occur in word-initial position). A./ Duden

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-28 Thread Martin J. Dürst via Unicode
Hello Sundar, On 2018/05/28 04:27, SundaraRaman R via Unicode wrote: Hi, In languages like Ruby or Java (https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isAlphabetic(int)), functions to check if a character is alphabetic do that by looking for the 'Alphabetic' property (defi

Re: Major vendors changing U+1F52B PISTOL 🔫 depiction from firearm to squirt gun

2018-05-23 Thread Martin J. Dürst via Unicode
On 2018/05/24 03:00, Michael Everson via Unicode wrote: I consider it a significant semantic shift from the intended meaning of the character in the source Japanese character set. Yes and no. I'd consider the semantic shift from a real pistol in a Japanese message to a real pistol in a messag

Re: Is the Editor's Draft public?

2018-04-20 Thread Martin J. Dürst via Unicode
On 2018/04/20 18:12, Martin J. Dürst wrote: There was an announcement for a public review period just recently. The review period is up to the 23rd of April. I'm not sure whether the announcement is up somewhere on the Web, but I'll forward it to you directly. Sorry, found the Web address of

Re: Is the Editor's Draft public?

2018-04-20 Thread Martin J. Dürst via Unicode
Hello Henri, On 2018/04/20 17:15, Henri Sivonen via Unicode wrote: Is the Editor's Draft of the Unicode Standard visible publicly? Use case: Checking if things that I might send feedback about have already been addressed since the publication of Unicode 10.0. There was an announcement for a p

Re: Fwd: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode

2018-04-02 Thread Martin J. Dürst via Unicode
On 2018/04/03 10:56, Mark E. Shoulson via Unicode wrote: Whew!  Thanks for explaining the joke! Everyone here really thought they were serious.  Maybe you should write to the authors of the RFC and explain to them that their growth-function is incorrect.  I'm sure they'd be glad of the correcti

Fwd: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode

2018-04-01 Thread Martin J. Dürst via Unicode
Please enjoy. Sorry for being late with forwarding, at least in some parts of the world. Regards, Martin. Forwarded Message Subject: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode Date: Sun, 1 Apr 2018 08:29:00 -0700 (PDT) From: rfc-edi...@rfc-editor.org Reply-T

Re: A sketch with the best-known Swiss tongue twister

2018-03-13 Thread Martin J. Dürst via Unicode
On 2018/03/09 21:24, Mark Davis ☕️ wrote: There are definitely many dialects across Switzerland. I think that for *this* phrase it would be roughly the same for most of the population, with minor differences (eg 'het' vs 'hät'). But a native speaker like Martin would be able to say for sure. Ye

Re: A sketch with the best-known Swiss tongue twister

2018-03-13 Thread Martin J. Dürst via Unicode
On 2018/03/10 20:26, philip chastney via Unicode wrote: I would make the following observations on terminology in practice: -- the newspapers in Zurich advertised courses in "Hoch Deutsch", for those who needed to deal with foreigners This should probably be written 'the newspapers in Zuri

Re: base1024 encoding using Unicode emojis

2018-03-12 Thread Martin J. Dürst via Unicode
On 2018/03/12 02:07, Keith Turner via Unicode wrote: Yeah, it certainly results in larger utf8 strings. For example a sha256 hash is 112 bytes when encoded as Ecoji utf8. For base64, sha256 is 44 bytes. Even though it's more bytes, Ecoji has fewer visible characters than base64 for sha256. Eco
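The sizes quoted above can be reproduced with a little arithmetic. The base64 figure below comes from Python's standard library; the Ecoji figure is only an estimate under stated assumptions (10 bits per emoji as the "base1024" subject suggests, so 4 emoji per 5 input bytes, a padded final group, and 4 UTF-8 bytes per emoji), not a call into any real Ecoji implementation.

```python
import base64
import hashlib
import math

digest = hashlib.sha256(b"example").digest()  # a SHA-256 digest is 32 bytes

# Base64: every character is one byte in UTF-8.
base64_bytes = len(base64.b64encode(digest))

# Assumed Ecoji layout: 4 emoji per 5 input bytes, last group padded out,
# and every emoji taking 4 bytes in UTF-8.
ecoji_symbols = math.ceil(len(digest) / 5) * 4
ecoji_bytes = ecoji_symbols * 4
```

Under those assumptions the estimate gives 28 visible symbols and 112 bytes against base64's 44 characters and 44 bytes, which agrees with the numbers in the message.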

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-09 Thread Martin J. Dürst via Unicode
On 2018/03/09 10:22, Philippe Verdy via Unicode wrote: As well how Chinese/Japanese post offices handle addresses written with sinograms for personal names ? Is the expanded IDS form acceptable for them, or do they require using Romanized addresses, or phonetic approximations (Bopomofo in China,

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-09 Thread Martin J. Dürst via Unicode
On 2018/03/09 10:17, Philippe Verdy via Unicode wrote: This still leaves the question about how to write personal names ! IDS alone cannot represent them without enabling some "reasonable" ligaturing (they don't have to match the exact strokes variants for optimal placement, or with all possible

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-04 Thread Martin J. Dürst via Unicode
Hello John, On 2018/03/01 12:31, via Unicode wrote: Pen, or brush and paper is much more flexible. With thousands of names of people and places still not encoded I am not sure if I would describe hans (simplified Chinese characters) as well supported. nor with current policy which limits Chin

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Martin J. Dürst via Unicode
On 2018/02/28 19:38, Janusz S. Bień via Unicode wrote: On Tue, Feb 27 2018 at 13:45 -0800, announceme...@unicode.org writes: The 157 new Emoji are now available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages. I'm quite curious what is the relation bet

Re: 0027, 02BC, 2019, or a new character?

2018-02-22 Thread Martin J. Dürst via Unicode
On 2018/02/21 12:15, Michael Everson via Unicode wrote: I absolutely disagree. There’s a whole lot of related languages out there, and the speakers share some things in common. Orthographic harmonization between these languages can ONLY help any speaker of one to access information in any of t

Re: IDC's versus Egyptian format controls

2018-02-21 Thread Martin J. Dürst via Unicode
On 2018/02/17 08:25, James Kass via Unicode wrote: Some people studying Han characters use the IDCs to illustrate the ideographs and their components for various purposes. Well, as far as I understand, this was their original (and is still their main) purpose. For example: U-0002A8B8 𪢸 ⿰土

Re: Why so much emoji nonsense?

2018-02-14 Thread Martin J. Dürst via Unicode
On 2018/02/15 10:49, James Kass via Unicode wrote: Yes, except that Unicode "supported" all manner of things being interchanged by setting aside a range of code points for private use. Which enabled certain cell phone companies to save some bandwidth by assigning various popular in-line graphics

Re: Keyboard layouts and CLDR

2018-01-30 Thread Martin J. Dürst via Unicode
On 2018/01/30 16:18, Philippe Verdy via Unicode wrote: - Adding Y to the list of allowed letters after the dieresis deadkey to produce "Ÿ" : the most frequent case is L'HAŸE-LÈS-ROSES, the official name of a French municipality when written with full capitalisation, almost all spell checkers o

Re: 0027, 02BC, 2019, or a new character?

2018-01-22 Thread Martin J. Dürst via Unicode
On 2018/01/23 09:55, James Kass via Unicode wrote: Any Kazakh/Qazaq student ambitious enough to study a foreign language such as English is already sophisticated enough to easily distinguish differing digraph values between the two languages. English speakers face distinctions such as the diffe

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-21 Thread Martin J. Dürst via Unicode
On 2017/12/15 07:40, Richard Wordingham via Unicode wrote: On Mon, 11 Dec 2017 21:45:23 + Cibu Johny (സിബു) wrote: Malayalam could be a similar story. In case of Malayalam, it can be font specific because of the existence of traditional and reformed writing styles. A conjunct might be a

Re: Word_Break for Hieroglyphs

2017-12-20 Thread Martin J. Dürst via Unicode
On 2017/12/20 17:46, Richard Wordingham via Unicode wrote: In an implementation that offered genuine whole word selection, and thus tackled the challenges of Chinese, Japanese, Korean and Vietnamese (both scripts, not just CJKV) as well as Thai, I would expect the selections to be bounded b

Interesting UTF-8 decoder

2017-10-09 Thread Martin J. Dürst via Unicode
A friend of mine sent me a pointer to http://nullprogram.com/blog/2017/10/06/, a branchless UTF-8 decoder. Regards, Martin.

Re: IBM 1620 invalid character symbol

2017-09-26 Thread Martin J. Dürst via Unicode
On 2017/09/26 22:03, John W Kennedy via Unicode wrote: I don’t know what your snippet is from, but the normally authoritative IBM manual, A26-5706-3, IBM 1620 CPU Model 1 (July, 1965) displays what is clearly the Cyrillic letter. Whether it should be regarded as that, or as a distinct characte

Re: Assamese and Unicode.

2017-09-05 Thread Martin J. Dürst via Unicode
Sorry for the long delay of this answer. On 2017/08/24 07:35, David Faulks via Unicode wrote: It appears that the Indian government will submit an 'Assamese' proposal. http://silchar.com/unicode-standard-for-assamese-in-the-offing/ Since everything I know about Assamese Script indicates that i

Inadvertent copies of test data in L2/17-197 ?

2017-08-06 Thread Martin J. Dürst via Unicode
Hello Henry, I just had a look at http://www.unicode.org/L2/L2017/17197-utf8-retract.pdf to use the test data in there for Ruby. I was under the impression from previous looks at it that it contained a lot of test data. However, when I looked at the test data more carefully (I had read the

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-08-05 Thread Martin J. Dürst via Unicode
Hello Mark, On 2017/08/04 09:34, Mark Davis ☕️ wrote: FYI, the UTC retracted the following. Thanks for letting us know! Regards, Martin. *[151-C19 ] Consensus:* Modify the section on "Best Practices for Using FFFD" in section "3.9 Encodi

Re: Problems with BidiCharTest.txt

2017-07-16 Thread Martin J. Dürst via Unicode
On 2017/07/17 04:28, Philippe Verdy via Unicode wrote: The isolation mode is also the one strongly recommended by default for elements in HTML, Well, that's for sure, because the "i" in "bdi" stands for "isolation", and the element was newly created for the isolation mode. Regards, Marti

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-04 Thread Martin J. Dürst via Unicode
On 2017/06/02 04:54, Doug Ewell via Unicode wrote: Richard Wordingham wrote: even supporting 6-byte patterns just in case 20.1 bits eventually turn out not to be enough, Sorry to be late with this, but if 20.1 bits turn out to not be enough, what about 21 bits? That would still limit UTF-8

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Martin J. Dürst via Unicode
Hello Karl, others, On 2017/05/27 06:15, Karl Williamson via Unicode wrote: On 05/26/2017 12:22 PM, Ken Whistler wrote: On 5/26/2017 10:28 AM, Karl Williamson via Unicode wrote: The link provided about the PRI doesn't lead to the comments. PRI #121 (August, 2008) pre-dated the practice of

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Martin J. Dürst via Unicode
Hello Markus, others, On 2017/05/27 00:41, Markus Scherer wrote: On Fri, May 26, 2017 at 3:28 AM, Martin J. Dürst wrote: But there's plenty in the text that makes it absolutely clear that some things cannot be included. In particular, it says The term “maximal subpart of an ill-formed sub
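How many U+FFFD a decoder emits for a given ill-formed sequence can be observed rather than argued. The sketch below assumes CPython 3's built-in `utf-8` codec with `errors="replace"`; it shows what that one decoder does and is not a statement of what the standard or the proposal requires. The function name is hypothetical.

```python
def count_replacements(data: bytes) -> int:
    """Count the U+FFFD characters produced when decoding with replacement."""
    return data.decode("utf-8", errors="replace").count("\ufffd")


# E2 82 is a truncated three-byte sequence, followed here by ASCII "A".
truncated = b"\xe2\x82A"

# C0 80 is an overlong form; C0 can never start a well-formed sequence.
overlong = b"\xc0\x80"
```

Comparing `count_replacements(truncated)` with `count_replacements(overlong)` shows the distinction at issue in the thread: a valid-so-far prefix is replaced as one unit, while bytes that can never begin a well-formed sequence are replaced one by one.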

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-26 Thread Martin J. Dürst via Unicode
On 2017/05/25 09:22, Markus Scherer wrote: On Wed, May 24, 2017 at 3:56 PM, Karl Williamson wrote: On 05/24/2017 12:46 AM, Martin J. Dürst wrote: That's wrong. There was a public review issue with various options and with feedback, and the recommendation has been implemented and in use widel

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-23 Thread Martin J. Dürst via Unicode
On 2017/05/24 05:57, Karl Williamson via Unicode wrote: On 05/23/2017 12:20 PM, Asmus Freytag (c) via Unicode wrote: Adding a "recommendation" this late in the game is just bad standards policy. Unless I misunderstand, you are missing the point. There is already a recommendation listed in

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-23 Thread Martin J. Dürst via Unicode
Hello Mark, On 2017/05/22 01:37, Mark Davis ☕️ via Unicode wrote: I actually didn't see any of this discussion until today. Many thanks for chiming in. ( unicode@unicode.org mail was going into my spam folder...) I started reading the thread, but it looks like a lot of it is OT, As is quit

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Martin J. Dürst via Unicode
Hello everybody, [using this mail to in effect reply to different mails in the thread] On 2017/05/16 17:31, Henri Sivonen via Unicode wrote: On Tue, May 16, 2017 at 10:22 AM, Asmus Freytag wrote: Under what circumstance would it matter how many U+FFFDs you see? Maybe it doesn't, but I don

Re: Proposal to add standardized variation sequences for chess notation

2017-04-11 Thread Martin J. Dürst via Unicode
On 2017/04/12 00:44, Philippe Verdy via Unicode wrote: Some Asian chess boards include also diagonal lines or dots on top of their crossing (notably 9x9 boards are subdivided into nine 3x3 subgroups by such dots). These chess boards do not alternate white and black "squares" ; beside this, the c

Re: Unicode vs. Unikod

2017-04-10 Thread Martin J. Dürst via Unicode
Hello Janusz, I think you should report this problem to http://www.unicode.org/reporting.html. That way, it gets tracked appropriately. This list is for discussion, not for bug fixes. Regards, Martin. On 2017/04/10 18:54, Janusz S. Bień wrote: This is a long overdue issue, but better lat