Re: UAX #29 and WB4

2020-03-04 Thread Mark Davis ☕️ via Unicode
; subsequent rules > > are applied we end up in WB999 and a break between 200D and 1F6D1. > > That's nonsense and not the operational model of the algorithm which IIRC > was once clearly stated on this list by Mark Davis (sorry I failed to dig > out the message) which is to ta

Re: Combining Marks and Variation Selectors

2020-02-02 Thread Mark Davis ☕️ via Unicode
I don't think there is a technical reason for disallowing variation selectors after any starters (ccc=000); the normalization algorithm doesn't care about the general category of characters. Mark On Sun, Feb 2, 2020 at 10:09 AM Richard Wordingham via Unicode < unicode@unicode.org> wrote: > On S

Re: Call for feedback on UTS #18: Unicode Regular Expressions

2020-01-02 Thread Mark Davis ☕️ via Unicode
The line just above that is: Name matching rules follow Matching Rules from [UAX44#UAX44-LM2 ]. The deletion was based on feedback that the deleted text was a recap of the above line, but a

Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-02 Thread Mark Davis ☕️ via Unicode
Filed the following, thanks Richard. CLDR-13445 Release link for "latest" goes to zip file On Tue, Dec 3, 2019 at 2:31 AM Richard

Re: Pure Regular Expression Engines and Literal Clusters

2019-10-13 Thread Mark Davis ☕️ via Unicode
The problem is that most regex engines are not written to handle some "interesting" features of canonical equivalence, like discontinuity. Suppose that X is canonically equivalent to AC. - A query /X/ can match the separated A and C in the target string "AbC". So if I have code do [replace /
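The discontinuity described above can be reproduced with Python's standard `unicodedata` module. This is only an illustrative sketch: the characters â and ậ are chosen here as an example and do not come from the thread, and a plain substring test stands in for a regex engine.

```python
import unicodedata


def nfd(s: str) -> str:
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


# Illustrative characters, not taken from the thread.
query = "\u00e2"   # â: decomposes to a + U+0302 (circumflex)
target = "\u1ead"  # ậ: decomposes to a + U+0323 (dot below) + U+0302

# The target is canonically equivalent to "â" followed by a dot below,
# yet in NFD the dot below sits between the two pieces of the query.
contiguous = nfd(query) in nfd(target)

print([f"U+{ord(c):04X}" for c in nfd(query)])
print([f"U+{ord(c):04X}" for c in nfd(target)])
print("contiguous match:", contiguous)
```

An engine that only normalizes both sides and then looks for a contiguous match will miss this case, which is the kind of behavior the message is pointing at.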

Re: Pure Regular Expression Engines and Literal Clusters

2019-10-11 Thread Mark Davis ☕️ via Unicode
> > You claimed the order of alternatives mattered. That is an important > issue for anyone rash enough to think that the standard is fit to be > used as a specification. > Regex engines differ in how they handle the interpretation of the matching of alternatives, and it is not possible for us to

Unicode website glitches. (was The Most Frequent Emoji)

2019-10-11 Thread Mark Davis ☕️ via Unicode
t of > the Unicode v12.0 emoji ranked in order of how frequently they are used. > > “The forecasted frequency of use is a key factor in determining whether > to encode new emoji, and for that it is important to know the frequency > of use of existing emoji,” said Mark Davis, President of the

Re: Unicode "no-op" Character?

2019-07-03 Thread Mark Davis ☕️ via Unicode
Your goal is not achievable. We can't wave a magic wand and have all processes everywhere suddenly (or even within decades) ignore U+000F in all processing; that will not happen. This thread is pointless and should be terminated. Mark On Wed, Jul 3, 2019 at 5:48 PM Sławomir Osipiuk via Unicode < unicode@

Re: Unicode "no-op" Character?

2019-06-22 Thread Mark Davis ☕️ via Unicode
There is nothing like what you are describing. Examples: 1. Display — There are a few of the Default Ignorables that are always treated as invisible, and have little effect on other characters. However, even those will generally interfere with the display of sequences (be between 'q' and

Re: Unicode CLDR 35 alpha available for testing

2019-03-05 Thread Mark Davis ☕️ via Unicode
Just via svn checkout for the alpha. By next time we plan to be on GitHub... {phone} On Thu, Feb 28, 2019, 13:07 Doug Ewell via Unicode wrote: > announcements at unicode.org wrote: > > > The alpha version of Unicode CLDR 35 > > is available for

Re: Ancient Greek apostrophe marking elision

2019-01-28 Thread Mark Davis ☕️ via Unicode
: > > On 2019-01-28 7:31 AM, Mark Davis ☕️ via Unicode wrote: > > Expecting people to type in hard-to-find invisible characters just to > > correct double-click is not a realistic expectation. > > True, which is why such entries, when consistent, are properly handled > at the k

Re: Ancient Greek apostrophe marking elision

2019-01-28 Thread Mark Davis ☕️ via Unicode
> On Mon, Jan 28, 2019 at 2:31 AM Mark Davis ☕️ wrote: > >> But the question is how important those are in daily life. I'm not sure >> why the double-click selection behavior is so much more of a problem for >> Ancient Greek users than it is for the somewhat larger communit

Re: Ancient Greek apostrophe marking elision

2019-01-27 Thread Mark Davis ☕️ via Unicode
Note that this is no different than the reasonably common cases in English such as «the boys’ books». (you can try various combinations in http://unicode.org/cldr/utility/list-unicodeset.jsp) There are certainly cases that are suboptimal in word selection. As another example, «re-iterate» seems li

Re: Ancient Greek apostrophe marking elision

2019-01-26 Thread Mark Davis ☕️ via Unicode
meant for). > > For normal edition operations, breaking selection for "d'Artagnan" or > "can't" into two is overly fussy. > > No wonder people get frustrated. > > A./ > > James > > On Fri, Jan 25, 2019 at 12:34 PM Mark Davis ☕️ wrote: > >

Re: Ancient Greek apostrophe marking elision

2019-01-25 Thread Mark Davis ☕️ via Unicode
U+2019 is normally the character used, except where the ’ is considered a letter. When it is between letters it doesn't cause a word break, but because it is also a right single quote, at the end of words there is a break. Thus in a phrase like «tryin’ to go» there is a word break after the n, beca
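The behavior described above (no break when U+2019 sits between letters, a break when it ends a word) can be approximated with a small regular expression. This is a rough sketch in the spirit of rules WB6/WB7 of UAX #29, not the full word-break algorithm; the pattern and the helper name `words` are made up for illustration.

```python
import re

APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# A run of letters, optionally continued by U+2019 when another run of
# letters follows immediately. A trailing U+2019 is never absorbed.
WORD = re.compile(rf"[^\W\d_]+(?:{APOSTROPHE}[^\W\d_]+)*")


def words(text: str) -> list[str]:
    """Return the letter-based words a double-click might select."""
    return WORD.findall(text)


print(words("can\u2019t"))
print(words("tryin\u2019 to go"))
print(words("the boys\u2019 books"))
```

Because the apostrophe is only kept when letters follow it, "can’t" stays whole while the elision mark in "tryin’" falls outside the selected word.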

Re: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences

2018-12-10 Thread Mark Davis ☕️ via Unicode
I tend to agree with your analysis that emitting U+FFFD when there is no content between escapes in "shifting" encodings like ISO-2022-JP is unnecessary, and for consistency between implementations should not be recommended. Can you file this at http://www.unicode.org/reporting.html so that the co

Re: Unicode String Models

2018-11-22 Thread Mark Davis ☕️ via Unicode
Thanks for the review! In case you're interested, I'd also welcome feedback on Locale Identifiers <https://goo.gl/kizkrm> Mark On Thu, Nov 22, 2018 at 11:27 AM Henri Sivonen wrote: > On Tue, Oct 2, 2018 at 3:04 PM Mark Davis ☕️ wrote: > >> >> *

Re: The encoding of the Welsh flag

2018-11-21 Thread Mark Davis ☕️ via Unicode
We have gotten requests for this, but the stumbling block is the lack of an official N. Ireland document describing what the official flag is and should look like. “However, whilst England (St George’s Cross) Scotland (St Andrew’s Cross) and Wales (The Dragon) have individual regional flags, the

Re: UCA unnecessary collation weight 0000

2018-11-04 Thread Mark Davis ☕️ via Unicode
Philippe, I agree that we could have structured the UCA differently. It does make sense, for example, to have the weights be simply decimal values instead of integers. But nobody is going to go through the substantial work of restructuring the UCA spec and data file unless there is a very strong re

Re: UCA unnecessary collation weight 0000

2018-11-02 Thread Mark Davis ☕️ via Unicode
dard makes the presence of required in some steps, and the > requirement is in fact wrong: this is in fact NEVER required to create an > equivalent collation order. these steps are completely unnecessary and > should be removed. > > Le ven. 2 nov. 2018 à 14:03, Mark Davis ☕️ a é

Re: UCA unnecessary collation weight 0000

2018-11-02 Thread Mark Davis ☕️ via Unicode
You may not like the format of the data, but you are not bound to it. If you don't like the data format (eg you want [.0021.0002] instead of [..0021.0002]), you can transform it however you want as long as you get the same answer, as it says here: http://unicode.org/reports/tr10/#Conformance “

Re: Unicode String Models

2018-10-03 Thread Mark Davis ☕️ via Unicode
Mark On Wed, Oct 3, 2018 at 3:01 PM Daniel Bünzli wrote: > On 3 October 2018 at 09:17:10, Mark Davis ☕️ via Unicode ( > unicode@unicode.org) wrote: > > > There are two main choices for a scalar-value API: > > > > 1. Guarantee that the storage never contains surrogate

Re: Unicode String Models

2018-10-03 Thread Mark Davis ☕️ via Unicode
Mark On Tue, Oct 2, 2018 at 8:31 PM Daniel Bünzli wrote: > On 2 October 2018 at 14:03:48, Mark Davis ☕️ via Unicode ( > unicode@unicode.org) wrote: > > > Because of performance and storage consideration, you need to consider > the > > possible internal data structures

Re: Unicode String Models

2018-10-02 Thread Mark Davis ☕️ via Unicode
. Bień wrote: > On Sat, Sep 08 2018 at 18:36 +0200, Mark Davis ☕️ via Unicode wrote: > > I recently did some extensive revisions of a paper on Unicode string > models (APIs). Comments are welcome. > > > > > https://docs.google.com/document/d/1wuzzMOvKOJw93SWZAq

Re: Unicode String Models

2018-10-02 Thread Mark Davis ☕️ via Unicode
Mark On Tue, Sep 11, 2018 at 12:17 PM Henri Sivonen via Unicode < unicode@unicode.org> wrote: > On Sat, Sep 8, 2018 at 7:36 PM Mark Davis ☕️ via Unicode > wrote: > > > > I recently did some extensive revisions of a paper on Unicode string > models (APIs). Comments a

Re: Unicode String Models

2018-10-02 Thread Mark Davis ☕️ via Unicode
Mark On Sun, Sep 9, 2018 at 3:42 PM Daniel Bünzli wrote: > Hello, > > I find your notion of "model" and presentation a bit confusing since it > conflates what I would call the internal representation and the API. > > The internal representation defines how the Unicode text is stored and > shoul

Re: Unicode String Models

2018-10-02 Thread Mark Davis ☕️ via Unicode
Implementation. In addition, the > design notes at <https://github.com/larcenists/larceny/wiki/ImmutableTexts>, > though not up to date (in particular, UTF-16 internals are now allowed as > an alternative to UTF-8), are of interest: unfortunately, the link to the > span AP

Re: Unicode String Models

2018-10-02 Thread Mark Davis ☕️ via Unicode
Mark On Sun, Sep 9, 2018 at 10:03 AM Richard Wordingham via Unicode < unicode@unicode.org> wrote: > On Sat, 8 Sep 2018 18:36:00 +0200 > Mark Davis ☕️ via Unicode wrote: > > > I recently did some extensive revisions of a paper on Unicode string > > models (

Re: Unicode String Models

2018-10-02 Thread Mark Davis ☕️ via Unicode
Thanks to all for comments. Just revised the text in https://goo.gl/neguxb. Mark On Sat, Sep 8, 2018 at 6:36 PM Mark Davis ☕️ wrote: > I recently did some extensive revisions of a paper on Unicode string > models (APIs). Comments are welcome. > > > https://docs.google

Re: Unicode String Models

2018-09-11 Thread Mark Davis ☕️ via Unicode
These are all interesting and useful comments. I'll be responding once I get a bit of free time, probably Friday or Saturday. Mark On Tue, Sep 11, 2018 at 4:16 AM Eli Zaretskii via Unicode < unicode@unicode.org> wrote: > > Date: Tue, 11 Sep 2018 13:12:40 +0300 > > From: Henri Sivonen via Unicod

Re: Unicode String Models

2018-09-09 Thread Mark Davis ☕️ via Unicode
fortunately, the link to the > span API has rotted. > > On Sat, Sep 8, 2018 at 12:53 PM Mark Davis ☕️ via Unicore < > unic...@unicode.org> wrote: > >> I recently did some extensive revisions of a paper on Unicode string >> models (APIs). Comments are welcome. >> >> >> https://docs.google.com/document/d/1wuzzMOvKOJw93SWZAqoim1VUl9mloUxE0W6Ki_G23tw/edit# >> >> Mark >> >

Unicode String Models

2018-09-08 Thread Mark Davis ☕️ via Unicode
I recently did some extensive revisions of a paper on Unicode string models (APIs). Comments are welcome. https://docs.google.com/document/d/1wuzzMOvKOJw93SWZAqoim1VUl9mloUxE0W6Ki_G23tw/edit# Mark

Re: Private Use areas (was: Re: Thoughts on working with the Emoji Subcommittee (was ...))

2018-08-20 Thread Mark Davis ☕️ via Unicode
> ... some people who would call a PUA solution either batty > or crazy. I don't think it is either batty or crazy. People can certainly use the PUA to interchange text (assuming that they have downloaded fonts and keyboards or some other input method beforehand), and it can definitely serve as a

Re: Tales from the Archives

2018-08-19 Thread Mark Davis ☕️ via Unicode
You and Alan both raise good issues and make good points. I'd mention a couple of others. When we started Unicode, there were not a lot of alternatives to a general-purpose discussion email list for internationalization, but now there are many. Often the technical discussions are moved to more spe

Re: Usage of emoji in coding contexts!

2018-08-09 Thread Mark Davis ☕️ via Unicode
Very amusing. But interesting how it catches your eye when scanning a list. Mark On Thu, Aug 9, 2018 at 7:37 AM, Shriramana Sharma via Unicode < unicode@unicode.org> wrote: > First time I'm seeing this (maybe others have seen this already): > > https://github.com/wei/pull > > Emoji being used in

Re: Diacritic marks in parentheses

2018-07-26 Thread Mark Davis ☕️ via Unicode
But Asmus, think of how easy it would be to read: Ein⁽ᵉ⁾ A⁽¨⁾rzt⁽ⁱⁿ⁾ hat eine⁽ⁿ⁾ Studenti⁽ᵉ⁾n gesehen. Mark On Thu, Jul 26, 2018 at 2:15 PM, Mark Davis ☕️ wrote: > 🤣 > > Mark > > On Thu, Jul 26, 2018 at 1:57 PM, Asmus Freytag via Unicode < > unicode@unicode.org

Re: Diacritic marks in parentheses

2018-07-26 Thread Mark Davis ☕️ via Unicode
🤣 Mark On Thu, Jul 26, 2018 at 1:57 PM, Asmus Freytag via Unicode < unicode@unicode.org> wrote: > On 7/26/2018 9:27 AM, Markus Scherer via Unicode wrote: > > I would not expect for Ä+combining () above = Ä᪻ to look right except with > specialized fonts. > http://demo.icu-project.org/icu-bin/nbro

Re: Missing UAX#31 tests?

2018-07-14 Thread Mark Davis ☕️ via Unicode
Not to worry, these things happen to the best of us. Just glad the root of the problem was found. Mark Mark On Sat, Jul 14, 2018 at 5:51 PM, Karl Williamson wrote: > On 07/09/2018 02:11 PM, Karl Williamson via Unicode wrote: > >> On 07/08/2018 03:21 AM, Mark Davis ☕️ wrote

Re: Handling emoji

2018-07-14 Thread Mark Davis ☕️ via Unicode
Just fixed the one you found, Philippe... Mark On Sat, Jul 14, 2018 at 2:51 PM, Mark Davis ☕️ wrote: > Thanks for the feedback, Philippe. > > I haven't fixed that one yet, but added some more text (thanks to Ben > Hamilton!) and an acknowledgments section. > > > >

Re: Handling emoji

2018-07-14 Thread Mark Davis ☕️ via Unicode
o-emoji");*But I'm > not sure of their order (which one of the two defined (named) locales is > locale1 or locale2. > > Philippe. > > 2018-07-13 20:33 GMT+02:00 Mark Davis ☕️ via Unicode > : > >> Put together a doc about this; suggestions for improvement are welcome. >> >> https://docs.google.com/document/d/1pC7N32TnmDr2xzFW4HscA1DyAPPZnwILUH2_03UL6Jo/preview >> >> Mark >> > >

Handling emoji

2018-07-13 Thread Mark Davis ☕️ via Unicode
Put together a doc about this; suggestions for improvement are welcome. https://docs.google.com/document/d/1pC7N32TnmDr2xzFW4HscA1DyAPPZnwILUH2_03UL6Jo/preview Mark

Re: Missing UAX#31 tests?

2018-07-09 Thread Mark Davis ☕️ via Unicode
Thanks, Karl. Mark On Mon, Jul 9, 2018 at 10:11 PM, Karl Williamson wrote: > On 07/08/2018 03:21 AM, Mark Davis ☕️ wrote: > >> I'm surprised that the tests for 11.0 passed for a 10.0 implementation, >> because the following should have triggered a difference for WB. C

Re: Missing UAX#31 tests?

2018-07-08 Thread Mark Davis ☕️ via Unicode
PS, although the title was "Missing UAX#31 tests?", I assumed you were talking about http://unicode.org/reports/tr29/ Mark On Sun, Jul 8, 2018 at 11:21 AM, Mark Davis ☕️ wrote: > I'm surprised that the tests for 11.0 passed for a 10.0 implementation, > because the

Re: Missing UAX#31 tests?

2018-07-08 Thread Mark Davis ☕️ via Unicode
I'm surprised that the tests for 11.0 passed for a 10.0 implementation, because the following should have triggered a difference for WB. Can you check on this particular case? ÷ 0020 × 0020 ÷ # ÷ [0.2] SPACE (WSegSpace) × [3.4] SPACE (WSegSpace) ÷ [0.3] About the testing: The tests are generate
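A test line in the format quoted above can be read mechanically: ÷ marks a break opportunity, × marks no break, hexadecimal tokens are code points, and everything after # is commentary. The following is a minimal sketch of such a reader; the function name `parse_break_test` is an assumption for illustration and is not part of any Unicode tooling.

```python
def parse_break_test(line: str) -> tuple[str, list[int]]:
    """Parse one break-test line into (text, break offsets).

    Offsets count code points: 0 is the start of the text and
    len(text) is the end.
    """
    data = line.split("#", 1)[0]  # drop the trailing comment
    chars: list[str] = []
    breaks: list[int] = []
    for token in data.split():
        if token == "\u00f7":      # ÷ : break allowed here
            breaks.append(len(chars))
        elif token == "\u00d7":    # × : no break here
            continue
        else:                      # hexadecimal code point
            chars.append(chr(int(token, 16)))
    return "".join(chars), breaks


text, breaks = parse_break_test(
    "\u00f7 0020 \u00d7 0020 \u00f7 # \u00f7 [0.2] SPACE (WSegSpace)"
)
print(repr(text), breaks)
```

An implementation under test can then be checked by computing its own break offsets for `text` and comparing them with `breaks`.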

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-13 Thread Mark Davis ☕️ via Unicode
> That is, why is conforming to UAX #31 worth the risk of prohibiting the use of characters that some users might want to use? One could parse for certain sequences, putting characters into a number of broad categories. Very approximately: - junk ~= [[:cn:][:cs:][:co:]]+ - whitespace ~= [[:

Re: The Unicode Standard and ISO

2018-06-12 Thread Mark Davis ☕️ via Unicode
Steven wrote: > I usually recommend creating a new project first... That is often a viable approach. But proponents shouldn't get the wrong impression. I think anything resembling the "localized sentences" / "international message components" has zero chance of being adopted by U

Re: UTS#51 and emoji-sequences.txt

2018-06-09 Thread Mark Davis ☕️ via Unicode
Thanks, it definitely looks like there are some mismatches in terminology there. Can you please file this with the reporting form on the unicode site? {phone} On Sat, Jun 9, 2018, 05:00 Yifán Wáng via Unicode wrote: > When I'm looking at > https://unicode.org/Public/emoji/11.0/emoji-sequences.t

Re: The Unicode Standard and ISO

2018-06-08 Thread Mark Davis ☕️ via Unicode
Mark On Fri, Jun 8, 2018 at 10:06 AM, Richard Wordingham via Unicode < unicode@unicode.org> wrote: > On Fri, 8 Jun 2018 05:32:51 +0200 (CEST) > Marcel Schneider via Unicode wrote: > > > Thank you for confirming. All witnesses concur to invalidate the > > statement about uniqueness of ISO/IEC 106

Re: The Unicode Standard and ISO

2018-06-08 Thread Mark Davis ☕️ via Unicode
Where are you getting your "facts"? Among many unsubstantiated or ambiguous claims in that very long sentence: 1. "French locale in CLDR is still surprisingly incomplete". 1. For each release, the data collected for the French locale is complete to the bar we have set for Level=Mode

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Mark Davis ☕️ via Unicode
Got it, thanks. Mark On Thu, Jun 7, 2018 at 3:29 PM, Richard Wordingham via Unicode < unicode@unicode.org> wrote: > On Thu, 7 Jun 2018 10:42:46 +0200 > Mark Davis ☕️ via Unicode wrote: > > > > The proposal also asks for identifiers to be treated as equivalent > >

Re: The Unicode Standard and ISO

2018-06-07 Thread Mark Davis ☕️ via Unicode
A few facts. > ... Consortium refused till now to synchronize UCA and ISO/IEC 14651. ISO/IEC 14651 and Unicode have longstanding cooperation. Ken Whistler could speak to the synchronization level in more detail, but the above statement is inaccurate. > ... For another part it [sync with ISO/IEC 

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Mark Davis ☕️ via Unicode
> The proposal also asks for identifiers to be treated as equivalent under NFKC. The guidance in #31 may not be clear. It is not to replace identifiers as typed in by the user by their NFKC equivalent. It is rather to internally *identify* two identifiers (as typed in by the user) as being the sam
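The distinction drawn above, comparing identifiers by their NFKC form while leaving the text the user typed untouched, might look like the sketch below. The `SymbolTable` class is hypothetical and exists only to illustrate the idea; it is not taken from UAX #31 or from any compiler.

```python
import unicodedata


class SymbolTable:
    """Hypothetical table that keys identifiers by their NFKC form."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}

    @staticmethod
    def key(identifier: str) -> str:
        # Used only for comparison; the user's spelling is never rewritten.
        return unicodedata.normalize("NFKC", identifier)

    def declare(self, identifier: str) -> str:
        """Record the identifier and return the spelling first seen for it."""
        return self._by_key.setdefault(self.key(identifier), identifier)


table = SymbolTable()
first = table.declare("\ufb01le")  # typed with the U+FB01 ligature
second = table.declare("file")     # typed with plain ASCII letters
print(first == second)             # both resolve to the same entry
```

The source text keeps whichever spelling the user typed; only the lookup key is normalized.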

Re: Submissions open for 2020 Emoji

2018-04-20 Thread Mark Davis ☕️ via Unicode
ce we have added emoji names to cldr.) Mark On Thu, Apr 19, 2018 at 2:32 PM, Mark Davis ☕️ wrote: > > imagine I discover that someone has already proposed the emoji that I > am interested in > > In some cases we've contacted people to see if they want to engage > with othe

Re: Submissions open for 2020 Emoji

2018-04-20 Thread Mark Davis ☕️ via Unicode
> On 4/19/2018 9:36 AM, Mark Davis ☕️ wrote: > > The UTC didn't want to burden the doc registry with all the emoji > proposals. > > > The question of whether the registry should be divided is independent on > whether proposals are public or private in nature. > >

Re: Submissions open for 2020 Emoji

2018-04-19 Thread Mark Davis ☕️ via Unicode
The UTC didn't want to burden the doc registry with all the emoji proposals. Mark On Thu, Apr 19, 2018 at 6:22 PM, Asmus Freytag via Unicode < unicode@unicode.org> wrote: > On 4/19/2018 5:32 AM, Mark Davis ☕️ via Unicode wrote: > > > imagine I discover that someone h

Re: Submissions open for 2020 Emoji

2018-04-19 Thread Mark Davis ☕️ via Unicode
> imagine I discover that someone has already proposed the emoji that I am interested in In some cases we've contacted people to see if they want to engage with other proposers. But to handle larger numbers we'd need a simple, light-weight way to let people know, while maintaining people's pr

Unicode Utilities

2018-03-23 Thread Mark Davis ☕️ via Unicode
For testing, the Unicode Utilities now support the Unicode beta properties (with some caveats). Example: \p{gcβ=Lu}-\p{gc=Lu} . Thanks to Sascha for helping to move to different infrastructure

Re: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones

2018-03-17 Thread Mark Davis ☕️ via Unicode
ommend for > me to look? > > > > Kindest Regards, > > > > Ed Borgquist > > .WS Registry > > > > *From:* mark.edward.da...@gmail.com [mailto:mark.edward.da...@gmail.com] *On > Behalf Of *Mark Davis ☕️ > *Sent:* Saturday, March 17, 2018 5:20 AM > *To

Re: Full Emoji List Chart No Longer Displaying Emoji with Skin-tones

2018-03-17 Thread Mark Davis ☕️ via Unicode
We were getting so much traffic on the emoji pages that we had to produce an abbreviated version to reduce the load (without skin tones, it is about half the size). We are looking at improvements to the infrastructure and/or chart design that would let us restore them, but people are busy with oth

Re: A sketch with the best-known Swiss tongue twister

2018-03-09 Thread Mark Davis ☕️ via Unicode
emannic group ( "gsw" , "gsw-FR", "gsw-CH"), possibly extended > (this is discutable) to Schwäbish in Germany and Hungary. > > My opinion is that even the Swiss variants should be preferably named > "Swiss Alemannic" collectively, and not "Swiss G

Re: A sketch with the best-known Swiss tongue twister

2018-03-09 Thread Mark Davis ☕️ via Unicode
Yes, the right English names are "Swiss High German" for de-CH, and "Swiss German" for gsw-CH. Mark On Fri, Mar 9, 2018 at 2:40 PM, Tom Gewecke via Unicode wrote: > > > On Mar 9, 2018, at 5:52 AM, Philippe Verdy via Unicode < > unicode@unicode.org> wrote: > > > > So the "best-known Swiss tongue

Re: A sketch with the best-known Swiss tongue twister

2018-03-09 Thread Mark Davis ☕️ via Unicode
9, 2018 at 12:52 PM, Philippe Verdy wrote: > Is that just for Switzerland in one of the local dialectal variants ? Or > more generally Alemannic (also in Northeastern France, South Germany, > Western Austria, Liechtenstein, Northern Italy). > > 2018-03-09 12:09 GMT+01:00 Mark D

A sketch with the best-known Swiss tongue twister

2018-03-09 Thread Mark Davis ☕️ via Unicode
https://www.youtube.com/watch?v=QOwITNazUKg De Papscht hät z’Schpiäz s’Schpäkchbschtekch z’schpaat bschtellt. literally: The Pope has [in Spiez] [the bacon cutlery] [too late] ordered. Mark

Re: Sentence_Break, Semi-colons, and Apparent Miscategorization

2018-03-08 Thread Mark Davis ☕️ via Unicode
From the first line, I guess you mean that all three questions are having to do with the Sentence_Break property values. Namely: http://www.unicode.org/reports/tr29/proposed.html#Table_Sentence_Break_Property_Values http://www.unicode.org/reports/tr29/proposed.html#SContinue Mark On Thu, Mar 8,

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-02 Thread Mark Davis ☕️ via Unicode
No, the patterns should always have the right format. However, in the supplemental data there is information as to the preferred data for each language. This data isn't collected through the ST, so a ticket needs to be filed. In your particular case, the data has: If DE just doesn't use hB, the

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-02 Thread Mark Davis ☕️ via Unicode
Right, Doug. I'll say a few more words. In terms of language support, encoding of new characters in Unicode benefits mostly digital heritage languages (via representation of historic languages in Unicode, enabling preservation and scholarly work), although there are some modern-use cases like Hani

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Mark Davis ☕️ via Unicode
I'm more interested in what areas you found unclear, because wherever you did I'm sure many others would as well. You can reply off-list if you want. Mark Mark On Wed, Feb 28, 2018 at 12:22 PM, Janusz S. Bień wrote: > > Thanks to all who answered. The answers are very clear, but the original >

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Mark Davis ☕️ via Unicode
Also, please click through from the announcement to http://www.unicode.org/consortium/adopt-a-character.html. If it isn't apparent from that page what the relationship is, we have some work to do... Mark On Wed, Feb 28, 2018 at 11:48 AM, Martin J. Dürst via Unicode < unicode@unicode.org> wrote:

Re: Why so much emoji nonsense?

2018-02-16 Thread Mark Davis ☕️ via Unicode
A few points 1. To add to what Asmus said, see also http://unicode.org/L2/L2018/18044-encoding-emoji.pdf "Their encoding, surprisingly, has been a boon for language support. The emoji draw on Unicode mechanisms that are used by various languages, but which had been incompletely implemented on man

Re: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?)

2018-01-28 Thread Mark Davis ☕️ via Unicode
On Sun, Jan 28, 2018 at 3:20 PM, Doug Ewell wrote: > Mark Davis wrote: > > One addition: with the expansion of keyboards in >> http://blog.unicode.org/2018/01/unicode-ldml-keyboard-enhancements.html >> we are looking to expand the repository to not merely represent those, >

Re: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?)

2018-01-28 Thread Mark Davis ☕️ via Unicode
One addition: with the expansion of keyboards in http://blog.unicode.org/2018/01/unicode-ldml-keyboard-enhancements.html we are looking to expand the repository to not merely represent those, but to also serve as a resource that vendors can draw on. Mark On Sun, Jan 28, 2018 at 1:11 PM, Doug Ewel

Re: [HUMOR] Proof that emojis are useful

2018-01-27 Thread Mark Davis ☕️ via Unicode
Nice, thanks! Mark On Sat, Jan 27, 2018 at 7:31 AM, Stephane Bortzmeyer via Unicode < unicode@unicode.org> wrote: > Nice scientific info, and with emojis : > > https://twitter.com/biolojical/status/956953421130514432 >

Re: 0027, 02BC, 2019, or a new character?

2018-01-25 Thread Mark Davis ☕️ via Unicode
My apologies for the typo. There's no excuse for misspelling someone's name (especially since I live in Switzerland, and type German every day). Thanks for calling my attention to it: the doc has been updated. Mark Mark On Thu, Jan 25, 2018 at 4:15 AM, Andrew West via Unicode < unicode@unicode.

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2018-01-22 Thread Mark Davis ☕️ via Unicode
Good point, thanks Mark On Mon, Jan 22, 2018 at 6:41 PM, Richard Wordingham via Unicode < unicode@unicode.org> wrote: > On Sun, 21 Jan 2018 22:34:12 -0800 > Mark Davis ☕️ via Unicode wrote: > > > The ZWJ Virama sequence is already provided for by the combination of > >

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2018-01-21 Thread Mark Davis ☕️ via Unicode
perhaps this rule should be something like > > > > (Virama ZWJ? | ZWJ) x Extend* LinkingConsonant > > > > -Manish > > > > On Sat, Dec 9, 2017 at 7:16 AM, Mark Davis ☕️ via Unicode < > > unicode@unicode.org> wrote: > > > > > 1. Yo

Re: Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10))

2018-01-05 Thread Mark Davis ☕️ via Unicode
Doug, I modified my working draft, at https://docs.google.com/document/d/1EuNjbs0XrBwqlvCJxra44o3EVrwdBJUWsPf8Ec1fWKY If that looks ok, I'll submit. Thanks again for your comments. Mark Mark On Wed, Jan 3, 2018 at 9:29 AM, Mark Davis ☕️ wrote: > Thanks for your comments; you

Re: Regex for Grapheme Cluster Breaks

2018-01-03 Thread Mark Davis ☕️ via Unicode
Quick update: Manish pointed out that I'd misstated one of the rules, should be: skin-sequence = $E_Base $Extend* $E_Modifier ; With that change, the test passes. (Thanks Manish!) Mark On Wed, Jan 3, 2018 at 10:16 AM, Mark Davis ☕️ wrote: > I had a UTC action to adj
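The corrected rule, skin-sequence = $E_Base $Extend* $E_Modifier, translates directly into a regular expression once the property classes are available. Python's standard library does not expose Grapheme_Cluster_Break values, so the sketch below substitutes tiny hand-picked character samples; those sets are assumptions for illustration, not the real property data.

```python
import re

# Hand-picked samples standing in for the real GCB property values.
E_BASE = "\U0001F44B\U0001F44D"      # waving hand, thumbs up
EXTEND = "\uFE0F\u0300-\u036F"       # VS16 and combining diacritical marks
E_MODIFIER = "\U0001F3FB-\U0001F3FF"  # the five skin-tone modifiers

# skin-sequence = $E_Base $Extend* $E_Modifier
SKIN_SEQUENCE = re.compile(f"[{E_BASE}][{EXTEND}]*[{E_MODIFIER}]")


def is_skin_sequence(s: str) -> bool:
    """True when the whole string is a single skin-sequence."""
    return SKIN_SEQUENCE.fullmatch(s) is not None


print(is_skin_sequence("\U0001F44D\U0001F3FD"))        # base + modifier
print(is_skin_sequence("\U0001F44D\uFE0F\U0001F3FD"))  # Extend in between
print(is_skin_sequence("\U0001F44D"))                  # base alone
```

The `$Extend*` in the middle is what allows a variation selector between the base and the modifier without splitting the cluster.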

Regex for Grapheme Cluster Breaks

2018-01-03 Thread Mark Davis ☕️ via Unicode
I had a UTC action to adjust http://www.unicode.org/reports/tr29/proposed.html#Table_Combining_Char_Sequences_and_Grapheme_Clusters to update the regex, and other necessary changes surrounding text. Here is what I've come up with for an EBNF formulation. The $x are the GCB properties. cluster = c

Re: Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10))

2018-01-03 Thread Mark Davis ☕️ via Unicode
ested based on properties. Mark On Tue, Jan 2, 2018 at 9:55 PM, Doug Ewell via Unicode wrote: > Mark Davis wrote: > > BTW, relevant to this discussion is a proposal filed >> http://www.unicode.org/L2/L2017/17434-emoji-rejex-uts51-def.pdf (The >> date is wrong, should be 2017-12-

Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

2018-01-02 Thread Mark Davis ☕️ via Unicode
BTW, relevant to this discussion is a proposal filed http://www.unicode.org/L2/L2017/17434-emoji-rejex-uts51-def.pdf (The date is wrong, should be 2017-12-22) Mark On Tue, Jan 2, 2018 at 11:41 AM, Mark Davis ☕️ wrote: > We had that originally, but some people objected that some langua

Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

2018-01-02 Thread Mark Davis ☕️ via Unicode
uff). Good to know! >> >> >> > Instead, we'd add one line to >> *Extend <http://www.unicode.org/reports/tr29/tr29-32.html#Extend>:* >> >> Yeah, this is essentially what I was hoping we could do. >> >> Is there any way to formally propose t

Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

2018-01-02 Thread Mark Davis ☕️ via Unicode
we could do. > > Is there any way to formally propose this? Or is bringing it up here good > enough? > > Thanks, > > -Manish > > On Mon, Jan 1, 2018 at 9:17 PM, Mark Davis ☕️ via Unicode < > unicode@unicode.org> wrote: > >> This is an interesting sugge

Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

2018-01-01 Thread Mark Davis ☕️ via Unicode
This is an interesting suggestion, Manish. is a degenerate case, so if we follow your suggestion we also could drop E_Base and E_Modifier, and rule GB10. Instead, we'd add one line to *Extend :* OLD Grapheme_Extend = Yes *and not* GCB

Re: Possible bug in formal grammar for extended grapheme cluster

2017-12-18 Thread Mark Davis ☕️ via Unicode
; unicode@unicode.org> wrote: > Ah! That explains why > > pcre2grep -u '^\X{1}$' > > matches with > > 🇬🇧 > 🇩🇪🇫🇷 > 🇨🇳🇮🇹🇲🇾 > 🇪🇸🇦🇺🇷🇺🇳🇱🇯🇵 > > ...etc... > > André Schappo > > On 17 Dec 2017, at 17:17, Mark Davis ☕️ via Un

Re: Possible bug in formal grammar for extended grapheme cluster

2017-12-17 Thread Mark Davis ☕️ via Unicode
Thanks for the feedback. You're correct about this; that is a holdover from an earlier version of the document when there was a more basic treatment of RI sequences. There is already an action to modify these. There is a placeholder review note about that just above http://www.unicode.org/reports

Re: Word_Break for Hieroglyphs

2017-12-14 Thread Mark Davis ☕️ via Unicode
Mark <https://twitter.com/mark_e_davis> On Thu, Dec 14, 2017 at 3:22 PM, Michael Everson wrote: > On 14 Dec 2017, at 14:14, Mark Davis ☕️ via Unicode > wrote: > > > The Word_Break property doesn't have a value Complex_Context, but I > think that was just a typo

Re: Word_Break for Hieroglyphs

2017-12-14 Thread Mark Davis ☕️ via Unicode
The Word_Break property doesn't have a value Complex_Context, but I think that was just a typo in your message. The word break and line break properties for 1,057 [:Script=Egyp:] characters are currently Word_Break=ALetter Line_Break=Alphabetic Off the top of my head, I think the best course wou

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-11 Thread Mark Davis ☕️ via Unicode
ps://twitter.com/mark_e_davis> On Sat, Dec 9, 2017 at 9:30 PM, Richard Wordingham via Unicode < unicode@unicode.org> wrote: > On Sat, 9 Dec 2017 16:16:44 +0100 > Mark Davis ☕️ via Unicode wrote: > > > 1. You make a good point about the GB9c. It should probably instead b

Re: Aquaφοβία

2017-12-09 Thread Mark Davis ☕️ via Unicode
Some people have been confused by the previous wording, and thought that it wouldn't be legitimate to break on script boundaries. So we wanted to make it clear that that was possible, since: 1. Many implementations of rendering break text into script-runs before further processing, and 2.

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-09 Thread Mark Davis ☕️ via Unicode
1. You make a good point about the GB9c. It should probably instead be something like: GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant Extend is broader than necessary, and there are a few items that have ccc!=0 but not gcb=extend. But all of those look to be degenerate cases. https://unic

Re: ASCII v Unicode

2017-11-05 Thread Mark Davis ☕️ via Unicode
I had some time on the plane this weekend, and generated some more comprehensive figures that take the following into account: 1. There are two senses of "Unicode". In the narrow sense, it is only the Unicode Standard (ie, Unicode Characters). But it has grown to have a more comprehensive

Re: Interesting UTF-8 decoder

2017-10-09 Thread Mark Davis ☕️ via Unicode
The paper points out that the input buffer needs to be padded with 3 null bytes as a precondition. Mark On Mon, Oct 9, 2017 at 10:57 AM, J Decker via Unicode wrote: > that's interesting; however it will segfault if the string ends on a > memory allocation boun
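The padding precondition can be sketched as follows. This is not the decoder from the paper, only a simplified illustration of why a decoder that always reads four bytes per step needs three extra bytes after the input. The function name is hypothetical, and the sketch assumes well-formed UTF-8 and does no validation of continuation bytes.

```python
def decode_utf8_padded(data: bytes) -> list[int]:
    """Decode well-formed UTF-8 into code points, reading 4 bytes per step."""
    # Precondition from the thread: pad with 3 null bytes so the
    # unconditional 4-byte read never runs past the end of the buffer.
    buf = data + b"\x00\x00\x00"
    out: list[int] = []
    i = 0
    n = len(data)
    while i < n:
        # Always fetch four bytes, even if the sequence is shorter.
        b0, b1, b2, b3 = buf[i], buf[i + 1], buf[i + 2], buf[i + 3]
        if b0 < 0x80:                      # 1-byte sequence (ASCII)
            cp, length = b0, 1
        elif b0 >> 5 == 0b110:             # 2-byte sequence
            cp, length = ((b0 & 0x1F) << 6) | (b1 & 0x3F), 2
        elif b0 >> 4 == 0b1110:            # 3-byte sequence
            cp = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
            length = 3
        elif b0 >> 3 == 0b11110:           # 4-byte sequence
            cp = (((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12)
                  | ((b2 & 0x3F) << 6) | (b3 & 0x3F))
            length = 4
        else:
            raise ValueError(f"unexpected lead byte at offset {i}")
        out.append(cp)
        i += length
    return out


code_points = decode_utf8_padded("a\u00e9\u20ac\U0001F600".encode("utf-8"))
```

In Python a read past the end raises `IndexError` rather than faulting, but the design point is the same: in C, the unpadded version could touch memory beyond the allocation when the string ends at its boundary.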

Re: Unicode education in Schools

2017-08-25 Thread Mark Davis ☕️ via Unicode
Mark (https://twitter.com/mark_e_davis) On Thu, Aug 24, 2017 at 11:01 PM, Asmus Freytag via Unicode < unicode@unicode.org> wrote: > On 8/24/2017 10:17 AM, Andre Schappo via Unicode wrote: > >> Because there are many systems that can now handle BMP characters but not >> cannot handle SMP characte

Re: Version linking?

2017-08-17 Thread Mark Davis ☕️ via Unicode
>Intermediate versions can't add any new characters, but can add sequences and properties, including "emojification" of existing characters. E.g. E4.0 didn't reference any characters from U10.0. It did recognize *sequences* of existing U9.0 characters. E5.0 did have the emoji properties of some 10.

Re: Version linking?

2017-08-17 Thread Mark Davis ☕️ via Unicode
Emoji versions are (currently) on a somewhat faster schedule than Unicode: U10.0 — E5.0, E6.0 (TBD) U09.0 — E3.0, E4.0 Intermediate versions can't add any new characters, but can add sequences and properties, including "emojification" of existing characters. {phone} On Aug 17,

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-08-03 Thread Mark Davis ☕️ via Unicode
FYI, the UTC retracted the following. *[151-C19 ] Consensus:* Modify the section on "Best Practices for Using FFFD" in section "3.9 Encoding Forms" of TUS per the recommendation in L2/17-168

Re: Turtle Graphics Emoji

2017-07-28 Thread Mark Davis ☕️ via Unicode
Producing emoji sticker sets and apps requires no involvement of Unicode or any other organization. So you can find out on your own whether there is an audience for your "Turtle Graphics Emoji". Mark (https://twitter.com/mark_e_davis) On Fri, Jul 28, 2017 at 2:22 PM, William_J_G Overington via

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Mark Davis ☕️ via Unicode
> I do not understand the energy being invested in a case that shouldn't happen, especially in a case that is a subset of all the other bad cases that could happen. I think Richard stated the most compelling reason: … The bug you mentioned arose from two different ways of counting the string leng

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-21 Thread Mark Davis ☕️ via Unicode
I actually didn't see any of this discussion until today. ( unicode@unicode.org mail was going into my spam folder...) I started reading the thread, but it looks like a lot of it is OT, so just scanned some of them. A few brief points: 1. There is plenty of time for public comment, since it wa

Re: Standaridized variation sequences for the Desert alphabet?

2017-04-06 Thread Mark Davis ☕️
Mark On Thu, Apr 6, 2017 at 6:11 PM, Michael Everson wrote: > On 6 Apr 2017, at 16:05, Mark Davis ☕️ wrote: > > >> I just get frustrated when everyone including the veterans seems to > forget every bit of precedent that we have for the useful encoding of > characte
