Re: Standaridized variation sequences for the Desert alphabet?

2017-04-06 Thread Mark Davis ☕️
On Thu, Apr 6, 2017 at 4:07 PM, Michael Everson wrote: > I just get frustrated when everyone including the veterans seems to forget > every bit of precedent that we have for the useful encoding of characters. > ​Nobody's forgetting anything. ​Simply because people disagree with you doesn't mean

Re: Proposal to add standardized variation sequences for chess notation

2017-04-04 Thread Mark Davis ☕️
Amusing as this is, hard to believe that people are spending this much time on an April Fool's posting. I'm looking forward to similar postings on checkers and go pieces. As a matter of fact, one that proposes adding new characters for every possible configuration of a go board would be imaginativ

Re: Unicode Emoji 5.0 characters now final

2017-03-31 Thread Mark Davis ☕️
Ken's observation "…approximately backwards…" is exactly right, and that's the same reason why Markus suggested something along the lines of "interoperable". I don't think we've come up with a pithy category name yet, but I tried different wording on the slides on http://unicode.org/emoji/. See wh

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread Mark Davis ☕️
> `150` in UN M.49 which ISO 3166-1 was derived from and is compatible with. CLDR could safely adopt that if needed. No need to "safely adopt". It is already valid: http://www.unicode.org/reports/tr51/proposed.html#flag-emoji-tag-sequences If you follow the links you'll end up at http://unicode

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread Mark Davis ☕️
> If I made an open-source emoji font that contained flags for all of the > 5000ish > ISO 3166-2 codes that actually map to one, would I automatically be > considered a > vendor? > Do I need to have to pay 18000(?) dollars a year for full membership > first? (That's peanuts for multi-billion dolla

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Mark Davis ☕️
Thanks Mark On Tue, Mar 28, 2017 at 1:01 PM, Philippe Verdy wrote: > I just filed the bug in the CLDR contact form. > > 2017-03-28 12:49 GMT+02:00 Mark Davis ☕️ : > >> ​Thanks. Probably best as: >> >> unicode_locale_id = unicode_language_id >>

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Mark Davis ☕️
On Tue, Mar 28, 2017 at 12:39 PM, Martin J. Dürst wrote: ​​ No, your work wouldn't be impossible. It might be quite a bit more > difficult, but not impossible. I have written papers about Han ideographs > and Japanese text processing where I had to create my own fonts (8-bit, > with mostly r

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Mark Davis ☕️
language_id > (transformed_extensions > unicode_locale_extensions? > | unicode_locale_extensions > transformed_extensions?)?; = unicode_language_id > [transformed_extensions > [unicode_locale_extensions] > / unicode_locale_extensions > [transformed_extensions]] >

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Mark Davis ☕️
​Good questions.​ On Tue, Mar 28, 2017 at 11:56 AM, Joan Montané wrote: > 1st one: point 4 (Unicode subdivision codes listed in emoji Unicode site) > arises something like chicken-egg problem. Vendors don't easily add new > subdivision-flags (because they aren't recommended), and Unicode doesn't

Re: different version of common/annotations/ja.xml

2017-03-27 Thread Mark Davis ☕️
. >> >> 2017-03-27 19:04 GMT+09:00 Takao Fujiwara > tfuji...@redhat.com>>: >> >> On 03/27/17 18:48, Mark Davis ☕️-san wrote: >> >> By "committed strings", you mean the hiragana phonetic reading? >> >> >>

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Mark Davis ☕️
(I'm sure you know this, Philippe, but a reminder for others: as far as the Unicode projects go, discussions on this list have no effect unless they are turned into a submission (UTC or Emoji proposal, CLDR or ICU ticket).) If you see any problems in the CLDR data, please file a ticket at http://u

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Mark Davis ☕️
To add to what Ken and Markus said: like many other identifiers, there are a number of different categories. 1. *Ill-formed: *"$1" 2. *Well-formed, but not valid: *"usx". Is *syntactic* according to http://unicode.org/reports/tr51/proposed.html#def_emoji_tag_sequence, but is not *valid

Re: different version of common/annotations/ja.xml

2017-03-27 Thread Mark Davis ☕️
By "committed strings", you mean the hiragana phonetic reading? Mark On Mon, Mar 27, 2017 at 11:00 AM, Takao Fujiwara wrote: > Hi, > > Do you have any chances to create a different version of ja.xml of the > Japanese emoji annotation? > http://unicode.org/cldr/trac/browser/tags/latest/common/an

Re: Northern Khmer on iPhone

2017-03-02 Thread Mark Davis ☕️
On Thu, Mar 2, 2017 at 10:06 AM, Norbert Lindenberg < unic...@lindenbergsoftware.com> wrote: > http://norbertlindenberg.com/2015/06/installing-fonts-on-ios/index.html Thanks for writing that, Norbert. Sounds a tad painful. Mark

Re: WAP Pictogram Specification as Emoji Source

2017-02-13 Thread Mark Davis ☕️
Given the status of WAP, I don't think there is any particular need to seek compatibility for it. On the other hand, it — like other sources — can certainly be mined for ideas. For example, the topic of SCUBA has certainly come up, and I suspect one could make a good case for the expected frequency

Re: RFCs go Unicode

2017-02-05 Thread Mark Davis ☕️
That's great news. It will be so much clearer to be able to have examples with the real characters in them, and to be able to acknowledge the work of authors with the real forms of their names. Mark On Sun, Feb 5, 2017 at 4:48 AM, Andre Schappo wrote: > RFC 7997: The Use of Non-ASCII Characters

Re: how would you state requirements involving sorting?

2017-01-24 Thread Mark Davis ☕️
Perhaps suggest something along the following lines. Sorting. Unicode-conformant collation (http://unicode.org/reports/tr10/) must be used when sorting titles. The collation must follow the user's locale, such as using ICU APIs (http://site.icu-project.org/). Mark On Tue, Jan 24, 2017 at 7:43 AM

Re: Misspelling or Miscoding?

2017-01-18 Thread Mark Davis ☕️
We don't have any set terminology for what you're talking about. We've often just used 'misspelling' in a broad sense, which can include visually confusable or identical glyphs. For example, spelling 'of' with an omicron would be one, as well as a word in a complex script with swapped marks. And

Re: Specification of Encoding of Plain Text

2017-01-14 Thread Mark Davis ☕️
Mark On Fri, Jan 13, 2017 at 7:19 PM, Richard Wordingham < richard.wording...@ntlworld.com> wrote: > On Fri, 13 Jan 2017 10:38:30 +0100 > Mark Davis ☕️ wrote: > > > On Thu, Jan 12, 2017 at 10:26 PM, Richard Wordingham < > > richard.wording...@ntlworld.com> wrote

Re: Specification of Encoding of Plain Text

2017-01-13 Thread Mark Davis ☕️
d.com> wrote: > On Thu, 12 Jan 2017 21:03:29 +0100 > Mark Davis ☕️ wrote: > > > That was just an example off the top of my head of the format for > > using with regex; I don't pretend that it is vetted. Latin is not a > > complex script, so it was only an illustra

Re: Specification of Encoding of Plain Text

2017-01-12 Thread Mark Davis ☕️
2017 at 7:42 PM, Richard Wordingham < richard.wording...@ntlworld.com> wrote: > On Thu, 12 Jan 2017 14:12:09 +0100 > Mark Davis ☕️ wrote: > > > I agree that comprehension is a goal. I'd imagine using a BNF regex, > > like the following. This is simple, since I

Re: Specification of Encoding of Plain Text

2017-01-12 Thread Mark Davis ☕️
On Tue, Jan 10, 2017 at 8:40 PM, Richard Wordingham < richard.wording...@ntlworld.com> wrote: > On Tue, 10 Jan 2017 10:11:41 +0100 > Mark Davis ☕️ wrote: > > > What I really wish we had would be a machine readable set of regexes > > for each complex script (a

Re: Specification of Encoding of Plain Text

2017-01-10 Thread Mark Davis ☕️
What I really wish we had would be a machine readable set of regexes for each complex script (and for each language-script combination that is different than the default for that script). Such a regex R could be used for determining the well-formed ordering of code points within words. The regex n

Re: IdnaTest.txt and RFC 5893

2017-01-05 Thread Mark Davis ☕️
Alastair, thanks for finding it and bringing it up. I think you're right that the problem is in that the test generation code doesn't properly apply the bidi criteria to *all* the labels if *any* of the labels are RTL, but instead is probably just going on a label-by-label basis. Thankfully, it loo

Re: Leading ZWJ in Emoji sequences page

2017-01-03 Thread Mark Davis ☕️
Thanks for catching this! Mark On Tue, Jan 3, 2017 at 2:14 PM, Dominik Röttsches wrote: > Hi Mark, others, > > in http://unicode.org/emoji/charts/emoji-zwj-sequences.html as well as in > the beta 5.0 version of this page, some of the "Browser" fields have a > leading ZWJ. > > Compare copying th

Re: About standardized variants of characters in Dingbat block

2016-12-27 Thread Mark Davis ☕️
On Tue, Dec 27, 2016 at 7:15 PM, Christoph Päper < christoph.pae...@crissov.de> wrote: > ✊✋✌️🦎🖖? > I'd use: ⛰ 📃✂🦎 🖖 Mar

Re: About standardized variants of characters in Dingbat block

2016-12-27 Thread Mark Davis ☕️
> people are still missing the lizard. ;) http://unicode.org/emoji/charts-beta/emoji-list.html#1f98e Mark On Tue, Dec 27, 2016 at 2:51 PM, Christoph Päper < christoph.pae...@crissov.de> wrote: > Markus Scherer : > > > > The other two were not Dingbats but only came from the Japanese carrier > s

Re: Another UAX #29 bug: property tables need updating

2016-12-23 Thread Mark Davis ☕️
Also, under http://unicode.org/reports/tr29/#Conformance see the following. The wording could be stronger: the CLDR customizations are strongly recommended. - Some changes to rules and data are needed for best segmentation behavior of additional emoji zwj sequences [UTR51

Re: Emoji Packs

2016-12-21 Thread Mark Davis ☕️
Please consult the line on the charts: "For information about the images used in these charts, see Emoji Images and Rights ." Mark On Wed, Dec 21, 2016 at 6:07 PM, Rebecca T <637...@gmail.com> wrote: > Okay, I threw something together. > > github.com/yea

Re: The (Klingon) Empire Strikes Back

2016-11-15 Thread Mark Davis ☕️
p (as I > recall discussions in the early years) was not so much the question whether > IP issues existed/could be resolved, but the fear that adding such an > "invented" and "frivolous" script would undermine the acceptance of > Unicode. Given the way Unicode is inve

Re: The (Klingon) Empire Strikes Back

2016-11-10 Thread Mark Davis ☕️
The committee doesn't "tentatively approve, pending X". But the good news is that I think it was the sense of the committee that the evidence of use for Klingon is now sufficient, and the rest of the proposal was in good shape (other than the lack of a date), so really only the IP stands in the wa

Re: Bit arithmetic on Unicode characters?

2016-10-09 Thread Mark Davis ☕️
Essentially all of the game pieces that are in Unicode were added for compatibility with existing character sets. ​I'm guessing that ​there are hundreds to thousands of possible other symbols associated with games in one way or another, or that could be dug out of instruction manuals (eg, http://ww

Re: font-encoded hacks

2016-10-06 Thread Mark Davis ☕️
We do provide data for keyboard mappings in CLDR ( http://unicode.org/cldr/charts/latest/keyboards/index.html). There are some further pieces we need to put into place. 1. Provide a bulk uploader that applies our sanity-checking tests for a proposed keyboard mapping, and provides real-time f

Re: [Unicode] Minimum set of Emoji characters

2016-10-02 Thread Mark Davis ☕️
​At this point, the original set of Japanese emoji has long since been surpassed. The recommendation is to support the set of emoji in the data files referenced by http://www.unicode.org/reports/tr51/. There's much more information about various choices there. Note that there is a proposed new ver

Re: [UTR#51-8] 1.4.3 Emoji Variation Sequences: Female/Venus and Male/Mars Signs

2016-09-02 Thread Mark Davis ☕️
Mark On Thu, Aug 25, 2016 at 4:52 PM, Christoph Päper < christoph.pae...@crissov.de> wrote: > TL;DR: Unicode properties should reflect user expectations, not vendor > choices. > > Mark Davis ☕️ : > > On Mon, Aug 22, 2016 at 11:26 PM, Christoph Päper < > christoph.pa

Re: [UTR#51-8] 1.4.3 Emoji Variation Sequences: Female/Venus and Male/Mars Signs

2016-09-02 Thread Mark Davis ☕️
that nothing said on this list will be taken up by the UTC unless someone submits a proposal to the UTC.) Mark On Fri, Sep 2, 2016 at 9:03 PM, Christoph Päper wrote: > Mark Davis ☕️ : > > > > In order to understand the status of any document in the registry, you > need to also loo

Re: [UTR#51-8] 1.4.3 Emoji Variation Sequences: Female/Venus and Male/Mars Signs

2016-09-02 Thread Mark Davis ☕️
In order to understand the status of any document in the registry, you need to also look at the minutes of the meeting where they are discussed, in this case: http://www.unicode.org/L2/L2016/16121.htm What you see there is: B.14.3 Provisional value for Emoji property [Emoji SC/Edberg, L2/16-087 <

Re: I'm excited about the proposal to add a brontosaurus emoji codepoint

2016-08-29 Thread Mark Davis ☕️
I mean the comic is the newest. There have been dinosaur proposals; the emoji subcommittee is still looking at the priorities among animals. Mark On Mon, Aug 29, 2016 at 9:18 PM, Mark Davis ☕️ wrote: > > On Mon, Aug 29, 2016 at 9:08 PM, Karl Williamson > wrote: > >> ht

Re: I'm excited about the proposal to add a brontosaurus emoji codepoint

2016-08-29 Thread Mark Davis ☕️
On Mon, Aug 29, 2016 at 9:08 PM, Karl Williamson wrote: > http://xkcd.com/1726/ ​That's the newest one ​ Mark

Re: [UTR#51-8] 1.4.3 Emoji Variation Sequences: Female/Venus and Male/Mars Signs

2016-08-24 Thread Mark Davis ☕️
On Mon, Aug 22, 2016 at 11:26 PM, Christoph Päper < christoph.pae...@crissov.de> wrote: > 1. it’s incomplete without an explicit neutral/ambiguous alternative and > ​As I said, people are actively investigating what to do about such cases. It may be that the solution is to add ⚲ U+26B2 Neuter, bu

Re: ZWJ sequences in UTR #51 v4

2016-08-21 Thread Mark Davis ☕️
There have been discussions of how an "unmarked" (neutral, ungendered) form could be represented. Here are just some thoughts. There are currently three types of gender representation. 1. Intrinsic (eg, FATHER CHRISTMAS) 2. +ZWJ+ (eg, male vs female health worker) 3. +ZWJ+ (eg, woman

Re: [UTR#51-8] 1.4.3 Emoji Variation Sequences: Female/Venus and Male/Mars Signs

2016-08-21 Thread Mark Davis ☕️
Similarity or containment in the same block as current emoji characters is not sufficient grounds for changing characters to have the Emoji property (and thus being eligible for the text/emoji VS). If there is a particular set of existing characters that you would like to propose to become Emoji (

Re: "Emojis" in Reading Texts for Beginners

2016-08-21 Thread Mark Davis ☕️
The selection criteria for emoji are unlike those of other characters, because their primary usage is different. If there is a particular set of emoji characters that you would like to propose, see information at http://unicode.org/emoji/selection.html for how to do so, and what the selection facto

Re: Where are the tools to generate posix and json from cldr?

2016-08-11 Thread Mark Davis ☕️
​That is a bit obscure! We stopped generating the source for POSIX because essentially every user customized it in some way, so was better to do with a tool. We need to add a pointer to where to get the tools and how to use them. http://cldr.unicode.org/index/downloads#Repository_Organization show

Re: Whitespace characters in Unicode

2016-08-04 Thread Mark Davis ☕️
There are 25 Whitespace characters. Here they are grouped by LineBreak property: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%3Awhitespace%3A&g=Lb&i= Don't have time to respond more now. Mark On Thu, Aug 4, 2016 at 12:37 PM, Sean Leonard wrote: > Hi Unicode Folks: > > I am trying to
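A minimal sketch of the count mentioned above, assuming the White_Space ranges as they appear in PropList.txt around Unicode 9.0. The table is hard-coded for illustration, and the names WHITE_SPACE_RANGES, white_space_code_points and is_white_space are hypothetical helpers, not part of any Unicode tool. Python's own str.isspace() is not used here because it accepts a slightly different set (for example U+001C..U+001F).

```python
# White_Space code point ranges (inclusive), assumed to match PropList.txt
# for Unicode 9.0. Hard-coded for illustration only.
WHITE_SPACE_RANGES = [
    (0x0009, 0x000D),  # TAB, LF, VT, FF, CR
    (0x0020, 0x0020),  # SPACE
    (0x0085, 0x0085),  # NEXT LINE
    (0x00A0, 0x00A0),  # NO-BREAK SPACE
    (0x1680, 0x1680),  # OGHAM SPACE MARK
    (0x2000, 0x200A),  # EN QUAD .. HAIR SPACE
    (0x2028, 0x2029),  # LINE SEPARATOR, PARAGRAPH SEPARATOR
    (0x202F, 0x202F),  # NARROW NO-BREAK SPACE
    (0x205F, 0x205F),  # MEDIUM MATHEMATICAL SPACE
    (0x3000, 0x3000),  # IDEOGRAPHIC SPACE
]


def white_space_code_points():
    """Expand the range table into a flat list of code points."""
    return [cp for lo, hi in WHITE_SPACE_RANGES for cp in range(lo, hi + 1)]


def is_white_space(ch):
    """Return True if the single character ch falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in WHITE_SPACE_RANGES)


if __name__ == "__main__":
    print(len(white_space_code_points()))
```

ZERO WIDTH SPACE (U+200B) is deliberately absent from the table: it is a format character, not White_Space.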

Re: New olympic sport emoji

2016-08-03 Thread Mark Davis ☕️
On Wed, Aug 3, 2016 at 3:57 PM, gfb hjjhjh wrote: > https://twitter.com/Tokyo2020/status/760930003760492544 ​No proposal has been received for these 5 items. FYI: any proposal for emoji for inclusion in 2017 needs to be received by Oct 1, and follow the guidelines in http://www.unicode.org/emo

Re: Emoji and Annotation data

2016-06-24 Thread Mark Davis ☕️
You should never be scraping *any* Unicode HTML files. They are not made for that, and there is no guarantee of stability. The emoji files are built from data which is described in http://www.unicode.org/reports/tr51/ (plus CLDR annotations and collation) Mark On Fri, Jun 24, 2016 at 7:21 AM, Ta

Re: UAX 29 9.0.0 new emoji flag rules questions and comments

2016-06-22 Thread Mark Davis ☕️
That wouldn't work. The process works by taking each offset, and walking through all the rules, using the first one that matches. So with your rules and the following input: RI RI RI RI RI RI You'd get that any offset with at least 2 RI on the right and on the left would have no break, and every
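For the RI RI RI RI RI RI input above, one way to picture the intended outcome (flags grouped in pairs) is to count how many regional indicators immediately precede each offset. This is only a sketch of the pairing idea, not the rule text from UAX #29 and not the rules proposed in the thread; flag_break_offsets and is_regional_indicator are hypothetical names.

```python
def is_regional_indicator(ch):
    """True for REGIONAL INDICATOR SYMBOL LETTER A..Z (U+1F1E6..U+1F1FF)."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def flag_break_offsets(text):
    """Return code point offsets that fall between two regional indicators
    and where a break is allowed, i.e. where an even, non-zero number of
    regional indicators immediately precedes the offset."""
    breaks = []
    run = 0  # consecutive regional indicators seen just before offset i
    for i, ch in enumerate(text):
        if is_regional_indicator(ch):
            if run > 0 and run % 2 == 0:
                breaks.append(i)
            run += 1
        else:
            run = 0
    return breaks


if __name__ == "__main__":
    six_ri = "\U0001F1E9\U0001F1EA" * 3  # RI RI RI RI RI RI
    print(flag_break_offsets(six_ri))
```

The point of the sketch is that the decision at an offset depends on the whole run of regional indicators to its left, not just on the immediate neighbours.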

Re: Latin Letters Capital and Small Theta

2016-06-13 Thread Mark Davis ☕️
> such as URLs (domain names) there are restrictions that prevent script-mixing in a single label. That is just a current implementation restriction, based on only using the Script property. Implementations upgraded to use Script_Extensions to test for multiple scripts in a string can handle multi

Re: Adopting ZWJ

2016-06-08 Thread Mark Davis ☕️
We wanted to be a bit conservative regarding those characters, partly because we are using a payment service that is fussy. We could test it out again — but our first priority is getting U9.0 out the door! Mark On Tue, Jun 7, 2016 at 10:52 PM, Karl Williamson wrote: > On 06/07/2016 02:48 PM, Ka

Re: Canonical block names: spaces vs. underscores

2016-05-26 Thread Mark Davis ☕️
The canonical property and property value formats are in the *Alias* files. {phone} On May 26, 2016 06:57, "Mathias Bynens" wrote: > > > On 26 May 2016, at 10:17, Mathias Bynens wrote: > > > > `Blocks.txt` (http://unicode.org/Public/UNIDATA/Blocks.txt) lists > blocks such as `Cyrillic Supplemen

Re: non-breaking snakes

2016-05-04 Thread Mark Davis ☕️
Arabic has tatweel/kashida for justification; rather similar in principle. https://en.wikipedia.org/wiki/Kashida Mark On Wed, May 4, 2016 at 9:14 AM, Shriramana Sharma wrote: > Isn't there some Japanese orthography feature that already does > something like this? > > -- > Shriramana Sharma ஶ்ர

Re: non-breaking snakes

2016-05-04 Thread Mark Davis ☕️
Very nice! Mark On Wed, May 4, 2016 at 8:54 AM, Julian Bradfield wrote: > See > http://xkcd.com/1676/ > (making sure to look at the mouse-over text) > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > >

Re: UTC makes the Colbert show

2016-03-30 Thread Mark Davis ☕️
On Wed, Mar 30, 2016 at 7:42 PM, Jennifer 8. Lee wrote: > I thought his "elf exposing self in park" was an amazing (and accurate) > facial expression. > ​Right! How does he make his cheeks do that!?!​ Mark

UTC makes the Colbert show

2016-03-30 Thread Mark Davis ☕️
Fredrik passed this on: https://www.youtube.com/watch?v=CfZE56E0Uts ; skip ahead to 1:30. Mark

Re: NamesList.txt as data source

2016-03-28 Thread Mark Davis ☕️
> I'm very curious about where CLDR data depends on these subheaders or other annotations in NamesList.txt You're right. CLDR data doesn't. I think there is a misunderstanding because of the online utilities which have been, for convenience, hosted with the same server as the CLDR survey tool. So

Re: Swapcase for Titlecase characters

2016-03-19 Thread Mark Davis ☕️
The 'swapcase' just sounds bizarre. What on earth is it for? My inclination would be to just do the simplest possible implementation that has the expected results for the 1:1 case pairs, and whatever falls out from the algorithm for the others. Mark On Sat, Mar 19, 2016 at 4:11 AM, Asmus Freytag

Re: Character folding in text editors

2016-02-21 Thread Mark Davis ☕️
On Sat, Feb 20, 2016 at 11:10 PM, Asmus Freytag (t) wrote: > Unicode, even CLDR, doesn't nearly have enough data for the purpose. > (and as a corollary of what Elias points out, it's likely to annoy users > of every language, in that it would fold essential and non-essential > distinctions indisc

Re: Character folding in text editors

2016-02-20 Thread Mark Davis ☕️
Yes, that can be used. Easiest is using ICU. Create a collator, using the "search" keyword. That can be used to search for text, using settings you want for the strength (primary differences, secondary, etc). You can also access the collation keys from the ICU API, and build a mapping yourself of

Re: Enclosing BANKNOTE emoji?

2016-02-09 Thread Mark Davis ☕️
t the same frequency as various >> individual clock faces). >> >> It is quite evident that the dollar banknote emoji serves as a stand-in >> for at least half a dozen of various currencies. >> >> On Mon, Feb 8, 2016 at 10:25 PM, Mark Davis ☕️ >> w

Re: Enclosing BANKNOTE emoji?

2016-02-08 Thread Mark Davis ☕️
I would suggest that you first gather statistics and present statistics on how often the current combinations are used compared to other emoji, eg by consulting sources such as: http://www.emojixpress.com/stats/ or http://emojitracker.com/ Mark On Mon, Feb 8, 2016 at 8:34 PM, Leo Broukhis wrote

Re: Additional ZWJ prefixes in ZWJ emoji sequences page

2016-01-13 Thread Mark Davis ☕️
You're right. It's between the closing > and the following 👩‍ character \u003e *\u200d* \U0001f469 We'll see why that spurious character is there in the HTML. Mark On Wed, Jan 13, 2016 at 12:25 PM, Dominik Röttsches wrote: > Hi, > > if I am not mistaken, there are a couple of additional, pro

Re: Redundancy in TR14

2016-01-11 Thread Mark Davis ☕️
Looks that way to me too. Can you submit this as feedback? {phone} On Jan 12, 2016 00:39, "Karl Williamson" wrote: > Example 7 in http://www.unicode.org/reports/tr14/#Examples > > has these two rules > > NU × (NU | SY | IS) > > NU (NU | SY | IS)* × (NU | SY | IS | CL | CP ) > > It appears to me

Re: UN/LOCODE perspective on character sets

2015-12-17 Thread Mark Davis ☕️
Haven't looked it over in detail, but here is the notice: http://www.unece.org/fileadmin/DAM/cefact/locode/2015-2_UNLOCODE_SecretariatNotes.pdf From a quick scan: They've added latitude/longitude (to the minute, ~2km); that's great because often the names of locations are ambiguous. They still

Re: Line breaking status of emoji modifiers

2015-12-06 Thread Mark Davis ☕️
Yes. This was discussed at the last UTC, and for line break (and other segmentation, eg #29), there is an action to propose appropriate rules for 9.0. There are three types of emoji sequences that need to be handled: - flag sequences - modifier sequences - zwj sequences In the meantime,

Re: Emoji Proposal: Face With One Eyebrow Raised

2015-11-05 Thread Mark Davis ☕️
n this list are purely personal opinions, and predominantly from people who are not actually involved in the encoding process. Mark On Thu, Nov 5, 2015 at 12:47 PM, Doug Ewell wrote: > Mark Davis wrote: > > > The unicode_at_unicode.org mailing list isn't the right place for > >

Re: Emoji Proposal: Face With One Eyebrow Raised

2015-11-05 Thread Mark Davis ☕️
The unicode@unicode.org mailing list isn't the right place for submitting proposals; see the top of http://www.unicode.org/emoji/selection.html#submission under "submit as per Document Submission Details ." As for the images, that's also discussed the

Re: Emoji data in UCD xml ?

2015-11-03 Thread Mark Davis ☕️
We have revised this completely; see the R2 version. Mark On Tue, Nov 3, 2015 at 1:35 PM, Markus Scherer wrote: > About http://www.unicode.org/L2/L2015/15299-ucd-emoji-props.pdf > which has > > Emoji_Presentation (EP) > ● Non_Emoji (NE) > ● Default_Text (DT) > ● Default_Emoji (DE) > ● NA > > >

Re: Emoji data in UCD xml ?

2015-10-29 Thread Mark Davis ☕️
As Ken said, there's been some preliminary discussion, but we wanted to get initial information out in connection with UTR #51 first, and take more time to consider what UCD properties would look like, and which are necessary. The basic information that people want to access for implementations ar

Re: crafting emoji

2015-10-23 Thread Mark Davis ☕️
We haven't seen a proposal for that. See http://www.unicode.org/emoji/selection.html for how to submit one. Mark On Fri, Oct 23, 2015 at 7:33 PM, Molly Black wrote: > Why is there no knitting needles, yarn, sewing needle with thread or > sewing machine in the emoji library? Has this been discu

Re: Why Work at Encoding Level?

2015-10-21 Thread Mark Davis ☕️
Mark On Wed, Oct 21, 2015 at 6:16 AM, Daniel Bünzli wrote: > Le mercredi, 21 octobre 2015 à 04:37, Mark Davis ☕️ a écrit : > > ​If you're not, the question is relevant. > > I'm not disputing the question, I'm disputing trying to give it a defined > answer. Ev

Re: Why Work at Encoding Level?

2015-10-20 Thread Mark Davis ☕️
. If you are lucky enough to be only ever using programming languages that prevent that from ever happening, then the question is moot for you. If you're not, the question is relevant. Mark On Tue, Oct 20, 2015 at 6:47 PM, Daniel Bünzli wrote: > Le mercredi, 21 octobre 2015 à 02:23,

Re: Why Work at Encoding Level?

2015-10-20 Thread Mark Davis ☕️
> there is never any excuse for software to create unpaired surrogates, or any other sort of invalid code unit sequences First off, it depends on when one is encountered. They are invalid in UTF16, but are permitted in a Unicode 16-bit string. But more fundamentally, there may not be "excuses" fo
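The distinction drawn above, between well-formed UTF-16 and a 16-bit string that merely holds code units, can be illustrated with Python's built-in codecs. This is a sketch under the assumption that Python's str stands in for the "Unicode string" here; encodes_as_utf16 is a hypothetical helper name.

```python
def encodes_as_utf16(s):
    """Return True if s can be serialized as well-formed UTF-16.

    Python refuses to encode lone surrogates by default, which mirrors
    the rule that they are invalid in UTF-16 proper.
    """
    try:
        s.encode("utf-16-le")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    lone = "\ud800"        # unpaired lead surrogate held in a string
    paired = "\U0001F600"  # supplementary character, encoded as a pair

    print(encodes_as_utf16(paired))
    print(encodes_as_utf16(lone))

    # The "surrogatepass" error handler writes the code unit anyway,
    # which is what a raw 16-bit string would contain.
    print(lone.encode("utf-16-le", "surrogatepass"))
```

The string object itself holds the lone surrogate without complaint; only the conversion to a conformant encoding form rejects it.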

Re: Counting Codepoints

2015-10-13 Thread Mark Davis ☕️
On Tue, Oct 13, 2015 at 8:36 AM, Richard Wordingham < richard.wording...@ntlworld.com> wrote: > Rather the question must be the unwieldy one of how > many scalar values and lone surrogates it contains in total. > ​That may be the question in theory; in practice no programming language is going to
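The quantity described in the quoted question (scalar values plus lone surrogates) can be sketched for a sequence of 16-bit code units as follows. The function name count_code_points and the list-of-ints input are assumptions made for illustration; no particular language's string API is being described.

```python
def count_code_points(units):
    """Count code points in a sequence of 16-bit code units.

    A lead surrogate followed by a trail surrogate counts as one code
    point; any unpaired surrogate also counts as one.
    """
    count = 0
    i = 0
    n = len(units)
    while i < n:
        is_lead = 0xD800 <= units[i] <= 0xDBFF
        has_trail = i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF
        # Consume two units for a well-formed pair, otherwise one.
        i += 2 if (is_lead and has_trail) else 1
        count += 1
    return count


if __name__ == "__main__":
    # 'A' followed by U+1F600 as a surrogate pair
    print(count_code_points([0x0041, 0xD83D, 0xDE00]))
```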

Re: Counting Codepoints

2015-10-12 Thread Mark Davis ☕️
I agree with Ken on "Any discussion about properties for surrogate code points is a matter of designing graceful API fallback for instances which have to deal with ill-formed strings and do *something*.", and here's my advice based on that. You want the code point count to reflect the same coun

Re: Rights to the Emoji

2015-10-11 Thread Mark Davis ☕️
The twitter images are open sourced, I believe. {phone} On Oct 12, 2015 02:56, "Shervin Afshar" wrote: > Those listed in the column titled "Native" come from the operating system > (in your case, Mac OS X) and/or browser you are viewing that page on. One > can assume that the right to those belo

Re: Unicode in passwords

2015-10-06 Thread Mark Davis ☕️
While I think that RFC is useful, it has been interesting just how many of the problems recounted on this list go far beyond it, often having to do with UI issues. It would be useful to have a paper somewhere that organizes all of the problems presented here, and maybe makes a stab at describing te

Re: Deleting Lone Surrogates

2015-10-04 Thread Mark Davis ☕️
When I use http://unicode.org/cldr/utility/breaks.jsp, it does show the sequence 𑒏�𑒺 as just two grapheme clusters. In #29 we are specifically not concerned about ill-formed text (or other degenerate cases). I suppose it would be possible to handle isolated surrogates in different way (eg always b

Re: NNBSP and Word Boundaries

2015-10-02 Thread Mark Davis ☕️
Like Andy, I'm hesitant about changing the gc of NNBSP, because of backwards compatibility concerns. I'm also starting to think that scoping the wb change to Mongolian may not be a bad thing. We might want to explore what it would look like, since it would preserve the maximum compatibility for cu

Re: Unicode in passwords

2015-10-01 Thread Mark Davis ☕️
ting device (although typing a password on such a device may not be exactly a great idea from a security standpoint!). Mark <https://google.com/+MarkDavis> *— Il meglio è l’inimico del bene —* On Thu, Oct 1, 2015 at 9:33 AM, Richard Wordingham < richard.wording...@ntlworld.com> wrote

Re: Unicode in passwords

2015-09-30 Thread Mark Davis ☕️
I've heard some concerns, mostly around the UI for people typing in passwords; that they get frustrated when they have to type their password on different devices: 1. A device may not have keyboard mappings with all the keys for their language. 2. The keyboard mappings across devices vary

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Mark Davis ☕️
I think the term "non-ASCII Unicode" is just fine, and we don't need anything beyond that. It is clearly those Unicode characters that aren't (2) in http://unicode.org/glossary/#ASCII. Mark *— Il meglio è l’inimico del bene —* On Tue, Sep 29, 2015 at 6:20 PM, Sea

Re: VS: [somewhat off topic] straw poll

2015-09-11 Thread Mark Davis ☕️
BTW, the only way I see anything from Overington is when a message is quoted by someone else, since I long ago filtered those out of my email inbox. Mark <https://google.com/+MarkDavis> *— Il meglio è l’inimico del bene —* On Fri, Sep 11, 2015 at 7:35 PM, Mark Davis ☕️ wrote: >

Re: VS: [somewhat off topic] straw poll

2015-09-11 Thread Mark Davis ☕️
I suggest that you create a proposal for the UTC so that it can go on record; I suspect it will get a favorable reception. Mark *— Il meglio è l’inimico del bene —* On Fri, Sep 11, 2015 at 7:25 PM, Doug Ewell wrote: > William_J_G Overington > wrote: > > > If U

Re: String Ranges in Unicode Sets

2015-09-08 Thread Mark Davis ☕️
On Tue, Sep 8, 2015 at 9:53 AM, Asmus Freytag (t) wrote: > it is implied the String Range formulation is a compact form. > > Can you prove that it doesn't create any set of strings that can't be > specified in other ways (other than full enumeration of the strings?). > It is simply a compact s

Re: String Ranges in Unicode Sets

2015-09-08 Thread Mark Davis ☕️
Mark <https://google.com/+MarkDavis> *— Il meglio è l’inimico del bene —* On Mon, Sep 7, 2015 at 9:46 PM, Richard Wordingham < richard.wording...@ntlworld.com> wrote: > On Mon, 7 Sep 2015 16:54:16 +0200 > Mark Davis ☕️ wrote: > > > On Mon, Sep 7, 2015 at

Re: String Ranges in Unicode Sets

2015-09-07 Thread Mark Davis ☕️
Thanks for the feedback. >By my reading, adding string ranges will initially make regular expression engines that don't use ICU non-compliant with Level 1 of UTS#18 Unicode Regular Expressions, in particular RL1.3 'subtraction and I don't see where you are getting that. UTS 35 isn't referenced by

Re: Upcoming proposal for Bitcoin sign

2015-09-05 Thread Mark Davis ☕️
At one point, the proposal states: Another alternative is ฿ THAI CURRENCY SYMBOL BAHT. This has the advantage of already being in Unicode and somewhat resembling the Bitcoin sign. A major disadvantage is this symbol is already in use as a currency symbol for a different currency, so using it to re

Re: Wrong character code for HELM SYMBOL in TR 51 Unicode Emoji?

2015-08-30 Thread Mark Davis ☕️
Thanks, that's a mis-edit. The following text should have been removed: ". Symbols with a graphical form that people may treat as pictographs, ... are categorized as emoji" Mark *— Il meglio è l’inimico del bene —* On Sun, Aug 30, 2015 at 2:33 AM, Garth Wallace

Re: a suggestion new emoji .

2015-08-19 Thread Mark Davis ☕️
​I'd agree about reading and following http://unicode.org/reports/tr51/#Selection_Factors. As far as petitions go, we take them with a sizable grain of salt. See http://unicode.org/reports/tr51/#Selection_Factors_Requested. In the particular cases you cite, we had sufficient evidence about prospec

Bogus glyphs for halfwidth characters

2015-08-11 Thread Mark Davis ☕️
For halfwidth characters like U+FF70 HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK I'm getting bogus glyphs from Kaiti* and Songti* (and STKaiti, STSongti). ​ See screenshot below. ​Anyone else see that, or know what is happening? Unfortunately, these fonts are getting picked up first in the fa

Re: Standardised Encoding of Text

2015-08-09 Thread Mark Davis ☕️
Mark <https://google.com/+MarkDavis> *— Il meglio è l’inimico del bene —* On Sun, Aug 9, 2015 at 7:10 PM, Richard Wordingham < richard.wording...@ntlworld.com> wrote: > On Sun, 9 Aug 2015 17:10:01 +0200 > Mark Davis ☕️ wrote: > > > While it would be good to docu

Re: Standardised Encoding of Text

2015-08-09 Thread Mark Davis ☕️
While it would be good to document more scripts, and more language options per script, that is always subject to getting experts signed up to develop them. What I'd really like to see instead of documentation is a data-based approach. For example, perhaps the addition of real data to CLDR for a "

Re: Emoji characters for food allergens

2015-08-03 Thread Mark Davis ☕️
BTW, the UTC declined to accept the allergen emoji set proposal. While some of the food items may be acceptable and the emoji subcommittee could re-propose them, there are principled problems with trying to deal with allergens as a set of emoji. So that is off the table. Mark

Re: PRI #299

2015-07-06 Thread Mark Davis ☕️
On Mon, Jul 6, 2015 at 5:53 PM, Leo Broukhis wrote: >> Most platforms display unknown printable characters as white >> rectangles with hex digits in them. >> In Doug's message, I saw a rectangle with 01F in the upper row, and >> 3F3 in the lower row. > > This is a handy feature, at least for cha

Re: Adding RAINBOW FLAG to Unicode

2015-07-02 Thread Mark Davis ☕️
Again, that has no advantage over PUA characters. Carriers/vendors can *already* add whatever PUA characters they want to fonts and keyboards. But of course, the problem is interoperability; you send a flag to a friend for your favorite vacation spot, Florida, and the friend sees a flag for New Jer

Re: Adding RAINBOW FLAG to Unicode

2015-07-02 Thread Mark Davis ☕️
To add some information that people like Noah may not be aware of: This email list is an open, public list for arbitrary discussions about Unicode and software internationalization. It is *not* an email list for consortium business—the vast majority of the people on it are *not* members of the Uni

Re: Representing Additional Types of Flags

2015-07-02 Thread Mark Davis ☕️
I'll try to answer a few of these. Mark *— Il meglio è l’inimico del bene —* On Tue, Jun 30, 2015 at 11:57 PM, Doug Ewell wrote: > Re-posting my comments and questions on this PRI to the list. I've > already submitted them as formal feedback. > > . > > I suppor

Re: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags)

2015-07-02 Thread Mark Davis ☕️
Ok. I wasn't clear enough. Certainly boundaries are political and relevant, as is the fact that they change. What is not relevant is talking about particular country's motivations and actions. Moreover, you insist about writing a tome about this. In other words, TL;DR. Mark

Re: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags)

2015-07-01 Thread Mark Davis ☕️
*​Please take political discussions elsewhere; they do not belong on this list.* The point about the boundaries of regions changing over time, and flags being associated with a former set of boundaries could have been made in a few sentences. Not only would it have avoided politics, it would have
