Re: Another take on the English Apostrophe in Unicode

2015-06-16 Thread Mark Davis ☕️
And, Marcel, while you are at it, this is getting tiresome. Please find some other place to vent about events you know very little about; the internet is full of them. Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Tue, Jun 16, 2015 at 7:33 PM, Doug Ewell

Re: Another take on the English apostrophe in Unicode

2015-06-15 Thread Mark Davis ☕️
On Mon, Jun 15, 2015 at 9:17 AM, Marcel Schneider charupd...@orange.fr wrote: When we take the topic down again from linguistics to the core mission of Unicode, that is character encoding and text processing standardisation, ellipsis and Swedish abbreviation colon differ from the single

Re: Another take on the English apostrophe in Unicode

2015-06-13 Thread Mark Davis ☕️
On Sat, Jun 13, 2015 at 5:10 PM, Peter Constable peter...@microsoft.com wrote: When it comes to orthography, the notion of what comprise words of a language is generally pure convention. That’s because there isn’t any single *linguistic* definition of word that gives the same answer when

Re: free download of ISO/IEC 10646 (was: Accessing the WG2 document register)

2015-06-11 Thread Mark Davis ☕️
I think the whole thread got overheated, and Andrew was just responding to other heated comments. So it might be time to let this thread cool off a bit. The collaboration over the years between the Unicode Consortium and ISO has been, on the whole, a remarkable success. There have been

Re: http://✈.ws

2015-06-05 Thread Mark Davis ☕️
Whoops, sent too soon. A surprise: http://✈.ws Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Fri, Jun 5, 2015 at 4:47 PM, Mark Davis ☕️ m...@macchiato.com wrote:

http://✈.ws

2015-06-05 Thread Mark Davis ☕️

Re: The Oral History Of The Poop Emoji

2015-06-01 Thread Mark Davis ☕️
One of many on http://unicode.org/press/emoji.html Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Mon, Jun 1, 2015 at 8:23 PM, Karl Williamson pub...@khwilliamson.com wrote:

Re: FYI: The world’s languages, in 7 maps and charts

2015-05-27 Thread Mark Davis ☕️
Hmmm. How accurate can it be? They forgot Austria, and got Switzerland wrong by almost a power of 10. Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Wed, May 27, 2015 at 10:18 AM, Denis Jacquerye moy...@gmail.com wrote: The South China Morning Post published a

Re: FYI: The world's languages, in 7 maps and charts

2015-05-27 Thread Mark Davis ☕️
and gives a population of 727 000 Standard German L1 speakers in Switzerland (the difference is counted as Swiss German L1 speakers). On Wed, 27 May 2015 at 11:22 Mark Davis ☕️ m...@macchiato.com wrote: Hmmm. How accurate can it be? They forgot Austria, and got Switzerland wrong by almost

Re: Tag characters

2015-05-18 Thread Mark Davis ☕️
A few notes. A more concrete proposal will be in a PRI to be issued soon, and people will have a chance to comment more then. (I'm not trying to discourage discussion, just pointing out that there will be something more concrete relatively soon to comment on; people are pretty busy getting 8.0

Re: Tag characters

2015-05-15 Thread Mark Davis ☕️
The consortium is in no position to enhance protocols *itself* for exchanging images. That's firmly in other groups' hands. We can try to noodge them a bit, but what *will* make a difference is when the *vendors* of sticker solutions put pressure on the different groups responsible for the

FYI: The world’s languages, in 7 maps and charts

2015-05-12 Thread Mark Davis ☕️
http://www.washingtonpost.com/blogs/worldviews/wp/2015/04/23/the-worlds-languages-in-7-maps-and-charts/

Re: Script / font support in Windows 10

2015-05-08 Thread Mark Davis ☕️
Thanks! Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Fri, May 8, 2015 at 7:15 AM, Peter Constable peter...@microsoft.com wrote: I think this is the right public link: https://msdn.microsoft.com/en-us/goglobal/bb688099.aspx *From:* Peter Constable

Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

2015-05-07 Thread Mark Davis ☕️
The simplest approach would be to use ICU in a little program that scans the file. For example, you could write a little Java program that would scan the file, and turn any sequence of (\u)+ into a String, then test that string with: static final UnicodeSet OK = new
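The message proposes an ICU-based Java scanner. As a dependency-free alternative, here is a minimal Python sketch that swaps ICU's `UnicodeSet` for the standard `unicodedata` module, so the verdict depends on the Unicode version bundled with your Python. It pairs surrogate escapes the way JSON decoders do, then flags unpaired surrogates, noncharacters, and unassigned (gc=Cn) code points. The function names are hypothetical, and the regex does not understand an escaped backslash (JSON text `\\u0041`), so treat it as a first pass rather than a validator.

```python
import re
import unicodedata

# Matches one or more consecutive \uXXXX escapes in raw JSON text.
ESCAPE_RUN = re.compile(r'(?:\\u[0-9A-Fa-f]{4})+')
ONE_ESCAPE = re.compile(r'\\u([0-9A-Fa-f]{4})')

def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def check_json_escapes(text: str) -> list[tuple[str, str]]:
    """Return (code point, problem) pairs for suspicious \\uXXXX escapes."""
    problems = []
    for run in ESCAPE_RUN.finditer(text):
        units = [int(h, 16) for h in ONE_ESCAPE.findall(run.group())]
        i = 0
        while i < len(units):
            cp = units[i]
            # A high surrogate followed by a low surrogate forms one code point.
            if (0xD800 <= cp <= 0xDBFF and i + 1 < len(units)
                    and 0xDC00 <= units[i + 1] <= 0xDFFF):
                cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                i += 2
            else:
                i += 1
            label = f"U+{cp:04X}"
            if 0xD800 <= cp <= 0xDFFF:
                problems.append((label, "unpaired surrogate"))
            elif is_noncharacter(cp):
                problems.append((label, "noncharacter"))
            elif unicodedata.category(chr(cp)) == "Cn":
                problems.append((label, "unassigned"))
    return problems
```

Noncharacters are checked before the `Cn` test because they also have general category Cn.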

Combining character example

2015-04-16 Thread Mark Davis ☕️
I happened to run across a good example of productive use of combining marks, the Duden site (a great online dictionary for German). They use U+0323 ( ̣) COMBINING DOT BELOW to indicate the stress. Here is an example: ụnterbuttern http://www.duden.de/rechtschreibung/unterbuttern They aren't,
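A practical consequence of marking stress with U+0323 is that search or comparison usually wants the mark removed. A minimal Python sketch, assuming the mark may arrive either as a separate combining character or already composed (for example U+1EE5 LATIN SMALL LETTER U WITH DOT BELOW); the helper name `strip_stress` is hypothetical. Decomposing with NFD first exposes the mark in both cases.

```python
import unicodedata

STRESS_MARK = "\u0323"  # COMBINING DOT BELOW, used for stress in the Duden example

def strip_stress(word: str) -> str:
    """Remove U+0323 whether it is stored separately or precomposed."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace(STRESS_MARK, ""))
```

This is only safe for text where U+0323 is purely a stress annotation; in orthographies that use dot below as part of the letter (Vietnamese ạ, for instance) it would destroy information.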

Re: Combining character example

2015-04-16 Thread Mark Davis ☕️
borrowed from French) also have either a single line under the whole digraph or (this happens rarely) a single dot in the middle of the digraph. --Jörg Knappen *Sent:* Thursday, 16 April 2015 at 10:01 *From:* Mark Davis ☕️ m...@macchiato.com *To:* Unicode Public unicode

Re: Are you CONFUSED about WHAT CHARACTER(S) you type?!?!

2015-03-26 Thread Mark Davis ☕️
It only provides a stand-in glyph if you don't otherwise have a font for that character on your system. That stand-in just indicates the type of character (eg script). No single font with current technology can handle all of Unicode. The most complete open font set I know of is the Noto family:

Re: Android 5.1 ships with support for several minority scripts

2015-03-14 Thread Mark Davis ☕️
Congrats! {phone} On Mar 14, 2015 03:09, Roozbeh Pournader rooz...@unicode.org wrote: Android 5.1 http://officialandroid.blogspot.com/2015/03/android-51-unwrapping-new-lollipop.html, released earlier this week, has added support for 25 minority scripts. The wide coverage can be reproduced by

Re: Emoji (was: Re: Unicode block for programming related symbols and codepoints?)

2015-02-10 Thread Mark Davis ☕️
We are being pretty conservative about what we add. There are approximately 1,200 emoji characters now (see tr51), and we're anticipating adding perhaps 50 per release. And we are encouraging a sticker approach for the longer term. On the other hand, I wouldn't be surprised if the 41 emoji

Re: Emoji (was: Re: Unicode block for programming related symbols and codepoints?)

2015-02-10 Thread Mark Davis ☕️
In what character encoding standard, or extension, does ROBOT FACE appear? Unicode has never been limited to what is in other character encoding standards or extensions, official or de facto. Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Mon, Feb 9, 2015 at 9:16

Re: About cultural/languages communities flags

2015-02-09 Thread Mark Davis ☕️
On Tue, Feb 10, 2015 at 12:11 AM, Ken Whistler kenwhist...@att.net wrote: for the full context, and for the current 26x26 letter matrix which is the basis for the flag glyph implementations of regional indicator code pairs on smartphones. SC, SO, ST are already taken, but might I suggest
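The 26x26 matrix works because each ASCII letter A..Z maps to one of the 26 REGIONAL INDICATOR SYMBOL characters (U+1F1E6..U+1F1FF), and a pair of them is what a platform may render as a flag. A minimal Python sketch of that mapping; the function name is hypothetical, and whether a given pair actually displays as a flag depends on the font and platform, not on this code.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag_sequence(region: str) -> str:
    """Map a two-letter code such as 'CH' to its regional-indicator pair."""
    if len(region) != 2 or not region.isascii() or not region.isalpha():
        raise ValueError("expected two ASCII letters")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in region.upper())
```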

Re: UAX 29 questions

2015-01-30 Thread Mark Davis ☕️
I apologize in advance that I'm running low on time, and didn't go through all the messages on this thread carefully. So I may not be fully appreciating people's positions. I'm just making some quick points about 2 items that caught my eye. 1. There are certainly times where two rules in sequence

Re: (R), (c) and ™

2014-12-18 Thread Mark Davis ☕️
On Thu, Dec 18, 2014 at 11:31 AM, Andrea Giammarchi andrea.giammar...@gmail.com wrote: standard variant sensitive It is not clear what you mean by "standard variant sensitive". Can you elaborate? Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —*

Re: (R), (c) and ™

2014-12-18 Thread Mark Davis ☕️
, Mark Davis ☕️ m...@macchiato.com wrote: On Thu, Dec 18, 2014 at 11:31 AM, Andrea Giammarchi andrea.giammar...@gmail.com wrote: standard variant sensitive It is not clear what you mean by "standard variant sensitive". Can you elaborate? Mark https://google.com/+MarkDavis

Re: emoji are clearly the current meme fad

2014-12-17 Thread Mark Davis ☕️
We just had a new blog posting; we've moved the media list out of tr51, and the list already had that item on it. See: http://www.unicode.org/press/emoji.html#media Separately, I keep a list of how the media refers to the Unicode consortium: my favorite is shadowy emoji overlords. Bonus points

Re: emoji are clearly the current meme fad

2014-12-17 Thread Mark Davis ☕️
On Wed, Dec 17, 2014 at 9:03 PM, Murray Sargent murr...@exchange.microsoft.com wrote: http://www.theguardian.com/commentisfree/2014/nov/28/the-problem-with-emojis Bingo, Murray wins the prize! Not to open until Christmas...

Re: The rapid ... erosion of definition ability

2014-11-17 Thread Mark Davis ☕️
On Mon Nov 17 2014 at 12:15:08 PM Andreas Stötzner a...@signographie.de wrote: On 17.11.2014 at 11:46, Leonardo Boiko wrote: Sign is too general in its generality it is just perfect. The sets of signs in question are most general, covering much more matters, objects and topics than the

Re: The rapid ... erosion of definition ability

2014-11-17 Thread Mark Davis ☕️
nothing is. [1] http://en.wiktionary.org/wiki/sign 2014-11-17 8:09 GMT-02:00 Andreas Stötzner a...@signographie.de: On 17.11.2014 at 08:35, Mark Davis ☕️ wrote: IT’S EASY TO DISMISS EMOJI. They are, at first glance, ridiculous The only ridiculous thing is to name them “Emoji” outside

The rapid evolution of a wordless tongue

2014-11-16 Thread Mark Davis ☕️
http://nymag.com/daily/intelligencer/2014/11/emojis-rapid-evolution.html A more extended article from NY Magazine about the growing usage of emoji, and the ways in which that usage is developing. Has a quote from Peter Constable and (indirect) reference to +Steven R. Loomis. “IT’S EASY TO

Re: Emoji skin tone modifiers on the website of a leading German daily newspaper

2014-11-08 Thread Mark Davis ☕️
As far as I can tell it is garnering interest all over: from several German publications, including Spiegel, to French and Italian regional papers, to Indonesian, Vietnamese http://www.spiegel.de/netzwelt/web/unicode-consortium-emojis-demnaechst-fuer-alle-hautfarben-a-1001125.html

Re: Open Source Emoji for the Web

2014-11-07 Thread Mark Davis ☕️
? Thanks On Fri, Nov 7, 2014 at 12:18 AM, Mark Davis ☕️ m...@macchiato.com wrote: Very nice. I'd have one suggestion. People appear to be converging on similar file names for the emoji. - Lowercase hex numbers, - at least 4 digits, - otherwise no leading zeros, - multiple code

keynote

2014-11-06 Thread Mark Davis ☕️
As an experiment, we recorded the keynote at the Unicode Conference. I posted them at http://macchiati.blogspot.com/2014/11/unicode-emoji.html Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* ___ Unicode mailing list

Re: Open Source Emoji for the Web

2014-11-06 Thread Mark Davis ☕️
Very nice. I'd have one suggestion. People appear to be converging on similar file names for the emoji. - Lowercase hex numbers, - at least 4 digits, - otherwise no leading zeros, - multiple code points separated by _, - with optional prefix/suffix. Like dcm_0030_20e3.png. I'd
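The naming convention described above (lowercase hex, at least 4 digits, no other leading zeros, code points joined by `_`, optional prefix/suffix) maps directly onto a zero-padded hex format. A minimal Python sketch; the function name and the default `.png` suffix are assumptions for illustration.

```python
def emoji_file_name(seq: str, prefix: str = "", suffix: str = ".png") -> str:
    """Lowercase hex, at least 4 digits, no other leading zeros, joined by '_'."""
    # "{:04x}" pads to 4 digits only; longer values such as 1f600 keep their width.
    return prefix + "_".join(f"{ord(ch):04x}" for ch in seq) + suffix
```

For example, `emoji_file_name("0\u20e3", prefix="dcm_")` reproduces the `dcm_0030_20e3.png` name from the message.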

Re: Question about a Normalization test

2014-10-23 Thread Mark Davis ☕️
On Thu, Oct 23, 2014 at 6:54 PM, Aaron Cannon cann...@fireantproductions.com wrote: 0061 05AE 0305 0300 0315 0062 http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Cu0061+%5Cu05AE+%5Cu0305+%5Cu0300+%5Cu0315+%5Cu0062&g=ccc 0305 and 0300 have the same ccc, so the first one blocks the
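The blocking rule can be checked directly with Python's standard `unicodedata` module (used here as a stand-in for the CLDR utility linked above). A combining mark is blocked from the preceding starter if some character between them has a combining class of 0 or greater than or equal to its own. The helper names are hypothetical.

```python
import unicodedata

def ccc(ch: str) -> int:
    """Canonical_Combining_Class of a single character."""
    return unicodedata.combining(ch)

def nfc_hex(s: str) -> str:
    """NFC-normalize s and show the result as space-separated hex code points."""
    return " ".join(f"{ord(c):04X}" for c in unicodedata.normalize("NFC", s))

# The test sequence: a, U+05AE (ccc 228), U+0305 (230), U+0300 (230), U+0315 (232), b.
print([ccc(m) for m in "\u05ae\u0305\u0300\u0315"])  # [228, 230, 230, 232]
# U+0305 has the same ccc as U+0300, so it blocks U+0300 from composing with 'a'.
print(nfc_hex("a\u05ae\u0305\u0300\u0315b"))         # 0061 05AE 0305 0300 0315 0062
# Without U+0305, U+05AE (228 < 230) does not block, so a + U+0300 becomes U+00E0.
print(nfc_hex("a\u05ae\u0300b"))                     # 00E0 05AE 0062
```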

fonts for U7.0 scripts

2014-10-22 Thread Mark Davis ☕️
I'm looking for freely downloadable TTF fonts for any of the following. I'd appreciate links to sites for any of these: 1. Bassa_Vah 2. Duployan 3. Grantha 4. Khojki 5. Khudawadi 6. Mahajani 7. Mende_Kikakui 8. Modi 9. Mro 10. Nabataean 11. Old_Permic 12.

Re: What happened to...?

2014-09-20 Thread Mark Davis ☕️
I agree that we should minute at least some reason for declining. It need only be a sentence or two. (BTW I wasn't at that discussion.) {phone} On Sep 20, 2014 3:17 AM, Asmus Freytag asm...@ix.netcom.com wrote: On 9/19/2014 5:38 PM, Whistler, Ken wrote: Michael, Declines to take action”

Re: FYI: Ruble sign in Windows

2014-08-14 Thread Mark Davis ☕️
Cool, congratulations! Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Thu, Aug 14, 2014 at 3:52 PM, Peter Constable peter...@microsoft.com wrote: For those interested, there is an update for Windows available now to add font, keyboard and locale data support

Re: meaningful and meaningless FE0E

2014-06-29 Thread Mark Davis ☕️
These variation selector characters only apply to specific characters, those listed in http://unicode.org/Public/UNIDATA/StandardizedVariants.html There is a machine-readable version at http://unicode.org/Public/UNIDATA/StandardizedVariants.txt Mark https://google.com/+MarkDavis *— Il meglio
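Because only the listed base + selector pairs are meaningful, a checker can load the machine-readable file once and test membership. A minimal Python sketch, assuming the semicolon-separated layout of StandardizedVariants.txt (sequence; description; shaping environment; `#` comment) and reading only the first field. The single sample line is illustrative; load the real file for actual use. Function and variable names are hypothetical.

```python
def parse_standardized_variants(lines):
    """Collect (base, selector) pairs from StandardizedVariants.txt-style lines."""
    pairs = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        code_points = data.split(";", 1)[0].split()
        base, selector = (int(cp, 16) for cp in code_points)
        pairs.add((base, selector))
    return pairs

SAMPLE = [
    "# Illustrative excerpt only; consult the published file for the full list",
    "0030 FE00; short diagonal stroke form; # DIGIT ZERO",
]
```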

Re: Swift

2014-06-05 Thread Mark Davis ☕️
I haven't done any analysis, but on first glance it looks like it is based on http://www.unicode.org/reports/tr31/#Alternative_Identifier_Syntax Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Thu, Jun 5, 2014 at 5:46 PM, Jeff Senn s...@maya.com wrote: Has

Re: Swift

2014-06-04 Thread Mark Davis ☕️
Apparently you can use emoji in the identifiers.  ( http://www.globalnerdy.com/2014/06/03/swift-fun-fact-1-you-can-use-emoji-characters-in-variable-constant-function-and-class-names/ ) Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Wed, Jun 4, 2014 at 11:28 AM,

Re: Corrigendum #9

2014-06-03 Thread Mark Davis ☕️
On Mon, Jun 2, 2014 at 10:32 PM, David Starner prosfil...@gmail.com wrote: Why? It seems you're changing the rules ... It isn't that the rules are changing; they have already changed. The Corrigendum was issued at the start of 2013, about 16 months ago, and is applicable to all relevant earlier versions. It was the

Re: Corrigendum #9

2014-06-03 Thread Mark Davis ☕️
On Tue, Jun 3, 2014 at 9:41 AM, David Starner prosfil...@gmail.com wrote: Thinking that a utility would never mangle them if encountered in input text was a pipe-dream. I didn't say not mangle, I said break, as in crash. I don't think this thread is going anywhere productive, so I'm

Re: Unicode Regular Expressions, Surrogate Points and UTF-8

2014-06-02 Thread Mark Davis ☕️
\uD808\uDF45 specifies a sequence of two codepoints. That is simply incorrect. In Java (and similar environments), \u means a char (a UTF-16 code unit), not a code point. Here is the difference. If you are not used to Java, string.replaceAll(x,y) uses Java's regex to replace the pattern x
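The arithmetic behind the point above: a high/low surrogate pair is two UTF-16 code units that together encode a single code point, here U+12345. A minimal Python sketch (Python strings hold code points, so the UTF-16 view is shown through explicit bytes); the function name is hypothetical.

```python
def surrogates_to_code_point(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# Two UTF-16 code units decode to one character.
decoded = bytes.fromhex("D808DF45").decode("utf-16-be")
print(len(decoded), f"U+{ord(decoded):04X}")  # 1 U+12345
```

In Java terms, the corresponding String has `length()` 2 but a code point count of 1.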

Re: Corrigendum #9

2014-06-02 Thread Mark Davis ☕️
The problem is where to draw the line. In today's world, what's an app? You may have a cooperating system of apps, where it is perfectly reasonable to interchange sentinel values (for example). I agree with Markus; I think the FAQ is pretty clear. (And if not, that's where we should make it

Re: Corrigendum #9

2014-06-02 Thread Mark Davis ☕️
On Mon, Jun 2, 2014 at 6:21 PM, Shawn Steele shawn.ste...@microsoft.com wrote: The “problem” is now that previously these characters were illegal The problem was that we were inconsistent in standard and related material about just what the status was for these things. Mark

Re: Corrigendum #9

2014-06-02 Thread Mark Davis ☕️
. Any app where input of noncharacters causes security problems or crashes is and was not a very good app. Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Mon, Jun 2, 2014 at 6:37 PM, Asmus Freytag asm...@ix.netcom.com wrote: On 6/2/2014 9:27 AM, Mark Davis ☕️ wrote

Re: Unicode Regular Expressions, Surrogate Points and UTF-8

2014-05-31 Thread Mark Davis ☕️
I think you have a point here. We should probably change to: To meet this requirement, an implementation shall supply a mechanism for specifying any Unicode scalar value (from U+0000 to U+D7FF and U+E000 to U+10FFFF), using the hexadecimal code point representation. and then in the notes say
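The proposed wording restricts hex escapes to scalar values, that is, every code point except the surrogates. A minimal Python sketch of a pattern preprocessor that enforces this, assuming a Perl-style `\x{...}` escape syntax (an assumption for illustration, not a claim about any particular engine). Expanded characters are passed through `re.escape` so that, for example, `\x{2E}` stays a literal full stop rather than becoming the regex wildcard.

```python
import re

HEX_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]{1,6})\}")

def is_scalar_value(cp: int) -> bool:
    """True for U+0000..U+D7FF and U+E000..U+10FFFF."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF

def expand_hex_escapes(pattern: str) -> str:
    """Replace \\x{...} escapes with (regex-escaped) characters, rejecting non-scalars."""
    def repl(match):
        cp = int(match.group(1), 16)
        if not is_scalar_value(cp):
            raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
        return re.escape(chr(cp))
    return HEX_ESCAPE.sub(repl, pattern)
```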

Re: Long-Encoded Restricted Characters in High Frequency Modern Use

2014-05-31 Thread Mark Davis ☕️
Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Fri, May 30, 2014 at 12:39 AM, Richard Wordingham richard.wording...@ntlworld.com wrote: I am a little confused by the call for a review of UTS #39, Unicode Security Mechanisms (PRI #273). Are we being requested to

Re: Corrigendum #9

2014-05-31 Thread Mark Davis ☕️
A few quick items. (I admit to only skimming your response, Philippe; there is only so much time in the day.) Any discussion of changing non-characters is really pointless. See http://www.unicode.org/policies/property_value_stability_table.html As to breaking up the block, that is not forbidden:

Re: Unicode Sets in 'Unicode Regular Expressions'

2014-05-27 Thread Mark Davis ☕️
They are defined in http://unicode.org/reports/tr35/tr35.html#Unicode_Sets. We should add a pointer to that; could you please file a feedback report for #18 to that effect? Also, if you find any problems in the description in #35, you can file a ticket at http://unicode.org/cldr/trac/newticket to

Re: ID_Start, ID_Continue, and stability extensions

2014-04-28 Thread Mark Davis ☕️
On 25 April 2014 20:53, Karl Williamson pub...@khwilliamson.com wrote: And in fact in some Unicode releases, they contained errors. I think you know this, but for others: a derived property value in the UCD is defined by the value in the derived data file, NOT by the derivation. Of course,

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-22 Thread Mark Davis ☕️
We try not to do that. There are some known holes, like RBNF. If you know of others, please file a ticket. {phone} On Apr 21, 2014 9:18 PM, Doug Ewell d...@ewellic.org wrote: From: Asmus Freytag asmusf at ix dot netcom dot com wrote: In general, I heartily dislike specifications that just

Re: Updated emoji working draft

2014-04-15 Thread Mark Davis ☕️
On 15 April 2014 13:14, William_J_G Overington wjgo_10...@btinternet.com wrote: If the UTC (Unicode Technical Committee) accepts the introduction of read-out labels, each read-out label both linked to a pictograph character and also linked to a language-localization text string, then that will

Re: Updated emoji working draft

2014-04-14 Thread Mark Davis ☕️
This is really off topic. If you want to start up a thread about this, please use a different subject. Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On 14 April 2014 16:01, William_J_G Overington wjgo_10...@btinternet.com wrote: Here are two examples each of a

Re: Updated emoji working draft

2014-04-12 Thread Mark Davis ☕️
On 12 April 2014 11:46, William_J_G Overington wjgo_10...@btinternet.com wrote: ... In March 2014 I published the attached document, depositing a copy with the British Library. The_format_of_the_translit.dat_file_suggested_for_possible_use_for_transliteration.pdf Is this format suitable to

Re: Updated emoji working draft

2014-04-12 Thread Mark Davis ☕️
On 12 April 2014 16:54, William_J_G Overington wjgo_10...@btinternet.com wrote: Would it be good, for an emoji that is not encoded in regular Unicode, to include mention of the possibility of transmission by markup bubble, rendered upon reception as an unmapped glyph by an OpenType colour font?

Re: Bidi reordering of soft hyphen

2014-04-02 Thread Mark Davis ☕️
I tend to agree with Roozbeh and Behdad. I would expect to find the visible appearance of the hyphen replacing the letters that were broken off from the last word. That is, if the word was beekeeper, I'd expect to see: bee- . That would be no matter where the word occurred, and no

FYI: More emoji from Chrome

2014-04-01 Thread Mark Davis ☕️
More emoji from Chrome: http://chrome.blogspot.ch/2014/04/a-faster-mobiler-web-with-emoji.html with video: https://www.youtube.com/watch?v=G3NXNnoGr3Y ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode

Re: FYI: More emoji from Chrome

2014-04-01 Thread Mark Davis ☕️
Yup! Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On 1 April 2014 09:13, Philippe Verdy verd...@wanadoo.fr wrote: April 1st joke... 2014-04-01 9:01 GMT+02:00 Mark Davis ☕️ m...@macchiato.com: More emoji from Chrome: http://chrome.blogspot.ch/2014/04

Re: Names for control characters (Was: (in 6429) in allkeys.txt)

2014-03-12 Thread Mark Davis
They do have aliases in NameAliases.txt: 0000;NULL;control 0000;NUL;abbreviation 0001;START OF HEADING;control 0001;SOH;abbreviation 0002;START OF TEXT;control 0002;STX;abbreviation ... Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Wed, Mar 12, 2014 at 1:32
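Since control characters have no formal Name property, tools that want to display them need the aliases. A minimal Python sketch that reads NameAliases.txt-style lines (`code;alias;type`) into a lookup table; the sample lines are the ones quoted above, and the function name is hypothetical.

```python
from collections import defaultdict

def parse_name_aliases(lines):
    """Map code point -> list of (alias, type) from NameAliases.txt-style lines."""
    aliases = defaultdict(list)
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, kind = (field.strip() for field in line.split(";"))
        aliases[int(code, 16)].append((alias, kind))
    return dict(aliases)

SAMPLE = """\
0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
""".splitlines()
```

By contrast, `unicodedata.name("\x00")` raises `ValueError` because U+0000 has an empty Name property; pass a default to avoid the exception.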

Re: NFD - NFC

2014-03-11 Thread Mark Davis
Not sure about your exact case, but ICU's normalization does handle those characters. http://unicode.org/cldr/utility/transform.jsp?a=nfc%3Bhex&b=%5Cu30B9%5Cu3099 (That tool uses ICU for NFC). Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Tue, Mar 11, 2014 at
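As an independent cross-check of the linked ICU result, Python's standard `unicodedata` module gives the same composition for this pair: KATAKANA LETTER SU followed by COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK composes to KATAKANA LETTER ZU, and NFD splits it again. This is a sketch for verification only, not the tool the message refers to.

```python
import unicodedata

composed = unicodedata.normalize("NFC", "\u30b9\u3099")  # SU + VOICED SOUND MARK
print(f"U+{ord(composed):04X}")  # U+30BA
assert unicodedata.normalize("NFD", composed) == "\u30b9\u3099"
```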

Re: Unicode organization is still anti-Serbian and anti-Macedonian

2014-02-14 Thread Mark Davis
Unicode is not anti-Serbian or Macedonian. The exact level of Unicode support will depend on your operating system and font choice. For example, on the Mac there are reasonable results with arbitrary accents. Here are examples with q,U+0308 and Q,U+0308 q̈ Q̈ Here is an image, in case your

Re: CJK IDS database

2014-01-14 Thread Mark Davis
Boy, I'd forgotten about those. There is an open-source collection of IDSs that I used to create those files. Unfortunately, I found that *that* data would take a lot of cleanup. I do agree that it would be very useful to have an open-source repository of IDSs for Unicode characters, but I don't

Language Death

2013-12-05 Thread Mark Davis
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0077056 with a popular article at http://www.washingtonpost.com/blogs/worldviews/wp/2013/12/04/how-the-internet-is-killing-the-worlds-languages/ The source article was interesting, although I'd take issue with some of their

Re: Best practice of using regex on identify none-ASCII email address

2013-11-01 Thread Mark Davis
These are two well-known serious flaws in EAI and URLs; there is no useful syntactic limit on what is in the query part of a URL or on the local part of an email address that would allow their boundaries to be detected in plaintext. No use complaining about them, because people are concerned with

Re: Best practice of using regex on identify none-ASCII email address

2013-11-01 Thread Mark Davis
Mark Davis ☕ m...@macchiato.com These are two well-known serious flaws in EAI and URLs; there is no useful syntactic limit on what is in the query part of a URL or on the local part of an email address that would allow their boundaries to be detected in plaintext. No use complaining about them

Re: full-width Latin missing from confusables data

2013-10-29 Thread Mark Davis
è l’inimico del bene —* On Tue, Oct 15, 2013 at 8:53 PM, Mark Davis ☕ m...@macchiato.com wrote: but as Michel mentioned the data does not seem consistent in that case. You might add that to your report... Mark https://plus.google.com/114199149796022210033 *— Il meglio è

Re: Terminology question re ASCII

2013-10-28 Thread Mark Davis
Normally the term ASCII just refers to the 7-bit form. What is sometimes called 8-bit ASCII is the same as ISO Latin 1. If you want to be completely clear, you can say 7-bit ASCII. Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Tue, Oct 29,

Re: full-width Latin missing from confusables data

2013-10-15 Thread Mark Davis
/2013 12:40 AM, Mark Davis ☕ wrote: For the confusables, the presumption is that implementations have already either normalized the input to NFKC or have rejected input that is not NFKC. Thanks for the explanation Mark. It makes sense for implementations which want to detect confusability

Re: full-width Latin missing from confusables data

2013-10-14 Thread Mark Davis
For the confusables, the presumption is that implementations have already either normalized the input to NFKC or have rejected input that is not NFKC. More broadly, in gathering data the main emphasis is on characters that fit the profile in
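Under the stated presumption, a pipeline applies NFKC (or rejects non-NFKC input) before consulting the confusables data; full-width Latin letters then fold to ASCII and never need entries of their own. A minimal Python sketch of that front step only; the skeleton mapping from confusables.txt is not shown, and the function name and `reject_non_nfkc` flag are hypothetical.

```python
import unicodedata

def prepare_identifier(identifier: str, reject_non_nfkc: bool = False) -> str:
    """Normalize to NFKC, or reject input that is not already NFKC."""
    nfkc = unicodedata.normalize("NFKC", identifier)
    if reject_non_nfkc and nfkc != identifier:
        raise ValueError("identifier is not in NFKC")
    return nfkc
```

For instance, full-width `ｐａｙｐａｌ` (U+FF50 U+FF41 ...) comes out as ASCII `paypal`, so only the ASCII form needs to be compared against confusable skeletons.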

Re: More additional Greek (and Hebrew) characters needed for proposal

2013-09-21 Thread Mark Davis
http://www.unicode.org/faq/char_combmark.html#9 and following. Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Sat, Sep 21, 2013 at 7:38 PM, Robert Wheelock rwhlk...@gmail.com wrote: Hello again, y’all! I’ve got quite a few characters

Re: Code point vs. scalar value

2013-09-20 Thread Mark Davis
Nicely stated. Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Thu, Sep 19, 2013 at 11:21 PM, Whistler, Ken ken.whist...@sap.com wrote: Stephan Stiller seems unconvinced by the various attempts to explain the situation. Perhaps an

Re: Draft of LDML Specification for CLDR release 24

2013-09-13 Thread Mark Davis
Thanks for the feedback; the typo is fixed. Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Fri, Sep 13, 2013 at 1:19 AM, Philippe Verdy verd...@wanadoo.fr wrote: Typo in section 2.3 Number Symbols, for the new item superscriptingExponent

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-05 Thread Mark Davis
Classical Greek might qualify [for a CLDR entry] It certainly qualifies, but we require that a submitter commit to collecting a minimal amount of data before we add it. See http://cldr.unicode.org/index/cldr-spec/minimaldata Mark https://plus.google.com/114199149796022210033 *— Il meglio è

Re: Behdad Esfahbod won an O'Reilly Open Source Award!

2013-07-29 Thread Mark Davis
Great news, and well deserved! Congratulations, Behdad! Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Mon, Jul 29, 2013 at 9:41 PM, Roozbeh Pournader rooz...@google.com wrote: Some of you probably have heard the news already, but in case

Re: What does one do if the encoding is unknown and all you have is a sequence of bytes?

2013-07-19 Thread Mark Davis
Popping up a level. ICU (and some other libraries) have heuristic encoding detection that will take a sequence of bytes and come up with a likely encoding id. Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Fri, Jul 19, 2013 at 8:40 PM,
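ICU's detector is statistical and considers many charsets. The Python sketch below is far cruder and is not a substitute; it only shows the two cheapest signals such heuristics typically start from: a byte order mark, and whether the bytes are well-formed UTF-8. The Latin-1 fallback is an assumption chosen because every byte sequence decodes under it; the function name is hypothetical.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Very rough guess: BOM first, then strict UTF-8, else fall back to Latin-1."""
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32"),  # check before UTF-16 LE: both start with FF FE
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")
        return "utf-8"  # pure ASCII also lands here, which is harmless
    except UnicodeDecodeError:
        return "latin-1"  # assumption: a common legacy default
```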

Re: The skywriter we hired has terrible Unicode support

2013-05-08 Thread Mark Davis
Saw that, thanks! Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Wed, May 8, 2013 at 8:26 PM, Tim Greenwood timo...@greenwood.name wrote: http://xkcd.com/1209/

RE: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-20 Thread Mark Davis
LOL... {phone} On Apr 20, 2013 8:44 PM, Erkki I Kolehmainen e...@iki.fi wrote: Mr. Overington, I'm sorry to have to admit that I cannot follow at all your train of thought on what would be the practical value of localizable sentences in any of the forms that you are contemplating. In my

Re: Rendering Raised FULL STOP between Digits

2013-03-10 Thread Mark Davis
Should the Unicode Consortium decide to recommend an existing (or new) character as a raised decimal for numbers, we would add that to CLDR, and recommend that implementations accept either one as equivalent when parsing. Mark https://plus.google.com/114199149796022210033 *— Il meglio è

Re: JSON version of CLDR

2013-03-03 Thread Mark Davis
I think just the main data is converted. If you want to request the other data, you can file a CLDR ticket. Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Sat, Mar 2, 2013 at 8:35 PM, Edwin Hoogerbeets ehoogerbe...@gmail.com wrote: Hi all, I

Re: What does it mean to not be a valid string in Unicode?

2013-01-07 Thread Mark Davis
But still non-conformant. That's incorrect. The point I was making above is that in order to say that something is non-conformant, you have to be very clear what it is non-conformant *TO* . Also, we commonly read code points from 16-bit Unicode strings, and unpaired surrogates are returned

Re: Are there Unicode processors?

2013-01-07 Thread Mark Davis
That is not the typical way that Unicode text is processed. Typically whatever OS you are using will supply mechanisms for iterating through any Unicode string, returning each of the code points. It may also offer APIs for returning information about each character (called 'property values', or
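As an illustration of that model (iterate by code point, then query per-character property values), here is a minimal Python sketch using the standard `unicodedata` module; Python strings iterate by code point, so no surrogate handling is needed. The function name is hypothetical, and the results reflect the Unicode version of your Python build.

```python
import unicodedata

def describe(text: str):
    """Yield (code point, general category, name) for each code point in text."""
    for ch in text:  # iterates code points, not UTF-16 code units
        yield f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<no name>")

for row in describe("a\U0001F600"):
    print(row)
```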

Re: What does it mean to not be a valid string in Unicode?

2013-01-07 Thread Mark Davis
That's not the point (see successive messages). Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Mon, Jan 7, 2013 at 4:59 PM, Martin J. Dürst due...@it.aoyama.ac.jp wrote: On 2013/01/08 3:27, Markus Scherer wrote: Also, we commonly read code

Re: What does it mean to not be a valid string in Unicode?

2013-01-07 Thread Mark Davis
In practice and by design, treating isolated surrogates the same as reserved code points in processing, and then cleaning up on conversion to UTFs works just fine. It is a tradeoff that is up to the implementation. It has nothing to do with a legacy of C pointer arithmetic. It does represent a

Re: What does it mean to not be a valid string in Unicode?

2013-01-06 Thread Mark Davis
Some of this is simply historical: had Unicode been designed from the start with 8 and 16 bit forms in mind, some of this could be avoided. But that is water long under the bridge. Here is a simple example of why we have both UTFs and Unicode Strings. Java uses Unicode 16-bit Strings. The

Re: If X sorts before Y, then XZ sorts before YZ ... example of where that's not true?

2013-01-06 Thread Mark Davis
There are many cases of such digraphs. Example from Slovak: c < d < h but cd < h < ch. Cf. http://www.unicode.org/reports/tr10/, searching for Slovak. Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Sun, Jan 6, 2013 at 1:56 PM, Costello, Roger L.
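The Slovak case can be reproduced with a toy sort key that treats "ch" as one collation element ranked after "h": "c" sorts before "d", yet "ch" sorts after "dh", so the X/Y to XZ/YZ property fails. This Python sketch uses a hypothetical, heavily simplified alphabet (lowercase ASCII letters plus "ch" only); real implementations use the UCA with CLDR's Slovak tailoring.

```python
# Hypothetical simplified alphabet: ASCII letters plus the digraph "ch" after "h".
ALPHABET = ["a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l", "m",
            "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

def sort_key(word: str) -> list[int]:
    """Tokenize greedily, treating "ch" as a single collation element."""
    key, i = [], 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(RANK["ch"])
            i += 2
        else:
            key.append(RANK[word[i]])
            i += 1
    return key

print(sorted(["h", "cd", "ch", "d", "c"], key=sort_key))  # ['c', 'cd', 'd', 'h', 'ch']
```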

Re: holes (unassigned code points) in the code charts

2013-01-04 Thread Mark Davis
http://www.unicode.org/alloc/CurrentAllocaiton.html = http://www.unicode.org/alloc/CurrentAllocation.html Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Fri, Jan 4, 2013 at 10:24 AM, Whistler, Ken ken.whist...@sap.com wrote: Stephan Stiller

Re: What does it mean to not be a valid string in Unicode?

2013-01-04 Thread Mark Davis
To assess whether a string is invalid, it all depends on what the string is supposed to be. 1. As Ken says, if a string is supposed to be in a given encoding form (UTF), but it consists of an ill-formed sequence of code units for that encoding form, it would be invalid. So an isolated surrogate
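Point 1 above can be made concrete: a sequence of 16-bit code units is ill-formed UTF-16 exactly when it contains an isolated surrogate. A minimal Python sketch that finds the first one; the function name is hypothetical. Python itself shows the two views discussed in this thread: a `str` may hold an isolated surrogate (like a 16-bit Unicode string), but encoding it to a UTF raises an error.

```python
from typing import Optional

def first_isolated_surrogate(units: list[int]) -> Optional[int]:
    """Return the index of the first isolated surrogate in UTF-16 code units, or None."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # well-formed surrogate pair
                continue
            return i  # high surrogate not followed by a low surrogate
        if 0xDC00 <= u <= 0xDFFF:
            return i  # low surrogate without a preceding high surrogate
        i += 1
    return None
```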

Re: locale-aware string comparisons

2013-01-02 Thread Mark Davis
. -Shawn -Original Message- From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of James Cloos Sent: Tuesday, January 1, 2013 5:43 PM To: Mark Davis ☕ Cc: Whistler, Ken; unicode@unicode.org Subject: Re: locale-aware string comparisons MD == Mark Davis ☕ m

Re: locale-aware string comparisons

2013-01-01 Thread Mark Davis
3. Regarding LDML and CLDR, somebody with specific expertise on CLDR James, Even without locale differences, the situation is a bit tricky. Assuming that str_tolower() and str_toupper() were straightforwardly defined in terms of the (full) Unicode case mappings, there is still the issue that the
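Independently of locale, the full Unicode case mappings already have properties that surprise naive string code: they can change string length and do not round-trip. A minimal Python sketch (Python's `str.upper`/`str.lower` apply the full, locale-independent mappings, and `str.casefold` applies full case folding); `caseless_equal` is a hypothetical helper. Locale-specific tailorings such as Turkish dotted/dotless i are not applied here.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare using full Unicode case folding rather than lower()."""
    return a.casefold() == b.casefold()

assert "\u00df".upper() == "SS"                   # ß: one character becomes two
assert "stra\u00dfe".upper().lower() == "strasse"  # the round trip loses the ß
assert "\u0130".lower() == "i\u0307"              # U+0130 lowercases to i + U+0307
assert caseless_equal("stra\u00dfe", "STRASSE")
```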

Re: Character name translations

2012-12-20 Thread Mark Davis
There are different use cases, and I think they are getting confused. 1. Present a name for each character, some sort of formal name. I think this is probably the least useful for average users. 2. Allow searching for characters, eg in a character picker. Sample use case: search for dash (or the
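Use case 2 can be approximated with formal names alone, which also shows their limits: matching is English-only and purely lexical (a query for "dash" also hits names containing "DASHED"). A minimal Python sketch scanning every code point with the standard `unicodedata` module; the function name is hypothetical, and a real picker would add localized names and search keywords that this sketch does not have.

```python
import sys
import unicodedata

def search_names(query: str) -> list[int]:
    """Return code points whose formal name contains every word of the query."""
    words = query.upper().split()
    hits = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if name and all(w in name for w in words):
            hits.append(cp)
    return hits
```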

Some much-needed improvements in JavaScript i18n

2012-12-19 Thread Mark Davis
I have a new google blog post about the new ECMAScript (JavaScript) internationalization spec. “Until now, it has been very difficult for web application designers to do something as simple as sort names correctly according to the user's language. And it matters: English readers wouldn’t expect

Re: Question about normalization tests

2012-12-10 Thread Mark Davis
0300 *is* blocked, because there is a preceding character (0305) that has the same combining class (230). Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Mon, Dec 10, 2012 at 11:55 AM, Edwin Hoogerbeets ehoogerbe...@gmail.com wrote: Looking at

Re: io9 describes Unicode as one of the 10 most unlikely things influenced by J.R.R. Tolkien

2012-12-08 Thread Mark Davis
Their inference, it appears, is that had I not read Tolkien when I was 13 I would not be who I am today and the content of the Universal Character Set might be a lot different than it is. I doubt it. Many people are far more responsible for the structure, model, properties, and characters of

Re: StandardizedVariants.txt error?

2012-11-26 Thread Mark Davis
I agree with that analysis. Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Mon, Nov 26, 2012 at 1:53 PM, Whistler, Ken ken.whist...@sap.com wrote: Actually, I think the omission here is the word canonical. In other words, Section 16.4

Re: Caret

2012-11-12 Thread Mark Davis
This case remains very infrequent: it is extremely rare to start typing text in With arrow keys or mouse clicking it is more frequent to end up on a directional boundary. Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Mon, Nov 12, 2012 at

Re: Character set cluelessness

2012-10-02 Thread Mark Davis
I tend to agree. What would be useful is to have one column for the city in the local language (or more columns for multilingual cities), but it is extremely useful to have an ASCII version as well. Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —*

Re: Character set cluelessness

2012-10-02 Thread Mark Davis
E.g., in http://www.unece.org/fileadmin/DAM/cefact/locode/gr.htm Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Tue, Oct 2, 2012 at 1:49 PM, Mark Davis ☕ m...@macchiato.com wrote: I tend to agree. What would be useful is to have one column

Re: Character set cluelessness

2012-10-02 Thread Mark Davis
://plus.google.com/114199149796022210033 *— Il meglio è l’inimico del bene —* On Tue, Oct 2, 2012 at 2:52 PM, Mark Davis ☕ m...@macchiato.com wrote: E.g., in http://www.unece.org/fileadmin/DAM/cefact/locode/gr.htm Mark https://plus.google.com/114199149796022210033 *— Il meglio è l’inimico

Re: Announcing The Unicode Standard, Version 6.2

2012-09-26 Thread Mark Davis
BTW, if you want to share the announcement: - Google+: https://plus.sandbox.google.com/u/0/109412260435993059737/posts (I also reposted it with my personal account: https://plus.google.com/114199149796022210033 ) - Facebook:
