Re: Counting Codepoints

2015-10-12 Richard Wordingham
On Tue, 13 Oct 2015 00:49:29 +0200 Philippe Verdy wrote:
> 2015-10-12 21:38 GMT+02:00 Richard Wordingham <
> richard.wording...@ntlworld.com>:
> > Graceful fallback is exactly where the issue arises. Throwing an
> > exception is not a useful answer to the question of how many code
> > points a '

Re: Counting Codepoints

2015-10-12 David Starner
Any system that exposes Unicode strings (not UTF-16 strings) cannot have two surrogates merge when two strings are appended. There's nothing in the Unicode standard that says that should happen for a string in an arbitrary format, and it's unreasonable behavior for a string. Thus a Unicode string
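
David Starner's distinction can be seen in a few lines of Python, where a `str` is a sequence of code points that is allowed to contain isolated surrogates. The following is a minimal sketch and assumes Python 3.4 or later, which is needed for `surrogatepass` with the UTF-16 codecs. Appending the code point sequences keeps the two surrogates separate. Appending their UTF-16 code unit serializations and then decoding the result merges them into a single supplementary code point.

```python
# A Python str is a sequence of code points and may hold isolated surrogates.
left = "\ud800"   # isolated high surrogate
right = "\udc00"  # isolated low surrogate

# Appending at the code point level: the surrogates stay separate.
as_code_points = left + right
print(len(as_code_points))  # 2

# Appending at the UTF-16 code unit level: serialize each piece
# ("surrogatepass" lets lone surrogates through), join the bytes,
# then decode strictly. The two halves now form one valid pair.
utf16_bytes = (left.encode("utf-16-le", "surrogatepass")
               + right.encode("utf-16-le", "surrogatepass"))
as_utf16 = utf16_bytes.decode("utf-16-le")
print(len(as_utf16), hex(ord(as_utf16)))  # 1 0x10000
```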

RE: Rights to the Emoji

2015-10-12 Peter Constable
Exactly: specific designs are subject to license terms determined by the original designer, which are liberal in some cases and not in others. But the concept of a such-and-such emoji and its encoded representation are not an issue.

Peter

-----Original Message-----
From: Unicode [mailto:unic

Re: Counting Codepoints

2015-10-12 Philippe Verdy
2015-10-12 21:38 GMT+02:00 Richard Wordingham <
richard.wording...@ntlworld.com>:
> On Sun, 11 Oct 2015 21:36:49 -0700
> Ken Whistler wrote:
>
> > I think the correct answer is probably:
> >
> > (c) The ill-formed three code unit Unicode 16-bit string
> > <0xDC00, 0xD800, 0xDC20> contains one cod

Re: Counting Codepoints

2015-10-12 Richard Wordingham
On Mon, 12 Oct 2015 17:29:13 +0200 Philippe Verdy wrote:
> But between two implementations
> the result of the scanner could still be different because the
> replacement character is not specified. If that result "sanitized"
> string is then used to generate an URI, the URI is also unpredictable

Re: Counting Codepoints

2015-10-12 Richard Wordingham
On Sun, 11 Oct 2015 21:36:49 -0700 Ken Whistler wrote:
> I think the correct answer is probably:
>
> (c) The ill-formed three code unit Unicode 16-bit string
> <0xDC00, 0xD800, 0xDC20> contains one code point, U+10020 and
> one uninterpreted (and uninterpretable) high surrogate
> code unit 0xDC0
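
Option (c) can be checked mechanically. The following Python sketch assumes the fallback Mark Davis suggests elsewhere in this thread. Under that fallback, a well-formed surrogate pair becomes one supplementary code point, and every isolated surrogate counts as one code point of its own. The function name `decode_utf16_lenient` is hypothetical and is not an existing API.

```python
def decode_utf16_lenient(units: list[int]) -> list[int]:
    """Decode UTF-16 code units to code points (hypothetical helper).

    Well-formed surrogate pairs are combined; any isolated surrogate is
    passed through as a single code point instead of raising an error.
    """
    cps: list[int] = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by low surrogate: one supplementary code point.
            cps.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # BMP code unit or isolated surrogate: counts as one code point.
            cps.append(u)
            i += 1
    return cps


# The example from the thread. Note that 0xDC00 lies in the
# low-surrogate range (DC00..DFFF), so here it is an unpaired low surrogate.
cps = decode_utf16_lenient([0xDC00, 0xD800, 0xDC20])
print([hex(cp) for cp in cps])  # ['0xdc00', '0x10020']
print(len(cps))                 # 2
```

With this fallback the count is 2: the isolated surrogate plus U+10020. A strict decoder would instead reject the whole string.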

Re: Counting Codepoints

2015-10-12 Philippe Verdy
2015-10-12 14:42 GMT+02:00 Mark Davis ☕️ :
> If these are not all aligned, then all heck breaks loose: you are letting
> yourself in for code breakage and/or security problems.
>
> So the corresponding code point count would just return a count of 1 for
> an isolated surrogate.
> But the behavior

Re: Counting Codepoints

2015-10-12 Philippe Verdy
Replace U+FFFE with U+FFFD in my message (though some applications prefer to use noncharacters for those replacements; that is an additional alternative, as U+FFFE also has a valid representation in UTF-16). U+FFFD is not the only possible replacement, even if it is recommended (by
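
The point that the replacement is a policy choice can be illustrated with Python's codec error handlers. This sketch rests on a few stated assumptions. The built-in `"replace"` handler substitutes U+FFFD. `"replace-fffe"` is a hypothetical handler, registered here only to mimic an application that uses the noncharacter U+FFFE instead. Under these two policies the same ill-formed input yields two different strings. Anything derived from those strings, such as a percent-encoded URI, therefore differs as well.

```python
import codecs
import struct
from urllib.parse import quote


def _replace_with_fffe(err: UnicodeDecodeError) -> tuple[str, int]:
    # Substitute the noncharacter U+FFFE and resume right after the bad input.
    return "\ufffe", err.end


# Hypothetical handler name, registered only for this sketch.
codecs.register_error("replace-fffe", _replace_with_fffe)


def sanitize(units: list[int], errors: str = "replace") -> str:
    """Decode UTF-16 code units; ill-formed units are handled by `errors`."""
    data = struct.pack(f"<{len(units)}H", *units)  # little-endian 16-bit units
    return data.decode("utf-16-le", errors)


units = [0xDC00, 0xD800, 0xDC20]  # isolated low surrogate, then a valid pair
with_fffd = sanitize(units)                  # built-in handler: U+FFFD
with_fffe = sanitize(units, "replace-fffe")  # noncharacter policy: U+FFFE
print([hex(ord(c)) for c in with_fffd])  # ['0xfffd', '0x10020']
print([hex(ord(c)) for c in with_fffe])  # ['0xfffe', '0x10020']
print(quote(with_fffd), quote(with_fffe))  # two different percent-encodings
```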

Re: How can my research become implemented in a standardized manner?

2015-10-12 William_J_G Overington
Hello Philippe

Thank you for posting.

> In fact this is not just inventing new characters, all this personal research
> is about inventing a new human language as well !

Actually it is not. An end user would only need to use his or her own language using cascading menus. Everything else would b

Re: Rights to the Emoji

2015-10-12 Nicole Selken
I would contact Apple about it. Many ads on TV, etc., are using this emoji set, so there must be a way to get access, or they do not care.

Thanks,
Niki Selken
Working on: www.nikiselken.com

On Mon, Oct 12, 2015 at 10:07 AM, S

Re: How can my research become implemented in a standardized manner?

2015-10-12 William_J_G Overington
> I believe using markup languages would be a better approach than getting some
> new character.

Thank you for posting. That would make an interesting discussion, yet is off-topic for this thread. The topic for this thread is about the encoding process, not about the merits or otherwise of the pa

Re: Rights to the Emoji

2015-10-12 Shervin Afshar
Twemoji are Open Source, but published under CC-BY, and that license requires attribution, which might be challenging in this specific use case.

On Oct 11, 2015 10:46 PM, "Mark Davis ☕️" wrote:
> The twitter images are open sourced, I believe.
>
> {phone}
> On Oct 12, 2015 02:56, "Shervin Afshar"

Re: Counting Codepoints

2015-10-12 Mark Davis ☕️
I agree with Ken on "Any discussion about properties for surrogate code points is a matter of designing graceful API fallback for instances which have to deal with ill-formed strings and do *something*.", and here's my advice based on that. You want the code point count to reflect the same coun

Re: How can my research become implemented in a standardized manner?

2015-10-12 gfb hjjhjh
This proposal is, in my opinion, similar to another discussion on this mailing list a few months ago about giving Unicode characters to food allergy symbols: both ideas want to use Unicode characters to overcome the language barrier, except that that proposal was about those icons, while th