Re: Plain text custom fraction input

2015-07-21 Thread Doug Ewell
As explained in TUS 7.0, §6.2 (General Punctuation), p. 273, U+2044 FRACTION SLASH is intended for use with Basic Latin digits, or other digits with General Category = Nd. The superscript and subscript presentation forms have General Category = No. -- Doug Ewell | http://ewellic.org | Thornton,
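A small sketch of the rule Doug cites, assuming Python's standard `unicodedata` module as the source of General Category values. The helper names `is_nd_digits` and `make_fraction` are hypothetical, for illustration only: the idea is to accept any Nd digits around U+2044 and reject superscript/subscript forms, which are No.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH


def is_nd_digits(s: str) -> bool:
    """True if s is non-empty and every character has General Category Nd."""
    return bool(s) and all(unicodedata.category(ch) == "Nd" for ch in s)


def make_fraction(numerator: str, denominator: str) -> str:
    """Join two digit strings with U+2044.

    Rejects anything that is not Nd, e.g. superscript or subscript
    digits, which have General Category No.
    """
    if not (is_nd_digits(numerator) and is_nd_digits(denominator)):
        raise ValueError("numerator and denominator must be Nd digits")
    return numerator + FRACTION_SLASH + denominator


print(make_fraction("3", "4"))         # 3⁄4
print(unicodedata.category("\u00b3"))  # No  (SUPERSCRIPT THREE)
print(unicodedata.category("\u2084"))  # No  (SUBSCRIPT FOUR)
```

Because the check is on the category rather than on ASCII, other Nd digits (for example Arabic-Indic U+0660..U+0669) pass as well, which matches the "other digits with General Category = Nd" wording.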

Emoji: The Movie

2015-07-21 Thread Garth Wallace
I'm not sure if this is a joke or not: http://deadline.com/2015/07/emoji-movie-sony-pictures-animation-anthony-leondis-kung-fu-panda-secrets-of-the-masters-1201482768/

Re: Emoji: The Movie

2015-07-21 Thread Doug Ewell
Garth Wallace <gwalla at gmail dot com> wrote: "I'm not sure if this is a joke or not:" Yes. -- Doug Ewell | http://ewellic.org | Thornton, CO

AW: Security concerns: OGHAM SPACE MARK

2015-07-21 Thread Dreiheller, Albrecht
On Tue, Jul 21, 2015 at 12:46 David Starner [mailto:prosfil...@gmail.com] wrote: On Tue, Jul 21, 2015 at 2:14 AM Dreiheller, Albrecht albrecht.dreihel...@siemens.com wrote: If the author really intends to deceive potential readers he will succeed. Possibly. Code is hard. But the Ogham space is

Re: AW: Security concerns: OGHAM SPACE MARK

2015-07-21 Thread Asmus Freytag (t)
On 7/21/2015 2:55 PM, Dreiheller, Albrecht wrote: Of course, there are

Re: Security concerns: OGHAM SPACE MARK

2015-07-21 Thread David Starner
On Tue, Jul 21, 2015 at 2:55 PM Dreiheller, Albrecht albrecht.dreihel...@siemens.com wrote: My concern is not about the Ogham space, but about the free usage of non-Ascii in programming languages in general. Just imagine, when you decide to open a door for public traffic in busy city with a

Re: Chinese Word Breaking

2015-07-21 Thread Richard Wordingham
On Tue, 21 Jul 2015 18:10:14 +0800 gfb hjjhjh c933...@gmail.com wrote: When you write text in modern Chinese, there will not be any break between different words, and thus if you segment characters according to the ideographic characters, what being grouped together would either be a clause

Re: Plain text custom fraction input (Part of: Input methods at the age of Unicode)

2015-07-21 Thread Marcel Schneider
Entering fractions in plain text is consistent with the very core of Unicodeʼs purpose, which (please check if Iʼm right) is to empower all people on earth to get in readable plain text as much information as possible.  As fractions, that ISO wanted to stay called “vulgar”, are part of this

Re: Chinese Word Breaking

2015-07-21 Thread gfb hjjhjh
When you write text in modern Chinese, there will not be any break between different words, and thus if you segment characters according to the ideographic characters, what being grouped together would either be a clause or a sentence, Or even a whole paragraph if you are handling some older
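Since Chinese text has no spaces between words, word boundaries have to come from something like a dictionary. One common textbook approach (not something proposed in this thread) is greedy forward maximum matching. A minimal sketch follows; the lexicon is a hypothetical toy, and real segmenters use large dictionaries plus statistics to resolve ambiguities that a greedy match gets wrong.

```python
def forward_max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation.

    At each position take the longest lexicon entry of at most max_len
    characters; if nothing matches, emit a single character.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon, for illustration only.
LEXICON = {"我们", "喜欢", "中文", "分词"}
print(forward_max_match("我们喜欢中文分词", LEXICON))
# ['我们', '喜欢', '中文', '分词']
```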

Re: Security concerns: OGHAM SPACE MARK

2015-07-21 Thread David Starner
On Tue, Jul 21, 2015 at 2:14 AM Dreiheller, Albrecht albrecht.dreihel...@siemens.com wrote: If the author really intends to deceive potential readers he will succeed. Possibly. Code is hard. But the Ogham space is not a real threat; it's easy to search for and obviously a deliberate attempt
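To illustrate how easy such a search is, here is a sketch of a scanner that reports U+1680 OGHAM SPACE MARK in source text. As an assumption going beyond the message, it flags every General Category Zs character other than U+0020, since U+1680 is only one of several non-ASCII spaces; `find_unusual_spaces` is a hypothetical helper name.

```python
import unicodedata


def find_unusual_spaces(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, name) for each Zs character other than U+0020.

    Line and column are 1-based. U+1680 OGHAM SPACE MARK is Zs, so it is
    reported along with e.g. U+00A0 and U+2000..U+200A.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Zs" and ch != " ":
                hits.append((line_no, col, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


snippet = "var a = 1;\nvar\u1680b = 2;\n"
print(find_unusual_spaces(snippet))  # [(2, 4, 'OGHAM SPACE MARK')]
```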

UTF-8 display (was: Re: a mug)

2015-07-21 Thread Marcel Schneider
On 13 Jul 2015, at 11:28, I wrote: The only time I saw UTF-8 like on the T-shirt, was when opening UTF-8 files that didn't specify charset=UTF-8. The thing to do was to add the charset in the file header. Now I see that this issue is much more tricky. I've just stumbled over a no-display

AW: Security concerns: OGHAM SPACE MARK

2015-07-21 Thread Dreiheller, Albrecht
Allowing arbitrary non-Ascii characters in programming languages will make it more difficult to detect malicious code. If the author really intends to deceive potential readers he will succeed. Programming languages like JS should at least implement exclusion rules from the Unicode Confusables
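The confusables check in UTS #39 compares "skeletons": each string is normalized to NFD, every character is replaced by its prototype from the confusables data, and the result is normalized to NFD again. Two identifiers are confusable when their skeletons match. The sketch below uses a hypothetical five-entry table in place of the real `confusables.txt` data, so it only shows the mechanism and is not a usable security filter.

```python
import unicodedata

# Hypothetical toy subset; real mappings come from UTS #39 confusables.txt.
TOY_CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(identifier: str) -> str:
    """NFD, map each character to its prototype, then NFD again."""
    nfd = unicodedata.normalize("NFD", identifier)
    mapped = "".join(TOY_CONFUSABLES.get(ch, ch) for ch in nfd)
    return unicodedata.normalize("NFD", mapped)


def confusable(a: str, b: str) -> bool:
    """True if two distinct identifiers share a skeleton."""
    return a != b and skeleton(a) == skeleton(b)


# Latin "copy" versus Cyrillic es, o, er followed by Latin y.
print(confusable("copy", "\u0441\u043e\u0440y"))  # True
```

The `a != b` test is a design choice for this sketch so that an identifier is not reported as confusable with itself; UTS #39 defines the relation on skeletons alone.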

Chinese Word Breaking

2015-07-21 Thread Richard Wordingham
I'm puzzled by a statement in UAX #29 Unicode Text Segmentation: In particular, the characters with the Line_Break property values of Contingent_Break (CB), Complex_Context (SA/Southeast Asian), and Unknown (XX) are assigned word boundary property values based on criteria outside of the scope of

Re: UTF-8 display (was: Re: a mug)

2015-07-21 Thread Tom Gewecke
The IBM page seems to have an ellipsis character in UTF-8, with bytes E2 80 A6. The web server is set to force all browsers to use the encoding iso-8859-1 regardless of what charset is stipulated in the html code. The browser uses the Win 1252 equivalents and displays … To see what a web
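The effect Tom describes can be reproduced with Python's codecs. The UTF-8 bytes of U+2026 HORIZONTAL ELLIPSIS are E2 80 A6. Read as windows-1252, which is what browsers use when a page is labeled iso-8859-1, they become three separate characters. Read as strict ISO-8859-1, the middle byte is the C1 control U+0080. The variable names are illustrative only.

```python
ellipsis = "\u2026"                      # HORIZONTAL ELLIPSIS
utf8_bytes = ellipsis.encode("utf-8")    # the bytes on the wire
as_cp1252 = utf8_bytes.decode("cp1252")  # what the browser renders
as_latin1 = utf8_bytes.decode("latin-1")

print(utf8_bytes.hex(" "))                         # e2 80 a6
print(as_cp1252)                                   # â€¦
print([f"U+{ord(c):04X}" for c in as_latin1])     # ['U+00E2', 'U+0080', 'U+00A6']

# The damage is reversible as long as nothing else touched the text.
print(as_cp1252.encode("cp1252").decode("utf-8") == ellipsis)  # True
```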