Question about U+170D, which I hope will become TAGALOG LETTER RA

2019-06-11 Thread Fred Brennan via Unicode
Greetings, I write this letter with questions regarding a proposal I hope to make for the encoding of TAGALOG LETTER RA, which we locally know as the baybayin letter "ra", at U+170D. Many fonts are already using this unencoded codepoint for TAGALOG LETTER RA in breach of the standard. TAGALOG

Re: Update to the second question summary (was: A sign/abbreviation for "magister")

2018-12-02 Thread Hans Åberg via Unicode
> On 2 Dec 2018, at 20:29, Janusz S. Bień via Unicode > wrote: > > On Sun, Dec 02 2018 at 10:33 +0100, Hans Åberg via Unicode wrote: >> >> It was common in the 1800s to singly and doubly underline superscript >> abbreviations in handwriting according to [1-2], and [2] also mentions >> the

Update to the second question summary (was: A sign/abbreviation for "magister")

2018-12-02 Thread Janusz S. Bień via Unicode
On Sun, Dec 02 2018 at 10:33 +0100, Hans Åberg via Unicode wrote: >> On 30 Oct 2018, at 22:50, Ken Whistler via Unicode >> wrote: >> >> On 10/30/2018 2:32 PM, James Kass via Unicode wrote: >>> but we can't seem to agree on how to encode its abbreviation. >> >> For what it's worth, "mgr" seems

Preformatted superscript in ordinary text, paleography and phonetics using Latin script (was: Re: A sign/abbreviation for "magister" - third question summary)

2018-11-07 Thread Marcel Schneider via Unicode
) as meaning "Magister". [...] The third and the last question is: how to encode this symbol in Unicode? A constructive answer to my question was provided quickly by James Kass: On Sat, Oct 27 2018 at 19:52 GMT, James Kass via Unicode wrote: Mr͇ / M=ͬ I answered: On Sun, Oct 28 201

A sign/abbreviation for "magister" - third question summary

2018-11-06 Thread Janusz S. Bień via Unicode
ister". > [...] > The third and the last question is: how to encode this symbol in > Unicode? A constructive answer to my question was provided quickly by James Kass: On Sat, Oct 27 2018 at 19:52 GMT, James Kass via Unicode wrote: > Mr͇ / M=ͬ I answered: On Sun, Oct 28 2018 at

A sign/abbreviation for "magister" - second question summary

2018-11-06 Thread Janusz S. Bień via Unicode
gister". [...] > The second question is: are you familiar with such or a similar symbol? > Have you ever seen it in print? Later I provided some additional information: On Sat, Oct 27 2018 at 16:09 +0200, Janusz S. Bień via Unicode wrote: > > The postcard is from the front of

A sign/abbreviation for "magister" - first question summary

2018-11-06 Thread Janusz S. Bień via Unicode
On Sat, Oct 27 2018 at 14:10 +0200, Janusz S. Bień via Unicode wrote: > Hi! > > On the over 100 years old postcard > > https://photos.app.goo.gl/GbwNwYbEQMjZaFgE6 > > you can see 2 occurrences of a symbol which is explicitly explained (in > Polish) as meaning "Magister

Re: Shortcuts question

2018-09-17 Thread Philippe Verdy via Unicode
Note: CLDR concentrates on keyboard layout for text input. Layouts for other functions (such as copy-pasting, gaming controls) are completely different (and not necessarily bound directly to layouts for text, as they may also have their own dedicated physical keys or users can reprogram their

Re: Shortcuts question

2018-09-17 Thread Marcel Schneider via Unicode
On 17/09/18 05:38 Martin J. Dürst wrote: [quote] > > From my personal experience: A few years ago, installing a Dvorak > keyboard (which is what I use every day for typing) didn't remap the > control keys, so that Ctrl-C was still on the bottom row of the left > hand, and so on. For me, it was

Re: Shortcuts question

2018-09-16 Thread Martin J. Dürst via Unicode
On 2018/09/16 21:08, Marcel Schneider via Unicode wrote: An additional level of complexity is induced by ergonomics, so that most non-Latin layouts may wish to stick with QWERTY, and even ergonomic layouts in the footprints of August Dvorak rather than Shai Coleman are likely to offer

Re: Shortcuts question

2018-09-16 Thread Philippe Verdy via Unicode
For games, the mnemonic meaning of keys is unlikely to be used because gamers prefer an ergonomic placement of their fingers according to the physical position for essential commands. But this won't apply to control keys, as these commands should be single keystrokes and pressing two keys instead

Re: Shortcuts question

2018-09-16 Thread Marcel Schneider via Unicode
On 15/09/18 15:36, Philippe Verdy wrote: […] > So yes all control keys are potentially localisable to work best with the > base layout and remaining mnemonic; > but the physical key position may be very different. An additional level of complexity is induced by ergonomics, so that most

Re: Shortcuts question

2018-09-15 Thread Philippe Verdy via Unicode
On Fri, 7 Sep 2018 at 05:43, Marcel Schneider via Unicode <unicode@unicode.org> wrote: > On 07/09/18 02:32 Shriramana Sharma via Unicode wrote: > > > > Hello. This may be slightly OT for this list but I'm asking it here as > it concerns computer usage with multiple scripts and i18n: > > It

Re: Shortcuts question (is: Thread transfer info)

2018-09-07 Thread Marcel Schneider via Unicode
Hello, I’ve followed up on CLDR-users: https://unicode.org/pipermail/cldr-users/2018-September/000837.html As a sidenote — It might be hard to get a selection of discussions actually happen on CLDR-users instead of Unicode Public mail list, as long as subscribers of this list don’t

Re: Shortcuts question

2018-09-07 Thread Christoph Päper via Unicode
Shriramana Sharma: > > 1) Are shortcuts like Ctrl+C changed as per locale? I mean Ctrl+T for > "tout" io Ctrl+A for "all"? Some are, many are not. For instance, some text editors use a modifier key with F and K instead of B and I for bold ("fett") and italic ("kursiv"). > 2) How about when the

Re: Shortcuts question

2018-09-06 Thread Marcel Schneider via Unicode
r the Y key > which is in the physical position of the QWERTY Z key (and close to the other > XCV shortcuts)? On Windows, that this question refers to, virtual keys move around with graphics on Latin keyboards. While Ctrl+Z on QWERTZ is not handy, I can tell that it is Ctrl+Z on AZERTY with the key h

Shortcuts question

2018-09-06 Thread Shriramana Sharma via Unicode
Hello. This may be slightly OT for this list but I'm asking it here as it concerns computer usage with multiple scripts and i18n: 1) Are shortcuts like Ctrl+C changed as per locale? I mean Ctrl+T for "tout" io Ctrl+A for "all"? 2) How about when the shortcuts are the Alt+ combinations referring

Re: Question about Karabakh Characters

2017-10-05 Thread Michael Everson via Unicode
eed these characters. > So I decided to make a post. > > Kazunari Tsuboi > > -Original Message- > From: Michael Everson [mailto:ever...@evertype.com] > Sent: Wednesday, October 4, 2017 11:31 PM > To: Tsuboi, Kazunari > Cc: unicode Unicode Discussion > Subject

RE: Question about Karabakh Characters

2017-10-04 Thread via Unicode
, Kazunari Cc: unicode Unicode Discussion Subject: Re: Question about Karabakh Characters They are not encoded, but that example is not sufficient. If you’d like to contact me offline we can discuss this further. Michael Everson > On 4 Oct 2017, at 08:39, via Unicode <unicode@unicode.org> wrote

Re: Question about Karabakh Characters

2017-10-04 Thread Michael Everson via Unicode
They are not encoded, but that example is not sufficient. If you’d like to contact me offline we can discuss this further. Michael Everson > On 4 Oct 2017, at 08:39, via Unicode wrote: > > Hi there, > > The Karabakh language uses Armenian characters, but the following

Question about Karabakh Characters

2017-10-04 Thread via Unicode
Hi there, The Karabakh language uses Armenian characters, but the following characters do not have a Unicode assigned. (image1.JPG attached) They are pronounced "Yi", "Ini" and "Eh" and used with several combinations. (Image2.JPG attached) Is there any reason these characters are not supported

Re: XCCS (was: Historical question about 'universal signs')

2016-10-24 Thread seth erickson
See pg. 57-63 of this: Xerox. (1985). *Xerox System Network Architecture: General Information Manual* (No. XNSG 068504). Retrieved from http://archive.org/details/bitsavers_xeroxxnsXNNetworkArchitectureGeneralInformationMan_10024221 SE On Sun, Oct 23, 2016 at 10:01 AM, Doug Ewell

XCCS (was: Historical question about 'universal signs')

2016-10-23 Thread Doug Ewell
seth erickson wrote: XCCS is fairly well documented That hasn't been my experience. I'd be interested in any links you can forward that go beyond "Unicode built on" or "drew ideas from" or "was influenced by" XCCS. Thanks, -- Doug Ewell | Thornton, CO, US | ewellic.org

Historical question about 'universal signs'

2016-10-21 Thread seth erickson
Greetings Unicoders, I'm trying to find information (for research purposes) about a character set mentioned in Joseph Becker's 1988 draft proposal [1]: "In 1978, the initial proposal for a set of 'Universal Signs' was made by Bob Belleville at Xerox PARC. Many persons contributed ideas to the

Re: Question about Perl5 extended UTF-8 design

2015-11-06 Thread Karl Williamson
On 11/06/2015 01:32 PM, Richard Wordingham wrote: On Thu, 05 Nov 2015 13:41:42 -0700 "Doug Ewell" wrote: Richard Wordingham wrote: No-one's claiming it is for a Unicode Transformation Format (UTF). Then they ought not to call it "UTF-8" or "extended" or "modified" UTF-8,

Re: Question about Perl5 extended UTF-8 design

2015-11-06 Thread Otto Stolz
On 2015-11-05 at 23:11, Ilya Zakharevich wrote: First of all, “reserved” means that they have no meaning. Right? Almost. “Reserved” means that they have currently no meaning but may be assigned a meaning, later; hence you ought not use them lest your programs, or data, be invalidated by

Re: Question about Perl5 extended UTF-8 design

2015-11-06 Thread Richard Wordingham
On Thu, 05 Nov 2015 13:41:42 -0700 "Doug Ewell" wrote: > Richard Wordingham wrote: > > > No-one's claiming it is for a Unicode Transformation Format (UTF). > > Then they ought not to call it "UTF-8" or "extended" or "modified" > UTF-8, or anything of the sort, even if the

Re: Question about Perl5 extended UTF-8 design

2015-11-05 Thread Philippe Verdy
It won't represent any valid Unicode codepoint (no standard scalar value defined), so if you use those leading bytes, don't pretend it is for "UTF-8" (not even "modified UTF-8" which is the variant created in Java for its internal serialization of unrestricted 16-bit strings, including for lone

Question about Perl5 extended UTF-8 design

2015-11-05 Thread Karl Williamson
Hi, Several of us are wondering about the reason for reserving bits for the extended UTF-8 in perl5. I'm asking you because you are the apparent author of the commits that did this. To refresh your memory, in perl5 UTF-8, a start byte of 0xFF causes the length of the sequence of bytes that

Re: Question about Perl5 extended UTF-8 design

2015-11-05 Thread Markus Scherer
On Thu, Nov 5, 2015 at 9:25 AM, Philippe Verdy wrote: > (0xFF was reserved only in the old RFC version of UTF-8 when it allowed > code points up to 31 bits, but even this RFC is obsolete and should no > longer be used and it has never been approved by Unicode). > No, even in

Re: Question about Perl5 extended UTF-8 design

2015-11-05 Thread Richard Wordingham
On Thu, 5 Nov 2015 18:25:05 +0100 Philippe Verdy wrote: > But these extra code points could be used to represent something else > such as unique object identifier for internal use in your > application, or virtual object pointers, or shared memory block > handles,

Re: Question about Perl5 extended UTF-8 design

2015-11-05 Thread Doug Ewell
Richard Wordingham wrote: > No-one's claiming it is for a Unicode Transformation Format (UTF). Then they ought not to call it "UTF-8" or "extended" or "modified" UTF-8, or anything of the sort, even if the bit-shifting algorithm is based on UTF-8. "UTF-8 encoding form" is defined as a mapping

Re: Question about Perl5 extended UTF-8 design

2015-11-05 Thread Ilya Zakharevich
On Thu, Nov 05, 2015 at 08:57:16AM -0700, Karl Williamson wrote: > Several of us are wondering about the reason for reserving bits for > the extended UTF-8 in perl5. I'm asking you because you are the > apparent author of the commits that did this. To start, the INTERNAL REPRESENTATION of Perl’s

Re: Question about Perl5 extended UTF-8 design

2015-11-05 Thread Philippe Verdy
2015-11-05 23:11 GMT+01:00 Ilya Zakharevich wrote > > • 128-bit architectures may be at hand (sooner or later). This is speculation for something that is still not envisioned: a global worldwide working space where users and applications would interoperate

Re: Question about the Sentence_Break property

2015-02-21 Thread Karl Williamson
On 02/20/2015 04:56 PM, Philippe Verdy wrote: 2015-02-20 6:14 GMT+01:00 Richard Wordingham richard.wording...@ntlworld.com: TUS has a whole section on the issue, namely TUS 7.0.0 Section 5.8. One thing that is missing is mention of the convention

Re: Question about the Sentence_Break property

2015-02-19 Thread Richard Wordingham
On Thu, 19 Feb 2015 19:55:20 -0700 Karl Williamson pub...@khwilliamson.com wrote: UAX 29 says this: Break after paragraph separators. SB4. Sep | CR | LF Why are CR and LF considered to be paragraph separators? NEL and Line Break are as well. My mental model of plain text has it

Question about the Sentence_Break property

2015-02-19 Thread Karl Williamson
UAX 29 says this: Break after paragraph separators. SB4.Sep | CR | LF Why are CR and LF considered to be paragraph separators? NEL and Line Break are as well. My mental model of plain text has it containing embedded characters, which I'll call \n, to allow it to be displayed in a

Re: Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-10 Thread Steffen Nurpmeso
Philippe Verdy verd...@wanadoo.fr wrote: |glibc is not more broken than any other C library implementing toupper and |tolower from the legacy ctype standard library. These are old APIs that |are just widely used and still have valid contexts where they are simple and |safe to use. But they are

Re: Question about Uppercase in DerivedCoreProperties.txt

2014-11-10 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: glibc is not more broken than any other C library implementing toupper and tolower from the legacy ctype standard library. These are old APIs that are just widely used and still have valid contexts where they are simple and safe to use.

Re: Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-10 Thread Philippe Verdy
Successors to convert strings instead of just isolated characters (sorry, they are NOT what we need to handle texts, they are not even equivalent to Unicode characters, they are just code units, most often 8-bit with char or 16-bit only with wchar_t !) already exist in all C libraries (including

Re: Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-10 Thread Philippe Verdy
The equivalent of strtolower() and strtoupper() is implemented in all C libraries I know (yes, including glibc) and I have worked with on various OSes (and since very long!), even if their names change (because of the unfortunate lack of standardization about their interaction with C locales).

Re: Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-10 Thread Steffen Nurpmeso
Philippe Verdy verd...@wanadoo.fr wrote: |Successors to convert strings instead of just isolated characters (sorry, |they are NOT what we need to handle texts, they are not even equivalent |to Unicode characters, they are just code units, most often 8-bit with |char or 16-bit only with wchar_t

Re: Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-10 Thread Steffen Nurpmeso
Philippe Verdy verd...@wanadoo.fr wrote: |The standard C++ string package could have then used this standard |internally in the methods exposed in its API. I cannot understand this |simple effort was never done on such basic functionality needed and used in |almost all softwares and OSes.

Re: Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-08 Thread Mike FABIAN
Philippe Verdy verd...@wanadoo.fr wrote: note that tolower() and toupper() can only work on a 1-character level, it is not recommended for use for changing case of plain text. For correct handling of locales, tolower and toupper should be replaced by strtolower and strtoupper (or their

Re: Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-08 Thread Philippe Verdy
Do not try to get consistent results with only a character to character mapping, it does not work with all letters, because sometimes you need 1-2 or 2-1 mappings (not all composable characters exist in precombined forms, or sometimes the combination must be split into its canonical decomposed
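
The point about 1-2 mappings can be illustrated with a minimal Python sketch. It assumes CPython 3, whose str.upper() applies full, string-level case mappings. The helper upper_char_by_char is hypothetical and not from the thread; it only mimics what a toupper()-style API, which must return exactly one character, is able to do.

```python
def upper_char_by_char(text: str) -> str:
    """Mimic a toupper()-style API: one character in, one character out.

    Hypothetical helper for illustration. If the full uppercase mapping of
    a character is longer than one character, the character is left as-is.
    """
    out = []
    for ch in text:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


# "strasse" spelled with U+00DF LATIN SMALL LETTER SHARP S
word = "stra\u00dfe"

full = word.upper()                   # string-level mapping may change length
per_char = upper_char_by_char(word)   # per-character mapping cannot

print(full, len(word), len(full))
print(per_char, len(per_char))
```

The string-level result is one character longer than the input, which is exactly the case a character-to-character function cannot express.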

Re: Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-08 Thread Christopher Vance
So glibc is broken. This doesn't make it a Unicode problem. On Sat, Nov 8, 2014 at 8:22 PM, Mike FABIAN mfab...@redhat.com wrote: Philippe Verdy verd...@wanadoo.fr wrote: note that tolower() and toupper() can only work on a 1-character level, it is not recommended for use for changing

Re: Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-08 Thread Philippe Verdy
glibc is not more broken than any other C library implementing toupper and tolower from the legacy ctype standard library. These are old APIs that are just widely used and still have valid contexts where they are simple and safe to use. But they are not meant to convert text. The i18n data just

Re: Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-07 Thread Philippe Verdy
note that tolower() and toupper() can only work on a 1-character level, it is not recommended for use for changing case of plain text. Its purpose should be limited to use cases where letters can be safely isolated from their context, for example when handling letters as numbers (e.g. section

Re: Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-07 Thread Mike FABIAN
or as ᾈ. ᾈ is something like Ἀι so I understand now that ᾈ can be considered as titlecase (gc=Lt). Thank you very much, Philippe and Laurentiu for explaining! I stumbled on this question because I am trying to update the character class data for glibc for Unicode 7.0.0. glibc has character classes
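
The general categories mentioned here can be checked with a short Python sketch. It assumes the Unicode data bundled with the Python interpreter (via the standard unicodedata module), which is not necessarily version 7.0.0; the function name general_categories is hypothetical.

```python
import unicodedata


def general_categories(code_points):
    """Return {code point: General_Category} using Python's bundled UCD."""
    return {cp: unicodedata.category(chr(cp)) for cp in code_points}


# U+1F80 is the small letter from the original question,
# U+1F88 is the character shown as ᾈ in the reply above.
cats = general_categories([0x1F80, 0x1F88])

for cp, gc in cats.items():
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))} gc={gc}")
```

A character with gc=Lt is neither gc=Lu nor gc=Ll, which is why a two-way upper/lower classification has no obvious slot for it.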

Question about Uppercase in DerivedCoreProperties.txt

2014-11-06 Thread Mike FABIAN
I have a question about “Uppercase” in DerivedCoreProperties.txt: U+1F80 ᾀ GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI is listed as “Lowercase” in http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt : 1F80..1F87; Lowercase # L [8] GREEK SMALL LETTER ALPHA

Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-06 Thread Mike FABIAN
I have a question about “Uppercase” in DerivedCoreProperties.txt: U+1F80 ᾀ GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI is listed as “Lowercase” in http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt : 1F80..1F87; Lowercase # L [8] GREEK SMALL LETTER ALPHA

Re: Question about “Uppercase” in DerivedCoreProperties.txt

2014-11-06 Thread Philippe Verdy
' sign which originates from the et ligature, or the German umlaut which inherits some old behavior of the superscripted small latin letter e behaving like the Greek iota script in Fraktur font styles) 2014-11-06 16:55 GMT+01:00 Mike FABIAN maiku.fab...@gmail.com: I have a question about “Uppercase

RE: Question about Uppercase in DerivedCoreProperties.txt

2014-11-06 Thread Laurentiu Iancu
, L. -Original Message- From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Mike FABIAN Sent: Thursday, November 6, 2014 12:32 AM To: unicode@unicode.org Subject: Question about Uppercase in DerivedCoreProperties.txt I have a question about “Uppercase

Question about a Normalization test

2014-10-23 Thread Aaron Cannon
Hi all, from the latest version of the standard, on line 16977 of the normalization tests, I am a bit confused by the NFC form. It appears incorrect to me. Here's the line, sans comment: 0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;0061 05AE 0305

Re: Question about a Normalization test

2014-10-23 Thread Mark Davis ☕️
On Thu, Oct 23, 2014 at 6:54 PM, Aaron Cannon cann...@fireantproductions.com wrote: 0061 05AE 0305 0300 0315 0062 http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Cu0061+%5Cu05AE+%5Cu0305+%5Cu0300+%5Cu0315+%5Cu0062g=ccc ​0305 and 0300 have the same ccc, so the first one blocks the

RE: Question about a Normalization test

2014-10-23 Thread Whistler, Ken
Aaron Cannon asked: Hi all, from the latest version of the standard, on line 16977 of the normalization tests, I am a bit confused by the NFC form. It appears incorrect to me. Here's the line, sans comment: 0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;0061 05AE 0305

Re: Question about a Normalization test

2014-10-23 Thread Aaron Cannon
On 10/23/14, Whistler, Ken ken.whist...@sap.com wrote: Test cases like this are included in NormalizationTest.txt precisely to ensure that implementations are correctly detecting these sequences where composition is blocked. And I am indeed glad that they are, as I completely missed this small

Question about WordBreak property rules

2014-07-24 Thread Karl Williamson
http://www.unicode.org/draft/reports/tr29/tr29.html#WB6 indicates that there should be no break between the first two letters in the sequence Hebrew_Letter Single_Quote Hebrew_Letter. However, rule 7a just below indicates that there should be no break between a Hebrew_Letter and a

Re: Question about WordBreak property rules

2014-07-24 Thread Karl Williamson
On 07/24/2014 01:38 PM, Karl Williamson wrote: http://www.unicode.org/draft/reports/tr29/tr29.html#WB6 indicates that there should be no break between the first two letters in the sequence Hebrew_Letter Single_Quote Hebrew_Letter. However, rule 7a just below indicates that there should be no

question to Akkadian

2014-05-19 Thread Werner LEMBERG
Folks, I'm trying to find an encoding of the following Akkadian cuneiform: ___ ___ ___ \ / \ / \ / ||| | /| | /| | | \| | \| | ||| |\___ |/ My knowledge of cuneiforms is zero, but I can read Unicode tables :-) However, I

Re: question to Akkadian

2014-05-19 Thread Tom Gewecke
On May 19, 2014, at 8:40 AM, Werner LEMBERG wrote: If I have a cuneiform text, where can I find glyph images to identify them? You might want to specify what you mean by text. A photo of an inscription? Something from a printed book? Because of the considerable variation in glyphs over

Re: question to Akkadian

2014-05-19 Thread Werner LEMBERG
notation) with Unicode, cf. https://en.wikipedia.org/wiki/Hurrian_songs A much better drawing of the tablet can be found here on page 503: http://digital.library.stonybrook.edu/cdm/ref/collection/amar/id/7250 The character in question is the first one on the left after the double line. A nice

Re: question to Akkadian

2014-05-19 Thread Tom Gewecke
On May 19, 2014, at 9:21 AM, Werner LEMBERG wrote: I'm interested in representing one of the so-called Hurrian songs (tablet H.6, containing musical notation) with Unicode, cf. https://en.wikipedia.org/wiki/Hurrian_songs That says it represents qáb, which seems to be a version of Labat

Re: question to Akkadian

2014-05-19 Thread Werner LEMBERG
I'm interested in representing one of the so-called Hurrian songs (tablet H.6, containing musical notation) with Unicode, cf. https://en.wikipedia.org/wiki/Hurrian_songs That says it represents qáb, which seems to be a version of Labat 88, which is U+1218F KAB. Unfortunately none of

Fwd: Terminology question re ASCII

2013-10-29 Thread Christopher Vance
Sorry, should have cc:d the list. Assume original mail was from a list member. -- Forwarded message -- From: Christopher Vance cjsva...@gmail.com Date: 29 October 2013 16:58 Subject: Re: Terminology question re ASCII To: Mark Davis ☕ m...@macchiato.com Of course, once you have 8

Re: Terminology question re ASCII

2013-10-29 Thread Jukka K. Korpela
2013-10-29 6:12, d...@bisharat.net wrote: If one refers to plain ASCII, or plain ASCII text or ... characters, should this be taken strictly as referring to the 7-bit basic characters, or might it encompass characters that might appear in an 8-bit character set (per the so-called extended

Re: Terminology question re ASCII

2013-10-29 Thread David Starner
On Mon, Oct 28, 2013 at 10:38 PM, Mark Davis ☕ m...@macchiato.com wrote: Normally the term ASCII just refers to the 7-bit form. What is sometimes called 8-bit ASCII is the same as ISO Latin 1. If you want to be completely clear, you can say 7-bit ASCII. One of the first hits for 8-bit ASCII on
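
The 7-bit reading of "ASCII" can be sketched in Python as follows. This is an illustration only; the helper is_7bit_ascii and the sample bytes are assumptions made for the example, and "latin-1" is Python's codec name for ISO 8859-1.

```python
def is_7bit_ascii(raw: bytes) -> bool:
    """True if every byte has its high bit clear, i.e. fits in 7 bits."""
    return all(b < 0x80 for b in raw)


# "caf" followed by the single byte 0xE9, which is outside 7-bit ASCII
data = bytes([0x63, 0x61, 0x66, 0xE9])

print(is_7bit_ascii(b"plain"))   # only bytes below 0x80
print(is_7bit_ascii(data))       # contains 0xE9

# The same bytes decode under ISO Latin 1 but are rejected by the ASCII codec.
as_latin1 = data.decode("latin-1")
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(repr(as_latin1), ascii_ok)
```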

Re: Terminology question re ASCII

2013-10-29 Thread Philippe Verdy
://plus.google.com/114199149796022210033 — Il meglio è l’inimico del bene — On Tue, Oct 29, 2013 at 5:12 AM, d...@bisharat.net wrote: Quick question on terminology use concerning a legacy encoding: If one refers to plain ASCII, or plain ASCII text or ... characters, should this be taken

RE: Terminology question re ASCII

2013-10-29 Thread Shawn Steele
they’re really trying to do because something’s probably a bit confused. -Shawn From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of Philippe Verdy Sent: Tuesday, October 29, 2013 7:49 AM To: Mark Davis ☕ Cc: Donald Z. Osborn; unicode Subject: Re: Terminology question re

Re: Terminology question re ASCII

2013-10-29 Thread Philippe Verdy
2013/10/29 Shawn Steele shawn.ste...@microsoft.com I would concur. When I hear “8 bit ASCII” the context is usually confusing the term with any of what we call “ANSI Code Pages” in Windows. (or similar ideas on other systems). Of course not just Windows (or MSDOS). This was seen as well in

Re: Terminology question re ASCII

2013-10-28 Thread Mark Davis ☕
, 2013 at 5:12 AM, d...@bisharat.net wrote: Quick question on terminology use concerning a legacy encoding: If one refers to plain ASCII, or plain ASCII text or ... characters, should this be taken strictly as referring to the 7-bit basic characters, or might it encompass characters that might

Re: UTF-8 ill-formed question

2012-12-16 Thread Otto Stolz
Hello, on 2012-12-15 Philippe Verdy wrote: But there's still a bug (or request for enhancement) for your Pocket converters : - For UTF-16 you correctly exclude the range U+D800..U+DFFF (surrogates) from the sets of convertible codepoints. - But you don't exclude this range in the case of
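
The exclusion of the surrogate range can be observed with Python's built-in strict UTF-8 codec. This is a minimal sketch using Python as a stand-in; it says nothing about how the pocket converters themselves are laid out, and the helper is_surrogate is hypothetical.

```python
def is_surrogate(cp: int) -> bool:
    """True for code points in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


# Encoding a lone surrogate code point to UTF-8 is refused...
try:
    "\ud800".encode("utf-8")
    encode_ok = True
except UnicodeEncodeError:
    encode_ok = False

# ...and so is decoding the three bytes that the generic 3-byte bit pattern
# would produce for U+D800.
try:
    b"\xed\xa0\x80".decode("utf-8")
    decode_ok = True
except UnicodeDecodeError:
    decode_ok = False

print(is_surrogate(0xD800), encode_ok, decode_ok)
```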

Re: UTF-8 ill-formed question

2012-12-16 Thread Philippe Verdy
2012/12/16 Otto Stolz otto.st...@uni-konstanz.de The reason I excluded the surrogates from my UTF-8 MPE was really that I needed additional space for the user’s guide on the reverse side. Why would adding a row on the front side not have preserved the space for the reverse side? If this is

Re: UTF-8 ill-formed question

2012-12-16 Thread Otto Stolz
Hello, 2012/12/16 Otto Stolz otto.st...@uni-konstanz.de The reason I excluded the surrogates from my UTF-8 MPE was really that I needed additional space for the user’s guide on the reverse side. Sorry, typo; I meant: “my UTF-16 MPE”. I added that extra row (with the branch excluding the

Re: UTF-8 ill-formed question

2012-12-16 Thread Philippe Verdy
But the old Marco design at that time (2002) was still ignoring the Unicode UTF-8 conformance constraints, as demonstrated in its use of the obsolete U-00n notation (matching the obsolete ISO/IETF definition). If the purpose of this pocket conversion card is to be used for tutorial purpose,

Re: UTF-8 ill-formed question

2012-12-16 Thread Doug Ewell
Philippe Verdy wrote: If the purpose of this pocket conversion card is to be used for tutorial purpose, It never was. It was a quick reference guide for experienced users who already understood the caveats. Not worth arguing further. -- Doug Ewell | Thornton, Colorado, USA

Re: UTF-8 ill-formed question

2012-12-12 Thread Otto Stolz
Hello, on 2012-12-11 at 20:16, James Lin wrote: If I have a code point: U+4E8C or 二 In UTF-8, it's E4 BA 8C while in UTF-16, it's 4E8C. Where does this BA come from? Cf. http://skew.org/cumped/. Enclosed are the (almost original) version of “Cima’s Magic UTF-8 Pocket encoder” (2004), and
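
Where the BA comes from can be traced with a few bit operations. The following Python sketch encodes a code point by hand using the three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx; the function name utf8_encode_3byte is hypothetical, and the sketch covers only the range U+0800..U+FFFF, not a full encoder.

```python
def utf8_encode_3byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF (surrogates excluded) by hand.

    Hypothetical helper for illustration; other ranges need 1, 2 or 4 bytes.
    """
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("code point is outside the three-byte UTF-8 range")
    b1 = 0xE0 | (cp >> 12)           # lead byte: 1110 + top 4 bits
    b2 = 0x80 | ((cp >> 6) & 0x3F)   # continuation: 10 + middle 6 bits
    b3 = 0x80 | (cp & 0x3F)          # continuation: 10 + low 6 bits
    return bytes([b1, b2, b3])


cp = 0x4E8C                          # binary 0100 1110 1000 1100
encoded = utf8_encode_3byte(cp)

print(encoded.hex(" ").upper())      # hand-built bytes
print(chr(cp).encode("utf-8").hex(" ").upper())  # Python's own encoder
```

The 16 bits of U+4E8C are regrouped as 0100 / 111010 / 001100. The middle group, prefixed with the continuation marker 10, gives 10111010, which is BA.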

Re: UTF-8 ill-formed question

2012-12-11 Thread Asmus Freytag
On 12/11/2012 11:50 AM, vanis...@boil.afraid.org wrote: From: James Lin James_Lin_at_symantec.com Hi Does anyone know why ill-formed sequences occur in UTF-8? Besides not following the pattern of UTF-8 byte sequences, I am just wondering how or why. If I have a code point: U+4E8C or 二 In UTF-8,

Re: UTF-8 ill-formed question

2012-12-11 Thread James Lin
Thank you so much everyone for explaining it. I got it now! -James On 12/11/12 11:50 AM, vanis...@boil.afraid.org wrote: From: James Lin James_Lin_at_symantec.com Hi Does anyone know why ill-formed sequences occur in UTF-8? Besides not following the pattern of UTF-8

Question about normalization tests

2012-12-10 Thread Edwin Hoogerbeets
; a◌֮◌̅◌̀◌̕b; a◌֮◌̅◌̀◌̕b; ) LATIN SMALL LETTER A, COMBINING OVERLINE, COMBINING COMMA ABOVE RIGHT, COMBINING GRAVE ACCENT, HEBREW ACCENT ZINOR, LATIN SMALL LETTER B The relevant parts for my question are: Source: 0061 0305 0315 0300 05AE 0062 NFD: 0061 05AE 0305 0300 0315 0062 NFC: 0061 05AE 0305

Re: Question about normalization tests

2012-12-10 Thread Mark Davis ☕
0300 *is* blocked, because there is a preceding character (0305) that has the same combining class (230). Mark https://plus.google.com/114199149796022210033 — Il meglio è l’inimico del bene — On Mon, Dec 10, 2012 at 11:55 AM, Edwin Hoogerbeets ehoogerbe...@gmail.com wrote: Looking at
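
The blocking described here can be reproduced with Python's standard unicodedata module. This is a sketch that assumes the Unicode data bundled with the interpreter; the helper hexes is hypothetical and only formats code points for display.

```python
import unicodedata


def hexes(text: str) -> str:
    """Format a string as space-separated four-digit hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# Source column of the test line discussed in the thread
source = "\u0061\u0305\u0315\u0300\u05AE\u0062"

# Canonical combining classes decide both the reordering and the blocking.
ccc = {f"{ord(ch):04X}": unicodedata.combining(ch) for ch in source}
print(ccc)

nfd = unicodedata.normalize("NFD", source)
nfc = unicodedata.normalize("NFC", source)
print("NFD:", hexes(nfd))
print("NFC:", hexes(nfc))

# For contrast: with nothing in between, 0061 + 0300 does compose.
unblocked = unicodedata.normalize("NFC", "\u0061\u0300")
print("unblocked:", hexes(unblocked))
```

After canonical reordering, 0305 sits between 0061 and 0300 with an equal combining class, so 0300 cannot combine with the base letter and the NFC form equals the NFD form.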

RE: Question about normalization tests

2012-12-10 Thread Whistler, Ken
and 0062, they are not blocked, but there is no composition with 00E0, so the algorithm ends with the result: 00E0 05AE 0305 0315 0062 This disagrees with what it says in the normalization tests file as listed above. The question is, did I misunderstand the algorithm, or is this perhaps a bug

Fwd: Re: Question about normalization tests

2012-12-10 Thread Edwin Hoogerbeets
Ah yes, I did indeed miss the equal to part. I fixed up my code and now it works as expected. Thanks to Mark and Ken for your help and speedy response! Edwin On 12/10/2012 12:57 PM, Whistler, Ken wrote: Your misunderstanding is at the highlighted statement below. Actually 0300 **is** blocked

A question about the default grapheme cluster boundaries with U+0020 as the grapheme base

2012-06-01 Thread Konstantin Ritt
It seems like there is an inconsistency between what the default grapheme clusters specification says and what the test results are expected to be: The UAX#29 says: Another key feature (of default Unicode grapheme clusters) is that default Unicode grapheme clusters are atomic units with

Re: Question on U+33D7

2012-02-24 Thread Shriramana Sharma
Grandpa grandpa I wanna hear the story about the turtles *now*! :-) Sent from my Android phone

Re: Question on U+33D7

2012-02-24 Thread Matt Ma
On Fri, Feb 24, 2012 at 5:18 AM, Shriramana Sharma samj...@gmail.com wrote: Grandpa grandpa I wanna hear the story about the turtles *now*! :-) Sent from my Android phone Thanks all for the enlightening reply. My intent was sorting using UCA but it really did not matter much because U+33D7

Question on U+33D7

2012-02-23 Thread Matt Ma
It is defined as 33D7;SQUARE PH;So;0;L;<square> 0050 0048;;;;N;SQUARED PH;;;; in UnicodeData.txt, but it is shown as pH in code chart. Should it be 0070 0048 or PH? Thanks, Matt

Re: Question on U+33D7

2012-02-23 Thread António Martins-Tuválkin
On 2012/2/23 Matt Ma matt.ma.um...@gmail.com wrote: It is defined as 33D7;SQUARE PH;So;0;L;<square> 0050 0048;;;;N;SQUARED PH;;;; in UnicodeData.txt, but it is shown as pH in code chart. Should it be 0070 0048 or PH? It should certainly be pH, i.e., <square>0070 0048</square>, because that's the

Re: Question on U+33D7

2012-02-23 Thread Asmus Freytag
On 2/23/2012 2:44 PM, António Martins-Tuválkin wrote: On 2012/2/23 Matt Ma matt.ma.um...@gmail.com wrote: It is defined as 33D7;SQUARE PH;So;0;L;<square> 0050 0048;;;;N;SQUARED PH;;;; in UnicodeData.txt, but it is shown as pH in code chart. Should it be 0070 0048 or PH? It should certainly be

Re: Question on U+33D7

2012-02-23 Thread Ken Whistler
to 0050 0048 instead of to 0070 0048. O.k., folks, I guess it's time for everybody to gather around the fire for another episode of Every Character Has a Story. First, to answer Matt Ma's original question, no, the decomposition should *not* be <square> 0070 0048. The reason for that is simple

Re: Question on UCA collation parameters (strength = tertiary, alternate = shifted)

2011-12-01 Thread Matt Ma
In addition, the default settings in Table 14, UTS #10, 6.0.0 are strength: tertiary, alternate: shifted. But these settings won't generate the conformant behavior specified by CollationTest_SHIFTED.txt. I think when alternate is set to shifted, strength should be set to quaternary (as

Question on UCA collation parameters (strength = tertiary, alternate = shifted)

2011-11-29 Thread Matt Ma
Hi, Does Shifted imply that strength is quaternary? If strength stays at tertiary (default or explicitly set), it seems the collation behavior is Blanked. Please clarify. Thanks, Matt
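The observation in this question can be illustrated with a toy model. This is only a sketch, not an implementation of UTS #10: the weights are hypothetical (not DUCET values), the names `collation_elements`, `shifted`, and `sort_key` are made up for illustration, and the handling of ignorables that follow a variable element is omitted. It assumes the Shifted rule that a variable element gets zero weights at levels 1 to 3 and its old primary weight at level 4, while other elements get FFFF at level 4.

```python
# Hypothetical primary weights for variable characters (NOT real DUCET values).
VARIABLE_PRIMARY = {" ": 0x0100, "-": 0x0200}

def collation_elements(text):
    """Yield toy (primary, secondary, tertiary, is_variable) elements."""
    for ch in text:
        if ch in VARIABLE_PRIMARY:
            yield (VARIABLE_PRIMARY[ch], 0x20, 0x02, True)
        else:
            yield (0x1000 + ord(ch), 0x20, 0x02, False)

def shifted(elements):
    """Apply the Shifted option: move variable primaries to level 4."""
    for primary, secondary, tertiary, is_variable in elements:
        if is_variable:
            yield (0, 0, 0, primary)
        else:
            yield (primary, secondary, tertiary, 0xFFFF)

def sort_key(text, strength):
    """Build a sort key from the first `strength` levels, skipping zero weights."""
    elements = list(shifted(collation_elements(text)))
    key = []
    for level in range(strength):
        if level:
            key.append(0)  # level separator
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return tuple(key)

tertiary_equal = sort_key("de-luge", 3) == sort_key("deluge", 3)
quaternary_equal = sort_key("de-luge", 4) == sort_key("deluge", 4)
print(tertiary_equal, quaternary_equal)
```

In this toy model, comparing only three levels makes the hyphen invisible, which matches the "behaves like Blanked" observation; the fourth level is where the shifted weight becomes visible.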

Re: Question on UCA collation parameters (strength = tertiary, alternate = shifted)

2011-11-29 Thread Matt Ma
Thanks for the clarification. But to pass the UCA conformance test on Shifted, does the strength have to be set to quaternary? However, it is stated in UCA, C2, A conformant implementation shall support at least three levels of collation. Does this mean a UCA conformant implementation only need pass UCA

RE: Pupil's question about Burmese

2010-11-10 Thread Shawn Steele
] on behalf of Peter Constable [peter...@microsoft.com] Sent: Tuesday, November 09, 2010 10:42 PM To: James Lin; Ed Cc: Unicode Mailing List Subject: RE: Pupil's question about Burmese A non-Unicode web page is like a non-Unicode app. Web pages, and apps, should use Unicode.' Peter -Original

Re: Pupil's question about Burmese

2010-11-10 Thread Keith Stribley
has been a standard code page for Myanmar text, Unicode was the first time storage of Burmese text was standardised for computers. There are several different legacy font families in use for Myanmar each with their own slightly different mapping to Latin code points. The font in question has

Re: Pupil's question about Burmese

2010-11-09 Thread Ngwe Tun
Dear Peter Constable, * Burmese_is_supported in windows.* It makes things worse than ever to create another story like the pseudo-Unicode Zawgyi in Windows, too. We are in deadlock because Microsoft has not released the Myanmar OpenType specification for Burmese. We can't implement Burmese in OpenType

Re: Pupil's question about Burmese

2010-11-09 Thread Peter Edberg
Dear Ngwe Tun, The forthcoming ICU 4.6 will include a Burmese locale (using CLDR data), with support for Burmese collation. http://site.icu-project.org/ Best regards, Peter Edberg On Nov 9, 2010, at 2:05 AM, Ngwe Tun wrote: ... We are in dead-lock because without releasing Myanmar

Re: Pupil's question about Burmese

2010-11-09 Thread James Lin
of Windows 7 is, additionally, able to display text in scripts Tifinagh and Tai Le. In all these cases, the system locale setting has no bearing. Yes, displaying is fine, but the original question is copying and pasting; without the correct locale settings, you can't copy/paste without corrupting the byte

Re: Pupil's question about Burmese

2010-11-09 Thread Ed
Yes, displaying is fine, but the original question is copying and pasting; without the correct locale settings, you can’t copy/paste without corrupting the byte sizes. Copy/paste is generally handled by the OS itself, not the application. Even if you have a Unicode-capable application, you can display

RE: Pupil's question about Burmese

2010-11-09 Thread James Lin
: Pupil's question about Burmese Yes, displaying is fine, but the original question is copying and pasting; without the correct locale settings, you can’t copy/paste without corrupting the byte sizes. Copy/paste is generally handled by the OS itself, not the application. Even if you have Unicode support
