Greetings,
I write this letter with questions regarding a proposal I hope to make for the
encoding of TAGALOG LETTER RA, which we locally know as the baybayin letter
"ra", at U+170D. Many fonts are already using this unencoded codepoint for
TAGALOG LETTER RA in breach of the standard. TAGALOG
> On 2 Dec 2018, at 20:29, Janusz S. Bień via Unicode
> wrote:
>
> On Sun, Dec 02 2018 at 10:33 +0100, Hans Åberg via Unicode wrote:
>>
>> It was common in the 1800s to singly and doubly underline superscript
>> abbreviations in handwriting according to [1-2], and [2] also mentions
>> the
On Sun, Dec 02 2018 at 10:33 +0100, Hans Åberg via Unicode wrote:
>> On 30 Oct 2018, at 22:50, Ken Whistler via Unicode
>> wrote:
>>
>> On 10/30/2018 2:32 PM, James Kass via Unicode wrote:
>>> but we can't seem to agree on how to encode its abbreviation.
>>
>> For what it's worth, "mgr" seems
) as meaning "Magister".
[...]
The third and the last question is: how to encode this symbol in
Unicode?
A constructive answer to my question was provided quickly by James Kass:
On Sat, Oct 27 2018 at 19:52 GMT, James Kass via Unicode wrote:
Mr͇ / M=ͬ
I answered:
On Sun, Oct 28 201
ister".
>
[...]
> The third and the last question is: how to encode this symbol in
> Unicode?
A constructive answer to my question was provided quickly by James Kass:
On Sat, Oct 27 2018 at 19:52 GMT, James Kass via Unicode wrote:
> Mr͇ / M=ͬ
I answered:
On Sun, Oct 28 2018 at
gister".
[...]
> The second question is: are you familiar with such or a similar symbol?
> Have you ever seen it in print?
Later I provided some additional information:
On Sat, Oct 27 2018 at 16:09 +0200, Janusz S. Bień via Unicode wrote:
>
> The postcard is from the front of
On Sat, Oct 27 2018 at 14:10 +0200, Janusz S. Bień via Unicode wrote:
> Hi!
>
> On the over 100 years old postcard
>
> https://photos.app.goo.gl/GbwNwYbEQMjZaFgE6
>
> you can see 2 occurrences of a symbol which is explicitly explained (in
> Polish) as meaning "Magister
Note: CLDR concentrates on keyboard layout for text input. Layouts for
other functions (such as copy-pasting, gaming controls) are completely
different (and not necessarily bound directly to layouts for text, as they
may also have their own dedicated physical keys or users can reprogram
their
On 17/09/18 05:38 Martin J. Dürst wrote:
[quote]
>
> From my personal experience: A few years ago, installing a Dvorak
> keyboard (which is what I use every day for typing) didn't remap the
> control keys, so that Ctrl-C was still on the bottom row of the left
> hand, and so on. For me, it was
On 2018/09/16 21:08, Marcel Schneider via Unicode wrote:
An additional level of complexity is induced by ergonomics, so that most
non-Latin layouts may wish to stick
with QWERTY, and even ergonomic layouts in the footprints of August Dvorak
rather than Shai Coleman are
likely to offer
For games, the mnemonic meaning of keys are unlikely to be used because
gamers prefer an ergonomic placement of their fingers according to the
physical position for essential commands.
But this won't apply to control keys, as these commands should be single
keystrokes and pressing two keys instead
On 15/09/18 15:36, Philippe Verdy wrote:
[…]
> So yes all control keys are potentially localisable to work best with the
> base layout and remaining mnemonic;
> but the physical key position may be very different.
An additional level of complexity is induced by ergonomics, so that most
Le ven. 7 sept. 2018 à 05:43, Marcel Schneider via Unicode <
unicode@unicode.org> a écrit :
> On 07/09/18 02:32 Shriramana Sharma via Unicode wrote:
> >
> > Hello. This may be slightly OT for this list but I'm asking it here as
> it concerns computer usage with multiple scripts and i18n:
>
> It
Hello,
I’ve followed up on CLDR-users:
https://unicode.org/pipermail/cldr-users/2018-September/000837.html
As a side note: it might be hard to get a selection of discussions
to actually happen on CLDR-users instead of the Unicode public mailing list,
as long as subscribers of this list don’t
Shriramana Sharma:
>
> 1) Are shortcuts like Ctrl+C changed as per locale? I mean Ctrl+T for
> "tout" io Ctrl+A for "all"?
Some are, many are not. For instance, some text editors use a modifier key with
F and K instead of B and I for bold ("fett") and italic ("kursiv").
> 2) How about when the
r the Y key
> which is in the physical position of the QWERTY Z key (and close to the other
> XCV shortcuts)?
On Windows, which this question refers to, virtual keys move around with
graphics on Latin keyboards. While Ctrl+Z on QWERTZ is
not handy, I can tell that it is Ctrl+Z on AZERTY with the key h
Hello. This may be slightly OT for this list but I'm asking it here as it
concerns computer usage with multiple scripts and i18n:
1) Are shortcuts like Ctrl+C changed as per locale? I mean Ctrl+T for
"tout" io Ctrl+A for "all"?
2) How about when the shortcuts are the Alt+ combinations referring
eed these characters.
> So I decided to make a post.
>
> Kazunari Tsuboi
>
> -Original Message-
> From: Michael Everson [mailto:ever...@evertype.com]
> Sent: Wednesday, October 4, 2017 11:31 PM
> To: Tsuboi, Kazunari
> Cc: unicode Unicode Discussion
> Subject
, Kazunari
Cc: unicode Unicode Discussion
Subject: Re: Question about Karabakh Characters
They are not encoded, but that example is not sufficient. If you’d like to
contact me offline we can discuss this further.
Michael Everson
> On 4 Oct 2017, at 08:39, via Unicode wrote:
>
> Hi there,
>
> The Karabakh language uses Armenian characters, but the following
Hi there,
The Karabakh language uses Armenian characters, but the following characters do
not have a Unicode assigned. (image1.JPG attached)
They are pronounced "Yi", "Ini" and "Eh" and used with several combinations.
(Image2.JPG attached)
Is there any reason these characters are not supported
See pg. 57-63 of this:
Xerox. (1985). *Xerox System Network Architecture: General Information
Manual* (No. XNSG 068504). Retrieved from
http://archive.org/details/bitsavers_xeroxxnsXNNetworkArchitectureGeneralInformationMan_10024221
SE
On Sun, Oct 23, 2016 at 10:01 AM, Doug Ewell
seth erickson wrote:
XCCS is fairly well documented
That hasn't been my experience. I'd be interested in any links you can
forward that go beyond "Unicode built on" or "drew ideas from" or "was
influenced by" XCCS.
Thanks,
--
Doug Ewell | Thornton, CO, US | ewellic.org
Greetings Unicoders,
I'm trying to find information (for research purposes) about a character
set mentioned in Joseph Becker's 1988 draft proposal [1]:
"In 1978, the initial proposal for a set of 'Universal Signs' was made by
Bob Belleville at Xerox PARC. Many persons contributed ideas to the
On 11/06/2015 01:32 PM, Richard Wordingham wrote:
On Thu, 05 Nov 2015 13:41:42 -0700
"Doug Ewell" wrote:
Richard Wordingham wrote:
No-one's claiming it is for a Unicode Transformation Format (UTF).
Then they ought not to call it "UTF-8" or "extended" or "modified"
UTF-8,
Am 05.11.2015 um 23:11 schrieb Ilya Zakharevich:
First of all, “reserved” means that they have no meaning. Right?
Almost.
“Reserved” means that they have currently no meaning
but may be assigned a meaning, later; hence you ought
not use them lest your programs, or data, be invalidated
by
On Thu, 05 Nov 2015 13:41:42 -0700
"Doug Ewell" wrote:
> Richard Wordingham wrote:
>
> > No-one's claiming it is for a Unicode Transformation Format (UTF).
>
> Then they ought not to call it "UTF-8" or "extended" or "modified"
> UTF-8, or anything of the sort, even if the
It won't represent any valid Unicode codepoint (no standard scalar value
defined), so if you use those leading bytes, don't pretend it is for
"UTF-8" (not even "modified UTF-8" which is the variant created in Java for
its internal serialization of unrestricted 16-bit strings, including for
lone
Hi,
Several of us are wondering about the reason for reserving bits for the
extended UTF-8 in perl5. I'm asking you because you are the apparent
author of the commits that did this.
To refresh your memory, in perl5 UTF-8, a start byte of 0xFF causes the
length of the sequence of bytes that
On Thu, Nov 5, 2015 at 9:25 AM, Philippe Verdy wrote:
> (0xFF was reserved only in the old RFC version of UTF-8 when it allowed
> code points up to 31 bits, but even this RFC is obsolete and should no
> longer be used and it has never been approved by Unicode).
>
No, even in
On Thu, 5 Nov 2015 18:25:05 +0100
Philippe Verdy wrote:
> But these extra code points could be used to represent something else
> such as unique object identifier for internal use in your
> application, or virtual object pointers, or shared memory block
> handles,
Richard Wordingham wrote:
> No-one's claiming it is for a Unicode Transformation Format (UTF).
Then they ought not to call it "UTF-8" or "extended" or "modified"
UTF-8, or anything of the sort, even if the bit-shifting algorithm is
based on UTF-8.
"UTF-8 encoding form" is defined as a mapping
On Thu, Nov 05, 2015 at 08:57:16AM -0700, Karl Williamson wrote:
> Several of us are wondering about the reason for reserving bits for
> the extended UTF-8 in perl5. I'm asking you because you are the
> apparent author of the commits that did this.
To start, the INTERNAL REPRESENTATION of Perl’s
2015-11-05 23:11 GMT+01:00 Ilya Zakharevich wrote
>
> • 128-bit architectures may be at hand (sooner or later).
This is speculation about something that is still not envisioned: a global
worldwide working space where users and applications would interoperate
On 02/20/2015 04:56 PM, Philippe Verdy wrote:
2015-02-20 6:14 GMT+01:00 Richard Wordingham
richard.wording...@ntlworld.com:
TUS has a whole section on the issue, namely TUS 7.0.0 Section 5.8.
One thing that is missing is mention of the convention
On Thu, 19 Feb 2015 19:55:20 -0700
Karl Williamson pub...@khwilliamson.com wrote:
UAX 29 says this:
Break after paragraph separators.
SB4. Sep | CR | LF
Why are CR and LF considered to be paragraph separators? NEL and
Line Break are as well.
My mental model of plain text has it
UAX 29 says this:
Break after paragraph separators.
SB4.Sep | CR | LF
Why are CR and LF considered to be paragraph separators? NEL and Line
Break are as well.
My mental model of plain text has it containing embedded characters,
which I'll call \n, to allow it to be displayed in a
Philippe Verdy verd...@wanadoo.fr wrote:
|glibc is no more broken than any other C library implementing toupper and
|tolower from the legacy ctype standard library. These are old APIs that
|are just widely used and still have valid contexts where they are simple and
|safe to use. But they are
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
glibc is no more broken than any other C library implementing toupper
and tolower from the legacy ctype standard library. These are old
APIs that are just widely used and still have valid contexts where they
are simple and safe to use.
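The limitation discussed in this thread can be sketched in Python, whose `str.upper()` applies full Unicode case mappings. The helper `per_char_upper` is hypothetical: it only emulates a `toupper()`-style interface by refusing any mapping that does not yield exactly one character, and it is not how glibc or any particular C library is implemented.

```python
def per_char_upper(text: str) -> str:
    """Emulate a toupper()-style API: one character in, one character out."""
    out = []
    for ch in text:
        mapped = ch.upper()
        # A 1-to-1 API has nowhere to put a longer result, so keep ch unchanged.
        out.append(mapped if len(mapped) == 1 else ch)
    return "".join(out)


word = "stra\u00DFe"  # "straße", containing LATIN SMALL LETTER SHARP S
print(per_char_upper(word))  # the sharp s survives untouched
print(word.upper())          # the string-level mapping expands it to "SS"
```

The string-level call can change the length of the text, which is exactly what a character-to-character function cannot do.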
Successors to convert strings instead of just isolated characters (sorry,
they are NOT what we need to handle texts, they are not even equivalent
to Unicode characters, they are just code units, most often 8-bit with
char or 16-bit only with wchar_t !) already exist in all C libraries
(including
The equivalent of strtolower() and strtoupper() is implemented in all C
libraries I know (yes, including glibc) and I have worked with on various
OSes (and since very long!), even if their names change (because of the
unfortunate lack of standardization about their interaction with C locales).
Philippe Verdy verd...@wanadoo.fr wrote:
|Successors to convert strings instead of just isolated characters (sorry,
|they are NOT what we need to handle texts, they are not even equivalent
|to Unicode characters, they are just code units, most often 8-bit with
|char or 16-bit only with wchar_t
Philippe Verdy verd...@wanadoo.fr wrote:
|The standard C++ string package could have then used this standard
|internally in the methods exposed in its API. I cannot understand this
|simple effort was never done on such basic functionality needed and used in
|almost all softwares and OSes.
Philippe Verdy verd...@wanadoo.fr さんはかきました:
note that tolower() and toupper() can only work at the single-character level; they
are not recommended for changing the case of plain text.
For correct handling of locales, tolower and toupper should be replaced by
strtolower and strtoupper (or their
Do not try to get consistent results with only a character to character
mapping, it does not work with all letters, because sometimes you need 1-2
or 2-1 mappings (not all composable characters exist in precombined forms,
or sometimes the combination must be split into its canonical decomposed
So glibc is broken. This doesn't make it a Unicode problem.
On Sat, Nov 8, 2014 at 8:22 PM, Mike FABIAN mfab...@redhat.com wrote:
Philippe Verdy verd...@wanadoo.fr さんはかきました:
note that tolower() and toupper() can only work at the single-character level; it
is not recommended for use for changing
glibc is no more broken than any other C library implementing toupper and
tolower from the legacy ctype standard library. These are old APIs that
are just widely used and still have valid contexts where they are simple and
safe to use. But they are not meant to convert text.
The i18n data just
note that tolower() and toupper() can only work at the single-character level; it
is not recommended for use for changing case of plain text. Its purpose
should be limited to use cases where letters can be safely isolated from
their context, for example when handling letters as numbers (e.g. section
or as ᾈ.
ᾈ is something like Ἀι so I understand now that ᾈ can be considered as
titlecase (gc=Lt).
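As a quick way to check the general category mentioned here, the following sketch queries Python's `unicodedata` module. The helper name `general_category` is hypothetical, and the data reflects whatever Unicode version the running Python ships with, not necessarily 7.0.0.

```python
import unicodedata


def general_category(code_point: int) -> str:
    """Return the General_Category value (e.g. 'Ll', 'Lu', 'Lt') of a code point."""
    return unicodedata.category(chr(code_point))


for cp in (0x1F80, 0x1F88):
    # Print the code point, its general category, and its character name.
    print(f"U+{cp:04X}", general_category(cp), unicodedata.name(chr(cp)))
```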
Thank you very much, Phillipe and Laurentiu for explaining!
I stumbled on this question because I am trying to update the character
class data for glibc for Unicode 7.0.0.
glibc has character classes
I have a question about “Uppercase” in DerivedCoreProperties.txt:
U+1F80 ᾀ GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
is listed as “Lowercase” in
http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt :
1F80..1F87; Lowercase # L [8] GREEK SMALL LETTER ALPHA
' sign which originates from the et ligature, or the
German umlaut which inherits some old behavior of the superscripted small
latin letter e behaving like the Greek iota script in Fraktur font styles)
2014-11-06 16:55 GMT+01:00 Mike FABIAN maiku.fab...@gmail.com:
I have a question about “Uppercase
,
L.
-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Mike FABIAN
Sent: Thursday, November 6, 2014 12:32 AM
To: unicode@unicode.org
Subject: Question about Uppercase in DerivedCoreProperties.txt
I have a question about “Uppercase
Hi all, from the latest version of the standard, on line 16977 of the
normalization tests, I am a bit confused by the NFC form. It appears
incorrect to me. Here's the line, sans comment:
0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;0061 05AE
0305 0300 0315 0062;0061 05AE 0305
On Thu, Oct 23, 2014 at 6:54 PM, Aaron Cannon
cann...@fireantproductions.com wrote:
0061 05AE 0305 0300 0315 0062
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Cu0061+%5Cu05AE+%5Cu0305+%5Cu0300+%5Cu0315+%5Cu0062g=ccc
0305 and 0300 have the same ccc, so the first one blocks the
Aaron Cannon asked:
Hi all, from the latest version of the standard, on line 16977 of the
normalization tests, I am a bit confused by the NFC form. It appears
incorrect to me. Here's the line, sans comment:
0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;0061 05AE
0305
On 10/23/14, Whistler, Ken ken.whist...@sap.com wrote:
Test cases like this are included in NormalizationTest.txt precisely
to ensure that implementations are correctly detecting these
sequences where composition is blocked.
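A minimal sketch of how such a test line could be checked, using only the first three fields of the line quoted in this thread (source, NFC, NFD). The helper names `parse_field` and `check_line` are hypothetical, and the check relies on Python's `unicodedata.normalize` rather than on any implementation discussed here.

```python
import unicodedata


def parse_field(field: str) -> str:
    """Turn a field such as '0061 0305' into the corresponding string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def check_line(line: str) -> tuple[bool, bool]:
    """Check that field 2 is NFC(field 1) and field 3 is NFD(field 1)."""
    source, nfc, nfd = (parse_field(f) for f in line.split(";")[:3])
    return (
        unicodedata.normalize("NFC", source) == nfc,
        unicodedata.normalize("NFD", source) == nfd,
    )


LINE = (
    "0061 0305 0315 0300 05AE 0062;"
    "0061 05AE 0305 0300 0315 0062;"
    "0061 05AE 0305 0300 0315 0062"
)
print(check_line(LINE))
```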
And I am indeed glad that they are, as I completely missed this small
http://www.unicode.org/draft/reports/tr29/tr29.html#WB6
indicates that there should be no break between the first two letters in
the sequence
Hebrew_Letter Single_Quote Hebrew_Letter.
However, rule 7a just below indicates that there should be no break
between a Hebrew_Letter and a
On 07/24/2014 01:38 PM, Karl Williamson wrote:
http://www.unicode.org/draft/reports/tr29/tr29.html#WB6
indicates that there should be no break between the first two letters in
the sequence
Hebrew_Letter Single_Quote Hebrew_Letter.
However, rule 7a just below indicates that there should be no
Folks,
I'm trying to find an encoding of the following Akkadian cuneiform:
___ ___ ___
\ / \ / \ /
|||
| /| | /| |
| \| | \| |
|||
|\___
|/
My knowledge of cuneiforms is zero, but I can read Unicode tables :-)
However, I
On May 19, 2014, at 8:40 AM, Werner LEMBERG wrote:
If I have a cuneiform
text, where can I find glyph images to identify them?
You might want to specify what you mean by text. A photo of an inscription?
Something from a printed book?
Because of the considerable variation in glyphs over
notation) with Unicode, cf.
https://en.wikipedia.org/wiki/Hurrian_songs
A much better drawing of the tablet can be found here on page 503:
http://digital.library.stonybrook.edu/cdm/ref/collection/amar/id/7250
The character in question is the first one on the left after the
double line.
A nice
On May 19, 2014, at 9:21 AM, Werner LEMBERG wrote:
I'm interested in representing one of the so-called Hurrian songs
(tablet H.6, containing musical notation) with Unicode, cf.
https://en.wikipedia.org/wiki/Hurrian_songs
That says it represents qáb, which seems to be a version of Labat
I'm interested in representing one of the so-called Hurrian songs
(tablet H.6, containing musical notation) with Unicode, cf.
https://en.wikipedia.org/wiki/Hurrian_songs
That says it represents qáb, which seems to be a version of Labat
88, which is U+1218F KAB.
Unfortunately none of
Sorry, should have cc:d the list. Assume original mail was from a list
member.
-- Forwarded message --
From: Christopher Vance cjsva...@gmail.com
Date: 29 October 2013 16:58
Subject: Re: Terminology question re ASCII
To: Mark Davis ☕ m...@macchiato.com
Of course, once you have 8
2013-10-29 6:12, d...@bisharat.net wrote:
If one refers to plain ASCII, or plain ASCII text or ...
characters, should this be taken strictly as referring to the 7-bit
basic characters, or might it encompass characters that might appear
in an 8-bit character set (per the so-called extended
On Mon, Oct 28, 2013 at 10:38 PM, Mark Davis ☕ m...@macchiato.com wrote:
Normally the term ASCII just refers to the 7-bit form. What is sometimes
called 8-bit ASCII is the same as ISO Latin 1. If you want to be
completely clear, you can say 7-bit ASCII.
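The distinction can be illustrated with a short Python sketch. The helper `is_seven_bit_ascii` is a hypothetical name; the sample bytes are chosen only for illustration.

```python
def is_seven_bit_ascii(data: bytes) -> bool:
    """True if every byte fits in 7 bits (0x00..0x7F)."""
    return all(b < 0x80 for b in data)


sample = bytes([0x41, 0xE9])  # 0x41 is 'A'; 0xE9 is outside 7-bit ASCII

print(is_seven_bit_ascii(sample))
print(sample.decode("latin-1"))  # ISO Latin 1 assigns a character to 0xE9

try:
    sample.decode("ascii")
except UnicodeDecodeError as exc:
    # Python's "ascii" codec is strictly 7-bit and rejects 0xE9.
    print("not ASCII:", exc.reason)
```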
One of the first hits for 8-bit ASCII on
://plus.google.com/114199149796022210033
*
*
*— Il meglio è l’inimico del bene —*
**
On Tue, Oct 29, 2013 at 5:12 AM, d...@bisharat.net wrote:
Quick question on terminology use concerning a legacy encoding:
If one refers to plain ASCII, or plain ASCII text or ...
characters, should this be taken
they’re really trying to do because something’s
probably a bit confused.
-Shawn
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf
Of Philippe Verdy
Sent: Tuesday, October 29, 2013 7:49 AM
To: Mark Davis ☕
Cc: Donald Z. Osborn; unicode
Subject: Re: Terminology question re
2013/10/29 Shawn Steele shawn.ste...@microsoft.com
I would concur. When I hear “8 bit ASCII” the context is usually
confusing the term with any of what we call “ANSI Code Pages” in Windows.
(or similar ideas on other systems).
Of course not just Windows (or MSDOS). This was seen as well in
, 2013 at 5:12 AM, d...@bisharat.net wrote:
Quick question on terminology use concerning a legacy encoding:
If one refers to plain ASCII, or plain ASCII text or ... characters,
should this be taken strictly as referring to the 7-bit basic characters,
or might it encompass characters that might
Hello,
am 2012-12-15 schrieb Philippe Verdy:
But there's still a bug (or request for enhancement) for your Pocket
converters :
- For UTF-16 you correctly exclude the range U+D800..U+DFFF (surrogates)
from the sets of convertible codepoints.
- But you don't exclude this range in the case of
2012/12/16 Otto Stolz otto.st...@uni-konstanz.de
The reason I excluded the surrogates from my UTF-8 MPE
was really that I needed additional space for the user’s
guide on the reverse side.
Why would adding a row on the front side not have preserved the space for
the reverse side ?
If this is
Hello,
2012/12/16 Otto Stolz otto.st...@uni-konstanz.de
The reason I excluded the surrogates from my UTF-8 MPE
was really that I needed additional space for the user’s
guide on the reverse side.
Sorry, typo; I meant: “my UTF-16 MPE”. I added that
extra row (with the branch excluding the
But the old Marco design at that time (2002) was still ignoring the Unicode
UTF-8 conformance constraints, as demonstrated in its use of the obsolete
U-00n notation (matching the obsolete ISO/IETF definition). If the
purpose of this pocket conversion card is to be used for tutorial purpose,
Philippe Verdy wrote:
If the purpose of this pocket conversion card is to be used for
tutorial purpose,
It never was. It was a quick reference guide for experienced users who
already understood the caveats.
Not worth arguing further.
--
Doug Ewell | Thornton, Colorado, USA
Hello,
am 2012-12-11 20:16, schrieb James Lin:
If i have a code point: U+4E8C or 二
In UTF-8, it's E4 BA 8C while in UTF-16, it's 4E8C.
Where does this BA come from?
Cf. http://skew.org/cumped/.
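To make the origin of the BA byte concrete, here is a small sketch of the three-byte UTF-8 pattern (1110xxxx 10xxxxxx 10xxxxxx) applied to U+4E8C. The function name `utf8_three_bytes` is hypothetical, and it covers only the three-byte range, not the whole encoding.

```python
def utf8_three_bytes(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF (surrogates excluded) as UTF-8."""
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not encodable as a three-byte UTF-8 sequence")
    return bytes([
        0xE0 | (cp >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx: low 6 bits
    ])


# 0x4E8C = 0100 111010 001100 in binary, split as 4 + 6 + 6 bits:
#   1110 0100 -> E4,  10 111010 -> BA,  10 001100 -> 8C
print(utf8_three_bytes(0x4E8C).hex(" ").upper())
```

The middle six bits of the code point (111010) carry the 10 continuation prefix, which is where BA comes from.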
Enclosed are the (almost original) version of “Cima’s Magic
UTF-8 Pocket encoder” (2004), and
On 12/11/2012 11:50 AM, vanis...@boil.afraid.org wrote:
From: James Lin James_Lin_at_symantec.com
Hi
Does anyone know why ill-form occurred on the UTF-8? besides it doesn't follow
the pattern of UTF-8 byte-sequences, i just wondering how or why?
If i have a code point: U+4E8C or 二
In UTF-8,
thank you so much everyone for explaining it. I got it now!
-James
On 12/11/12 11:50 AM, vanis...@boil.afraid.org
vanis...@boil.afraid.org wrote:
From: James Lin James_Lin_at_symantec.com
Hi
Does anyone know why ill-form occurred on the UTF-8? besides it doesn't
follow the pattern of UTF-8
; a◌֮◌̅◌̀◌̕b; a◌֮◌̅◌̀◌̕b; )
LATIN SMALL
LETTER A, COMBINING OVERLINE, COMBINING COMMA ABOVE RIGHT, COMBINING
GRAVE ACCENT, HEBREW ACCENT ZINOR, LATIN SMALL LETTER B
The relevant parts for my question are:
Source: 0061 0305 0315 0300 05AE 0062
NFD: 0061 05AE 0305 0300 0315 0062
NFC: 0061 05AE 0305
0300 *is* blocked, because there is a preceding character (0305) that has
the same combining class (230).
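The blocking condition can be sketched as follows. The function `blocked_from_starter` is a hypothetical helper that implements only the "intervening character with an equal or higher combining class" test for a combining mark; it is an illustration, not a full composition algorithm.

```python
import unicodedata


def blocked_from_starter(text: str, i: int) -> bool:
    """True if the combining mark text[i] is blocked from the last starter before it."""
    ccc = unicodedata.combining
    j = i - 1
    while j >= 0 and ccc(text[j]) != 0:   # walk back towards the last starter
        if ccc(text[j]) >= ccc(text[i]):  # note ">=", not ">"
            return True
        j -= 1
    return False


NFD = unicodedata.normalize("NFD", "\u0061\u0305\u0315\u0300\u05AE\u0062")
for i in range(1, len(NFD) - 1):
    ch = NFD[i]
    # Show each mark's combining class and whether it is blocked from the starter.
    print(f"{ord(ch):04X} ccc={unicodedata.combining(ch)} "
          f"blocked={blocked_from_starter(NFD, i)}")
```

With a strict ">" comparison, 0300 would not be considered blocked by 0305, which is the kind of mistake this test case is meant to catch.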
Mark https://plus.google.com/114199149796022210033
*
*
*— Il meglio è l’inimico del bene —*
**
On Mon, Dec 10, 2012 at 11:55 AM, Edwin Hoogerbeets
ehoogerbe...@gmail.com wrote:
Looking at
and 0062, they are not blocked, but there is no composition with
00E0, so the algorithm ends with the result:
00E0 05AE 0305 0315 0062
This disagrees with what it says in the normalization tests file as listed
above. The question is, did I misunderstand the algorithm, or is this perhaps a
bug
Ah yes, I did indeed miss the equal to part. I fixed up my code and
now it works as expected.
Thanks to Mark and Ken for your help and speedy response!
Edwin
On 12/10/2012 12:57 PM, Whistler, Ken wrote:
Your misunderstanding is at the highlighted statement below. Actually
0300 **is** blocked
It seems like there is an inconsistency between what the default
grapheme clusters specification says and what the test results are
expected to be:
The UAX#29 says:
Another key feature (of default Unicode grapheme clusters) is that default
Unicode grapheme clusters are atomic units with
Grandpa grandpa I wanna hear the story about the turtles *now*! :-)
Sent from my Android phone
On Fri, Feb 24, 2012 at 5:18 AM, Shriramana Sharma samj...@gmail.com wrote:
Grandpa grandpa I wanna hear the story about the turtles *now*! :-)
Sent from my Android phone
Thanks all for the enlightening reply.
My intent was sorting using UCA but it really did not matter much
because U+33D7
It is defined as
33D7;SQUARE PH;So;0;L;<square> 0050 0048;;;;N;SQUARED PH;;;;
in UnicodeData.txt, but it is shown as pH
in code chart. Should it be 0070 0048 or PH?
Thanks,
Matt
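The decomposition in question can be inspected with Python's `unicodedata` module. This is only a lookup sketch; the output reflects the Unicode data bundled with the running Python.

```python
import unicodedata

SQUARE_PH = "\u33D7"

print(unicodedata.name(SQUARE_PH))
# The compatibility decomposition as recorded in UnicodeData.txt.
print(unicodedata.decomposition(SQUARE_PH))
# Compatibility normalization applies that decomposition.
print(unicodedata.normalize("NFKC", SQUARE_PH))
```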
On 2012/2/23 Matt Ma matt.ma.um...@gmail.com wrote:
It is defined as
33D7;SQUARE PH;So;0;L;<square> 0050 0048;;;;N;SQUARED PH;;;;
in UnicodeData.txt, but it is shown as pH in code chart. Should it be
0070 0048 or PH?
It should certainly be pH, i.e., <square>0070 0048</square>,
because that's the
On 2/23/2012 2:44 PM, António Martins-Tuválkin wrote:
On 2012/2/23 Matt Ma matt.ma.um...@gmail.com wrote:
It is defined as
33D7;SQUARE PH;So;0;L;<square> 0050 0048;;;;N;SQUARED PH;;;;
in UnicodeData.txt, but it is shown as pH in code chart. Should it be
0070 0048 or PH?
It should certainly be
to 0050 0048
instead of to 0070 0048.
O.k., folks, I guess it's time for everybody to gather around the fire
for another
episode of Every Character Has a Story.
First, to answer Matt Ma's original question, no, the decomposition
should *not*
be <square> 0070 0048. The reason for that is simple
In addition, the default settings in Table 14 of UTS #10, 6.0.0, are
strength: tertiary
alternative: shifted
But these settings won't generate the conformant behavior specified by
CollationTest_SHIFTED.txt
I think when alternative is set to shifted, strength should be set to
quaternary (as
Hi,
Does Shifted imply that strength is quaternary? If strength stays as
tertiary (default or explicitly set), it seems the collation behavior
is Blanked. Please clarify.
Thanks,
Matt
Thanks for the clarification. But to pass the UCA conformance test on Shifted,
does the strength have to be set to quaternary? However, it is stated
in UCA, C2: "A conformant implementation shall support at least three
levels of collation."
Does this mean a UCA-conformant implementation only needs to pass UCA
] on behalf of
Peter Constable [peter...@microsoft.com]
Sent: Tuesday, November 09, 2010 10:42 PM
To: James Lin; Ed
Cc: Unicode Mailing List
Subject: RE: Pupil's question about Burmese
A non-Unicode web page is like a non-Unicode app. Web pages, and apps, should
use Unicode.
Peter
-Original
has been a standard code page for Myanmar text,
Unicode was the first time storage of Burmese text was standardised for
computers. There are several different legacy font families in use for
Myanmar each with their own slightly different mapping to Latin code
points. The font in question has
Dear Peter Constable,
*
Burmese_is_supported in windows.*
It makes things worse than ever to create another story like the pseudo-Unicode
Zawgyi in Windows, too.
We are in deadlock because Microsoft has not released a Myanmar OpenType specification
for Burmese. We can't implement Burmese in OpenType
Dear Ngwe Tun,
The forthcoming ICU 4.6 will include a Burmese locale (using CLDR data), with
support for Burmese collation.
http://site.icu-project.org/
Best regards,
Peter Edberg
On Nov 9, 2010, at 2:05 AM, Ngwe Tun wrote:
...
We are in dead-lock because without releasing Myanmar
of Windows 7 is,
additionally, able to display text in scripts Tifinagh and Tai Le. In all
these cases, the system locale setting has no bearing.
Yes, displaying is fine, but the original question is copying and pasting;
without the correct locale settings, you can't copy/paste without corrupting
the byte
Yes, displaying is fine, but the original question is copying and pasting;
without the correct locale settings, you can’t copy/paste without corrupting
the byte sizes. Copy/paste is generally handled by the OS itself, not the
application. Even if you have a Unicode-aware application, you can display
: Pupil's question about Burmese
Yes, displaying is fine, but the original question is copying and
pasting; without the correct locale settings, you can’t copy/paste
without corrupting the byte sizes. Copy/paste is generally handled by the
OS itself, not the application. Even if you have Unicode support