One of the general principles is that combining marks inherit the
property of their base character.
Normally, "inherited" should be the only property value for combining marks.
There have been some deviations from this over the years, for various
reasons, and there are some properties (such as
On 5/28/2018 6:30 AM, Hans Åberg via
Unicode wrote:
Unifying these would make a real mess of lower casing!
German has a special sign ß for "ss", without upper capital version.
You may want to retract the second part of
that
On 5/29/2018 5:57 AM, Hans Åberg via
Unicode wrote:
On 29 May 2018, at 14:47, Arthur Reutenauer wrote:
The main point is what users of ẞ and ß would think, and Unicode to adjust accordingly.
Since users of ß
On 6/6/2018 2:25 PM, Hans Åberg via
Unicode wrote:
On 4 Jun 2018, at 21:49, Manish Goregaokar via Unicode wrote:
The Rust community is considering adding non-ascii identifiers, which follow UAX #31 (XID_Start XID_Continue*, with tweaks). The propo
On 6/7/2018 9:01 AM, Alastair Houghton
via Unicode wrote:
But
please don’t misunderstand; I am not — and have not been — arguing
against non-ASCII identifiers. We were asked
whether there were any problems. These are problems
(or perhaps we might ca
On 6/8/2018 5:01 AM, Michael Everson
via Unicode wrote:
and achieving a fullscale merger with ISO/IEC 15897, after which the valid data stay hosted entirely in CLDR, and ISO/IEC 15897 would be its ISO mirror.
I wonder if Mark Davis will be qu
On 6/8/2018 2:28 PM, Marcel Schneider
via Unicode wrote:
On Fri, 8 Jun 2018 13:33:20 -0700, Asmus Freytag via Unicode wrote:
[…]
There's no value added in creating "mirrors" of something that is suc
On 6/9/2018 12:01 PM, Marcel Schneider
via Unicode wrote:
Still a computer should be understandable off-line, so CLDR providing a standard library of error messages could be
appreciated by the industry.
The kind of translations that CLDR accumulates,
On 6/12/2018 7:58 AM, Michael Everson
via Unicode wrote:
Marcel,
You have put words into my mouth. Please don’t. Your description of what I said is NOT accurate.
On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode wrote:
And in this thread I
On 6/20/2018 2:17 PM, Doug Ewell via
Unicode wrote:
Ivan Panchenko wrote:
Is there a reason why the mu does not appear upright
It was probably italicized in the glyphs printed in the relevant
Japanese standard, back in the 1990s.
The
while fascinating, I agree with John,
the topic is best treated as out of scope for plain text.
Let's take this discussion off this list.
A./
On 7/10/2018 1:43 PM, William_J_G Overington via Unicode wrote:
Thank you for
I would say the problem lies in the attempt to exchange arbitrary
raw data and expect perfectly compatible rendering.
In the absence of very explicit markup there's simply no expectation
that all users see precisely the same thing. Editors for plain text
will wrap or
The use case would seem to be more
properly served by some form of registration mechanism, like the
one IVD represents for ideographs.
The use of "standardized" variation sequences with the
understanding that those would be (fairly) widely implemented
On 7/16/2018 8:30 PM, Richard
Wordingham via Unicode wrote:
On Mon, 16 Jul 2018 10:53:03 +0300
Shai Berger via Unicode wrote:
What I'm not OK with is:
!Hello, World
Which is what you'll see if your editor decides to use RTL
directionality for thi
On 7/16/2018 10:04 PM, Janusz S. Bień
via Unicode wrote:
I understand there is no sufficient demand for the Unicode Consortium
maintaining a supplementary non-ideographic variation database. Hence
for the time being a kind of Private Use variation database seems to
On 7/18/2018 1:51 AM, Shai Berger via
Unicode wrote:
My claim is that in the absence of an agreed
or conveyed higher-level protocol, this default must be respected.
Not how higher-level protocols work in
Unicode.
If you say that you support the
On 7/18/2018 1:51 AM, Shai Berger via
Unicode wrote:
The trade-off you seem to prefer is to make the "plain text
is universally readable" idea from the core Unicode definition, not
applicable to BiDi text.
Your idea would simply outlaw being able to
On 7/18/2018 6:43 AM, philip chastney
via Unicode wrote:
except that I remember a conference where one of the participants noted that
fully one-third of the time allocated to each presentation was taken up
explaining the presenter's notation
:)
On 7/26/2018 9:27 AM, Markus Scherer
via Unicode wrote:
I would not expect for Ä+combining () above = Ä᪻ to
look right except with specialized fonts.
http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%84%5Cu1ABB&s=&uv=0
On 7/27/2018 3:42 AM, Michael Everson
via Unicode wrote:
Yes and it explains clearly that “effectively caseless Georgian” is incorrect. Georgian has case. Georgian uses case differently from other scripts. This is an orthographic distinction, not a structural one. I
On 8/20/2018 7:09 AM, James Kass via
Unicode wrote:
Leo Broukhis responded to William Overington:
I decided that trying to design emoji for 'I' and for 'You' seemed
interesting so I decided to have a go at designing some.
On 8/21/2018 1:01 AM, Julian Bradfield
via Unicode wrote:
On 2018-08-20, Mark E. Shoulson via Unicode wrote:
Moreover, they [William's pronoun symbols] are once again an attempt to shoehorn Overington's pet
project, "language-independent sentences/
On 8/27/2018 2:20 PM, Rebecca
Bettencourt via Unicode wrote:
> That
sounds like a non-conformant use of characters
On 9/11/2018 5:02 PM, Andrew Glass via
Unicode wrote:
On Windows, Khmer is rendered with a dedicated shaping engine. I don't see a need to alter that engine or integrate Khmer with USE. How we fix Tai Tham, which does go to USE is a different matter. We need to wor
On 10/27/2018 4:10 AM, Janusz S. Bień
via Unicode wrote:
Hi!
On the over 100 years old postcard
https://photos.app.goo.gl/GbwNwYbEQMjZaFgE6
you can see 2 occurrences of a symbol which is explicitly explained (in
Polish) as meaning "Magister".
First question is:
On 10/28/2018 11:50 PM, Martin J. Dürst
via Unicode wrote:
On 2018/10/29 05:42, Michael Everson via Unicode wrote:
This is no different from the Irish name McCoy which can be written MᶜCoy where the raising of the c is actually just decorative, though per
On 10/31/2018 2:38 AM, Julian Bradfield
via Unicode wrote:
You could use the various hacks
you've discussed, with modifier letters; but that is not "encoding",
that is "abusing Unicode to do markup". At least, that's the view I
take!
+1
In general, I ha
On 10/31/2018 10:32 AM, Janusz S. Bień
via Unicode wrote:
Let me remind what plain text is according to the Unicode glossary:
Computer-encoded text that consists only of a sequence of code
points from a given standard, with no other formatting or structural
On 10/31/2018 11:10 AM, Marcel
Schneider via Unicode wrote:
which, if my understanding of "convient" is correct, carefully does
[not] quite say that it is *wrong* not to superscript, but that one should
superscript when one can because that is the conventio
On 10/31/2018 10:18 AM, Marcel
Schneider via Unicode wrote:
On 31/10/2018 at 17:03, Khaled Hosny wrote:
A while I was localizing some application to Arabic and the developer
“helpfully” used m² for square meter, but that does not work for Arabic
bec
On 10/31/2018 9:03 AM, Khaled Hosny via
Unicode wrote:
A while I was localizing some application to Arabic and the developer
“helpfully” used m² for square meter, but that does not work for Arabic
because there is no superscript ٢ in Unicode, so I had to contact the
On 10/31/2018 3:37 PM, Marcel Schneider
via Unicode wrote:
On 31/10/2018 19:42, Asmus Freytag via Unicode wrote:
On 10/31/2018 11:10 AM, Marcel Schneider via Unicode wrote:
which, if my understanding of
Organic chemistry would need sub/sup
alpha, beta and gamma (perhaps others).
A./
On 10/31/2018 3:35 PM, Piotr Karocki
via Unicode wrote:
We don't know whether the abbreviation "Mr", spelled exactly this way,
already existe
On 11/1/2018 12:52 AM, Richard
Wordingham via Unicode wrote:
On Wed, 31 Oct 2018 11:35:19 -0700
Asmus Freytag via Unicode wrote:
On the other hand, I'm a firm believer in applying certain styling
attributes to things like e-mail or discu
On 11/1/2018 12:33 AM, Janusz S. Bień
via Unicode wrote:
On Wed, Oct 31 2018 at 12:14 -0700, Ken Whistler via Unicode wrote:
On 10/31/2018 11:27 AM, Asmus Freytag via Unicode wrote:
but we don't have an agreement
On 11/1/2018 10:23 AM, Janusz S. Bień
via Unicode wrote:
On Thu, Nov 01 2018 at 8:43 -0700, Asmus Freytag via Unicode wrote:
On 11/1/2018 12:33 AM, Janusz S. Bień via Unicode wrote:
On Wed, Oct 31 2018 at 12:14 -0700, Ken Whistler via Unicode
On 11/1/2018 7:59 PM, James Kass via
Unicode wrote:
Alphabetic script users write things the way they are spelled and
spell things the way they are written. The abbreviation in
question as written consists of three recognizable symbols. An
On 11/2/2018 4:31 AM, James Kass via
Unicode wrote:
Suppose someone found a hundred year old form from Poland which
included a section for "sign your name" and "print your name"
which had been filled out by a man with the typically Polish name
On 11/10/2018 10:03 PM, Beth Myre via
Unicode wrote:
Hi Mark,
I (re-)transliterated it, and it reads:
Wir sind uns dessen bewusst, dass von
Seite der
Gege
On 11/11/2018 12:32 PM, Hans Åberg via
Unicode wrote:
On 11 Nov 2018, at 07:03, Beth Myre via Unicode wrote:
Hi Mark,
This is a really cool find, and it's interesting that you might have a relative mentioned in it. After looking at it more, I'm
On 11/11/2018 4:20 PM, Mark E. Shoulson
via Unicode wrote:
On
11/11/18 4:16 PM, Asmus Freytag via Unicode wrote:
On 11/11/2018 12:32 PM, Hans Åberg via
Unicode wrote:
Wir sind uns dessen bewusst, dass von
Precisely. Not in the context of character coding so much as just
in terms of learning about writing systems. For example, is it
something that was absolutely common with "standardized"
conventions, or more of an ad-hoc thing?
A./
On 11/22/2018 11:58 AM, Carl via
Unicode wrote:
(It looks like my HTML email got scrubbed, sorry for the double post)
Hi,
In Chapter 3 Section 13, the Unicode spec defines D146:
"A string X is a compatibility caseless match for a string Y if and only if: NFKD(t
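For readers who want to try the definition, here is a minimal sketch of a compatibility caseless comparison in Python. It assumes that Python's `str.casefold()` is an acceptable stand-in for the standard's toCasefold operation and follows the chained-normalization shape that D146 describes (NFD, fold, NFKD, fold, NFKD). The helper names `compat_fold` and `compat_caseless_match` are hypothetical and not part of any library.

```python
import unicodedata


def compat_fold(s: str) -> str:
    """Approximate NFKD(toCasefold(NFKD(toCasefold(NFD(s)))))."""
    # Start from the canonical decomposition.
    nfd = unicodedata.normalize("NFD", s)
    # First fold, then compatibility decomposition.
    step = unicodedata.normalize("NFKD", nfd.casefold())
    # Folding again catches characters exposed by the NFKD step.
    return unicodedata.normalize("NFKD", step.casefold())


def compat_caseless_match(x: str, y: str) -> bool:
    """True if x and y compare equal after compatibility case folding."""
    return compat_fold(x) == compat_fold(y)


if __name__ == "__main__":
    print(compat_caseless_match("Straße", "STRASSE"))
    print(compat_caseless_match("\ufb01", "FI"))  # U+FB01 LATIN SMALL LIGATURE FI
```

The comparison is done on the folded forms only; the original strings are left untouched, so the result is suitable for matching but not for display.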
On 1/7/2019 7:46 PM, James Kass via
Unicode wrote:
Making
recommendations for the post processing of strings containing the
combining low line strikes me as being outside the scope of
Unicode, though.
Agreed.
Those kinds of things are effe
On 1/7/2019 10:40 PM, Marcel Schneider
via Unicode wrote:
The
pitch is that if some languages are still considered “needing”
rich text where others are correctly represented in plain text
(stress, abbreviations), the Standard needs to be updated in a way
On 1/8/2019 1:11 PM, James Kass via
Unicode wrote:
Asmus Freytag wrote,
> ...
> (for an extreme example there's an orthography
> out there that uses @ as a letter -- we know that
>
On 1/8/2019 10:58 PM, James Kass via
Unicode wrote:
If a text is published in all italics, that’s style/font choice.
If a text is published using italics and roman contrastively and
consistently, and everybody else is doing it pretty much the same
On 1/9/2019 1:06 AM, James Kass via
Unicode wrote:
Asmus Freytag wrote,
> Still, not supported in plain text (unless you abuse the
> math alphabets for things they were not intended for).
The unin
contrastively and consistently, and everybody else is doing it
pretty much the same way, that’s a convention.
Asmus Freytag responded:
But
not all conventions are deemed worthy of plaintext encoding.
What
are the criteria for “worth
On 1/9/2019 4:41 PM, Mark E. Shoulson
via Unicode wrote:
On 1/9/19 2:30 AM, Asmus Freytag via
Unicode wrote:
English use of italics on isolated words
to disambiguate the reading of some sentences is a
On 1/12/2019 5:22 AM, Richard
Wordingham via Unicode wrote:
On Sat, 12 Jan 2019 10:57:26 + (GMT)
Julian Bradfield via Unicode wrote:
It's also fundamentally misguided. When I _italicize_ a word, I am
writing a word composed of (plain old) lette
On 1/14/2019 2:08 AM, Tex via Unicode
wrote:
Perhaps the question should be put to
twitter, messaging apps, text-to-voice vendors, and others
whether it will be useful or not.
If the discussion continues I would like
to see more of a co
On 1/14/2019 2:58 PM, David Starner via
Unicode wrote:
Source code is an example of plain text, and yet adding italics into
comments would require but a trivial change to editors. If the user
audience cared, it would have been done. In fact, I suspect there
exist ed
On 1/14/2019 3:37 PM, Richard
Wordingham via Unicode wrote:
On Tue, 15 Jan 2019 00:02:49 +0100
Hans Åberg via Unicode wrote:
On 14 Jan 2019, at 23:43, James Kass via Unicode
wrote:
Hans Åberg wrote,
How about
On 1/14/2019 2:43 PM, James Kass via
Unicode wrote:
Hans Åberg wrote,
> How about using U+0301 COMBINING ACUTE ACCENT: 𝑝𝑎𝑠𝑠𝑒́
Thought about using a combining accent. Figured it would just
display with a dotted ci
From:
Unicode [mailto:unicode-boun...@unicode.org] On
Behalf Of Asmus Freytag via Unicode
Sent: Monday, January 14, 2019 1:21 PM
To: unicode@unicode.org
Subjec
On 1/14/2019 5:41 PM, Mark E. Shoulson
via Unicode wrote:
On 1/14/19 5:08 AM, Tex via Unicode
wrote:
This thread has gone on for a bit and
I question if there is any more light th
On 1/16/2019 6:33 AM, Marcel Schneider
via Unicode wrote:
So to
date, Unicode has only made half its way, and for every single
script in the
Standard there is another script out there that remains still
unsupported.
First things first.
On 1/16/2019 7:38 PM, James Kass via
Unicode wrote:
Computer
text tradition aside, nobody seems to offer any legitimate reason
why such information isn't worthy of being preservable in
plain-text. Perhaps there isn't one.
By introducing s
On 1/17/2019 9:35 AM, Marcel Schneider
via Unicode wrote:
[quoted mail]
But the French "espace fine insécable" was requested
long long before Mongolian was discussed for encoding in
On 1/18/2019 7:27 AM, Marcel Schneider
via Unicode wrote:
I understand only better
why a significant majority of UTC is hating French.
Francophobia is also palpable in Canada, beyond any
technical reasons, especially in the IT indus
On 1/18/2019 7:27 AM, Marcel Schneider
via Unicode wrote:
Covering existing
character sets (National, International and Industry)
was an (not "the") important goal at
the time: such cov
Marcel,
about your many detailed *technical* questions about the history
of character properties, I am afraid I have no specific
recollection.
French is not the only language that uses a space to group
figures. In fact, I grew up with thousands separators being
I would fully agree and I think Mark puts it really well in the
message below why some of the proposals brandished here are no
longer plain text but "not-so-plain" text.
I think we are better served with a solution that provides some
form of "light" rich text, for ba
On 1/18/2019 2:05 PM, Marcel Schneider
via Unicode wrote:
On 18/01/2019 20:09, Asmus Freytag
via Unicode wrote:
Marcel,
about your many detailed *technical* questions about the
history of character
On 1/18/2019 2:46 PM, Shawn Steele via
Unicode wrote:
>> That
should not impact all other users out there interested in a
civilized layout.
I’m not sure
that the choice of the word “civilized” adds value to the
conversation. We
On 1/18/2019 11:34 PM, Marcel Schneider
via Unicode wrote:
Current
practice in electronic publishing was to use a non-breakable
thin space, Philippe Verdy reports. Did that information come
in somehow?
==> prob
On 1/19/2019 12:34 PM, James Kass via
Unicode wrote:
On 2019-01-19 6:19 PM, wjgo_10...@btinternet.com wrote:
> It seems to me that it would be useful to have some codes
that are
> ordinary characters in some contexts yet
On 1/19/2019 3:53 AM, James Kass via
Unicode wrote:
Marcel Schneider wrote,
> When you ask for knowing the foundations and that knowledge
is persistently refused,
> you end up believing that those foundations just can’t
On 1/20/2019 2:49 PM, Garth Wallace via
Unicode wrote:
I think the real solution is for Twitter to just
implement basic styling and make this a moot point.
Twitter FB and CO should implement a common "MarkDown" sch
On 1/20/2019 2:55 PM, James Kass via
Unicode wrote:
On 2019-01-20 10:49 PM, Garth Wallace wrote:
I think the real solution is for Twitter
to just implement basic styling and make this a moot point.
At which ti
On 1/24/2019 9:44 PM, Garth Wallace via
Unicode wrote:
But the root problem isn't the kludge, it's the lack of
functionality in these systems: if Twitter etc. simply
implemented some styling on their own, the whole thing would be
a moot point
On 1/25/2019 9:39 AM, James Tauber via
Unicode wrote:
Thank you, although the word break does still
affect things like double-clicking to select.
And people do seem to want to use U+02BC for this reason
(and I'm try
On 1/25/2019 10:05 AM, James Kass via
Unicode wrote:
For U+2019, there's a note saying 'this is the preferred character
to use for apostrophe'.
Mark Davis wrote,
> When it is between letters it doesn't cause a wor
On Fri, Jan 25, 2019 at 11:07 PM Asmus Freytag
via Unicode <unicode@unicode
On 1/26/2019 5:43 PM, Richard
Wordingham via Unicode wrote:
On Sat, 26 Jan 2019 17:11:49 -0800
Asmus Freytag via Unicode wrote:
To make matters worse, users for languages that "should" use U+02BC
aren't actually consistent; much d
On 1/26/2019 6:25 PM, Michael Everson
via Unicode wrote:
On 27 Jan 2019, at 01:37, Richard Wordingham via Unicode wrote:
I’ll be publishing a translation of Alice into Ancient Greek in due
course. I will absolutely only use U+20
On 1/26/2019 7:53 PM, Richard
Wordingham via Unicode wrote:
On Sun, 27 Jan 2019 01:55:29 +
James Kass via Unicode wrote:
Richard Wordingham replied to Asmus Freytag,
>> To make matters worse, users for languages that "should"
On 1/26/2019 10:08 PM, Richard
Wordingham via Unicode wrote:
On Sat, 26 Jan 2019 21:11:36 -0800
Asmus Freytag via Unicode wrote:
On 1/26/2019 5:43 PM, Richard Wordingham via Unicode wrote:
That appears to
Arabic terminals and terminal emulators
existed at the time of Unicode 1.0. If you are trying to emulate
those services, for example so that older software can run, you
would need to look at how these programs expected to be fed their
data.
I see lit
On 1/30/2019 4:38 PM, Kent Karlsson via
Unicode wrote:
I did say "multiple" and "for instance". But since you ask:
ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control
sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one
is implemented in
On 1/30/2019 7:46 PM, David Starner via
Unicode wrote:
On Sun, Jan 27, 2019 at 12:04 PM James Kass via Unicode
wrote:
A new beta of BabelPad has been released which enables input, storing,
and display of italics, bold, strikethrough, and underline i
On 1/31/2019 12:55 AM, Tex via Unicode
wrote:
As with the many problems with walls not being effective, you choose to ignore the legitimate issues pointed out on the list with the lack of italic standardization for Chinese braille, text to voice readers, etc.
The ch
On 2/4/2019 11:21 AM, Costello, Roger
L. via Unicode wrote:
Hello Unicode Experts!
As I understand it, endian-ness applies to multi-byte words.
Endian-ness does not apply to ASCII characters because each character is a single byte.
Endian-ness does apply to UTF-1
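The point about byte order can be checked directly. The sketch below is only an illustration and uses UTF-16 as the example of an encoding with multi-byte code units; the helper name `encoded_bytes` is made up for this example. It prints the bytes Python produces for the same text under UTF-8 and under the two UTF-16 byte orders.

```python
def encoded_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


if __name__ == "__main__":
    for encoding in ("utf-8", "utf-16-le", "utf-16-be"):
        # UTF-8 is a byte sequence, so there is only one possible order.
        # UTF-16 uses 16-bit code units, so the two bytes can be swapped.
        print(f"{encoding:10} A -> {encoded_bytes('A', encoding)}")
        print(f"{encoding:10} é -> {encoded_bytes('é', encoding)}")
```

UTF-8 needs two bytes for "é", yet the order of those bytes is fixed by the encoding itself, which is why byte order only becomes a question once the code unit is wider than one byte.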
On 2/4/2019 1:00 PM, Richard Wordingham
via Unicode wrote:
To me, 'visual order' means in the dominant order of the script.
Visual order is a term of art, meaning the characters are ordered
in memory in the same order as they are displayed on the scr
On 2/8/2019 2:08 PM, Richard Wordingham
via Unicode wrote:
On Fri, 8 Feb 2019 17:16:09 + (GMT)
"wjgo_10...@btinternet.com via Unicode" wrote:
Andrew West wrote:
Just reminding you that "The initial char
On 2/8/2019 5:42 PM, James Kass via
Unicode wrote:
William,
Rather than having the user insert the VS14 after every character,
the editor might allow the user to select a span of text for
italicization. Then it would be up to
On quick reading this appears to be a
strong argument why such emulators will
never be able to be used for certain
scripts. Effectively, the model described works
well with any scripts where characters
are laid out (or can be laid out) in fixed
width cells t
On 2/9/2019 12:07 PM, Egmont Koblinger
via Unicode wrote:
On Sat, Feb 9, 2019 at 9:01 PM Eli Zaretskii wrote:
then what you say is that some scripts
can never be supported by text terminals.
I'm not familiar at all with all the scrip
On 2/13/2019 5:19 PM, Mark E. Shoulson
via Unicode wrote:
And
again, all this is before we even consider other issues; I can't
shake the feeling that there are security nightmares lurking inside
this idea.
Default ignorables are bad juju.
A./
On 2/22/2019 7:29 AM, Richard
Wordingham via Unicode wrote:
On Fri, 22 Feb 2019 09:07:06 +
Richard Wordingham via Unicode wrote:
My best hypothesis (not thoroughly tested) is that Windows currently
has InSc=Consonant_Killer, but can I look his
I
suspect that this work would be jibber-jabber to any non-English
speaker unfamiliar with the original Haggadah. No matter how
otherwise fluent they might be in emoji communication.
You can't escape fundamental theses:
There
On 4/19/2019 6:57 PM, Shriramana Sharma
via Unicode wrote:
I don't know many modern fonts that display 007C
as a broken glyph. In fact I haven't seen a broken line pipe
glyph since the MS-DOS days. Nowadays we have 00A6 for that.
On 5/1/2019 3:23 AM, Shriramana Sharma
via Unicode wrote:
http://www.unicode.org/L2/L-curdoc.htm
The number of emoji-related proposals seems to be increasing compared
to the number of script-related ones.
Have we reached a plateau re scripts encoding?
Somehow thi
On 5/2/2019 8:44 AM, J Andrew Lipscomb
via Unicode wrote:
Why not just use U+25E4 and U+25E2 for the triangles, and U+2215 for the diagonal?
Why not wait for evidence of that scheme
being used in text. Then we know.
A./
On 5/15/2019 4:22 AM, Costello, Roger
L. via Unicode wrote:
Hello Unicode experts!
Which is correct:
(a) The input file contains a string. The string is encoded using UTF-8.
(b) The input file contains a string. The string is encoded with UTF-8.
(c) The input fi
On 5/30/2019 1:07 AM, Andre Schappo via
Unicode wrote:
This tweet made me laugh twitter.com/padolsey/status/1133835770773626881 😀🤯
André Schappo
On 5/31/2019 7:12 AM, Michael Everson
via Unicode wrote:
No, thank you.
Not so fast. I think we need to hear from the telemedicine
community first.
A./
On 31 May 2019, at 11:18, bristol_poo via Unicode wrote:
Gre
A question has come up in another
context:
Is there any linguistic term for
describing the process of removing accents and diacritics from a
word to create its “base form”, e.g. São Tomé to Sao Tome?
The linguistic term "string normalization" appear
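Whatever the process ends up being called, the mechanical side of the São Tomé example can be sketched as follows: decompose to NFD and drop every character with a non-zero canonical combining class. This is one common approach and not an established definition of a "base form"; the function name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for base characters and a
    # non-zero combining class for marks such as U+0301 or U+0303.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_diacritics("São Tomé"))
```

Letters that have no canonical decomposition, such as ø or ł, pass through unchanged, so this does not reduce every accented-looking letter to an ASCII one.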
On 7/17/2019 6:03 PM, Richard
Wordingham via Unicode wrote:
On Thu, 18 Jul 2019 01:54:52 +0200
Philippe Verdy via Unicode wrote:
In fact the ligatures system for the "cursive" Egyptian Hieratic is so
complex (and may also have its own variants show