> On 21 Feb 2020, at 13:21, Costello, Roger L. via Unicode
> wrote:
>
> There are binary files and there are text files.
In C, when opening a file as binary with the function fopen, the newlines are
untranslated [1]. If not using this option, the file is informally text, which
means that in
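A minimal sketch of the same binary/text split, in Python rather than C (an assumption of this example, not something from the thread): Python's built-in open() returns the bytes untouched in "rb" mode, while its default text mode translates "\r\n" to "\n" on reading. The file name and sample contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample data with Windows-style line endings.
data = b"one\r\ntwo\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Write the raw bytes exactly as given.
    with open(path, "wb") as f:
        f.write(data)

    # Binary mode: newlines come back untranslated.
    with open(path, "rb") as f:
        as_binary = f.read()

    # Text mode (default newline handling): "\r\n" is read as "\n".
    with open(path, "r", encoding="ascii") as f:
        as_text = f.read()

print(as_binary)
print(repr(as_text))
```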
> On 13 Feb 2020, at 16:41, wjgo_10...@btinternet.com via Unicode
> wrote:
>
> Yet a Private Use Area encoding at a particular code point is not unique.
> Thus, except with care amongst people who are aware of the particular
> encoding, there is no interoperability, such as with regular Unico
> On 13 Feb 2020, at 00:26, Shawn Steele wrote:
>
>> From the point of view of Unicode, it is simpler: If the character is in use
>> or has had use, it should be included somehow.
>
> That bar, to me, seems too low. Many things are only used briefly or in a
> private context that doesn't r
> On 12 Feb 2020, at 23:30, Michel Suignard via Unicode
> wrote:
>
> These abstract collections have started to appear in the first part of the
> nineteenth century (Champollion starting in 1822). Interestingly these
> collections have started to be useful on their own even if in some cases the
For rendering, you might have a look at ConTeXt, because I recall it has an
option whereby Unicode super- and sub-scripts can be displayed over each other
without extra processing.
> On 14 Jan 2020, at 06:44, via Unicode wrote:
>
> Thanks for your reply. I think actually LaTeX is not a good o
> On 14 Oct 2019, at 02:10, Richard Wordingham via Unicode
> wrote:
>
> On Mon, 14 Oct 2019 00:22:36 +0200
> Hans Åberg via Unicode wrote:
>
>>> On 13 Oct 2019, at 23:54, Richard Wordingham via Unicode
>>> wrote:
>
>>> Besides invalidat
> On 13 Oct 2019, at 23:54, Richard Wordingham via Unicode
> wrote:
>
> The point about these examples is that the estimate of one state per
> character becomes a severe underestimate. For example, after
> processing 20 a's, the NFA for /[ab]{0,20}[ac]{10,20}[ad]{0,20}e/ can
> be in any of ab
> On 13 Oct 2019, at 21:17, Richard Wordingham via Unicode
> wrote:
>
> On Sun, 13 Oct 2019 15:29:04 +0200
> Hans Åberg via Unicode wrote:
>
>>> On 13 Oct 2019, at 15:00, Richard Wordingham via Unicode
>>> I'm now beginning to wonder what you are cla
> On 13 Oct 2019, at 15:00, Richard Wordingham via Unicode
> wrote:
>
>>> On Sat, 12 Oct 2019 21:36:45 +0200
>>> Hans Åberg via Unicode wrote:
>>>
>>>>> On 12 Oct 2019, at 14:17, Richard Wordingham via Unicode
>>>>> wrote:
> On 13 Oct 2019, at 00:37, Richard Wordingham via Unicode
> wrote:
>
> On Sat, 12 Oct 2019 21:36:45 +0200
> Hans Åberg via Unicode wrote:
>
>>> On 12 Oct 2019, at 14:17, Richard Wordingham via Unicode
>>> wrote:
>>>
>>> But remember t
> On 12 Oct 2019, at 14:17, Richard Wordingham via Unicode
> wrote:
>
> But remember that 'having longer first' is meaningless for a
> non-deterministic finite automaton that does a single pass through the
> string to be searched.
It is possible to identify all submatches deterministically in
> On 30 Apr 2019, at 04:32, Mark E. Shoulson via Unicode
> wrote:
>
> On 4/29/19 3:34 PM, Doug Ewell via Unicode wrote:
>> Hans Åberg wrote:
>>
>>> The guy who made the artwork for Heroes is completely color-blind,
>>> seeing only in a grayscale, so they agreed he coded the colors in
>>> bla
> On 29 Apr 2019, at 21:34, Doug Ewell wrote:
>
> Hans Åberg wrote:
>
>> The guy who made the artwork for Heroes is completely color-blind,
>> seeing only in a grayscale, so they agreed he coded the colors in
>> black and white, and then that was replaced with colors.
>
> Did he use this part
> On 29 Apr 2019, at 20:02, Doug Ewell via Unicode wrote:
>
> Philippe Verdy wrote:
>
>> A very useful thing to add to Unicode (for colorblind people)!
>>
>> http://bestinportugal.com/color-add-project-brings-color-identification-to-the-color-blind
>>
>> Is it proposed to add as new symbols
You are possibly both right, because it is OK in the web font but wrong in the
desktop font.
> On 17 Apr 2019, at 23:53, Oren Watson via Unicode wrote:
>
> You can easily reproduce this by going here:
> https://www.fonts.com/font/microsoft-corporation/calibri/regular
> and putting in the foll
> On 15 Jan 2019, at 02:18, Richard Wordingham via Unicode
> wrote:
>
> On Mon, 14 Jan 2019 16:02:05 -0800
> Asmus Freytag via Unicode wrote:
>
>> On 1/14/2019 3:37 PM, Richard Wordingham via Unicode wrote:
>> On Tue, 15 Jan 2019 00:02:49 +0100
>> Hans Åb
> On 14 Jan 2019, at 23:43, James Kass via Unicode wrote:
>
> Hans Åberg wrote,
>
> > How about using U+0301 COMBINING ACUTE ACCENT: 𝑝𝑎𝑠𝑠𝑒́
>
> Thought about using a combining accent. Figured it would just display with a
> dotted circle but neglected to try it out first. It actually rende
> On 13 Jan 2019, at 22:43, Khaled Hosny via Unicode
> wrote:
>
> LaTeX with the
> “unicode-math” package will translate ASCII + font switches to the
> respective Unicode math alphanumeric characters. Word will do the same.
> Even browsers rendering MathML will do the same (though most likely
> On 14 Jan 2019, at 06:08, James Kass via Unicode wrote:
>
> 𝐴𝑟𝑡 𝑛𝑜𝑢𝑣𝑒𝑎𝑢 seems a bit 𝑝𝑎𝑠𝑠é nowadays, as well.
>
> (Had to use mark-up for that “span” of a single letter in order to indicate
> the proper letter form. But the plain-text display looks crazy with that
> HTML jive in it.)
How
> On 2 Dec 2018, at 20:29, Janusz S. Bień via Unicode
> wrote:
>
> On Sun, Dec 02 2018 at 10:33 +0100, Hans Åberg via Unicode wrote:
>>
>> It was common in the 1800s to singly and doubly underline superscript
>> abbreviations in handwriting according to [1-2]
> On 30 Oct 2018, at 22:50, Ken Whistler via Unicode
> wrote:
>
> On 10/30/2018 2:32 PM, James Kass via Unicode wrote:
>> but we can't seem to agree on how to encode its abbreviation.
>
> For what it's worth, "mgr" seems to be the usual abbreviation in Polish for
> it.
It was common in the
> On 12 Nov 2018, at 00:00, Asmus Freytag (c) wrote:
>
> On 11/11/2018 1:37 PM, Hans Åberg wrote:
>>> On 11 Nov 2018, at 22:16, Asmus Freytag via Unicode
>>> wrote:
>>>
>>> On 11/11/2018 12:32 PM, Hans Åberg via Unicode wrote:
>>>
> On 11 Nov 2018, at 22:16, Asmus Freytag via Unicode
> wrote:
>
> On 11/11/2018 12:32 PM, Hans Åberg via Unicode wrote:
>>
>>> On 11 Nov 2018, at 07:03, Beth Myre via Unicode
>>> wrote:
>>>
>>> Hi Mark,
>>>
>>>
> On 11 Nov 2018, at 07:03, Beth Myre via Unicode wrote:
>
> Hi Mark,
>
> This is a really cool find, and it's interesting that you might have a
> relative mentioned in it. After looking at it more, I'm more convinced that
> it's German written in Hebrew letters, not Yiddish. I think that
> On 30 Oct 2018, at 22:50, Ken Whistler via Unicode
> wrote:
>
> On 10/30/2018 2:32 PM, James Kass via Unicode wrote:
>> but we can't seem to agree on how to encode its abbreviation.
>
> For what it's worth, "mgr" seems to be the usual abbreviation in Polish for
> it.
That seems to be the
> On 12 Sep 2018, at 04:34, Eli Zaretskii via Unicode
> wrote:
>
>> Date: Wed, 12 Sep 2018 00:13:52 +0200
>> Cc: unicode@unicode.org
>> From: Hans Åberg via Unicode
>>
>> It might be useful to represent non-UTF-8 bytes as Unicode code points. One
> On 11 Sep 2018, at 23:48, Richard Wordingham via Unicode
> wrote:
>
> On Tue, 11 Sep 2018 21:10:03 +0200
> Hans Åberg via Unicode wrote:
>
>> Indeed, before UTF-8, in the 1990s, I recall some Russians using
>> LaTeX files with sections in differen
> On 11 Sep 2018, at 20:40, Eli Zaretskii wrote:
>
>> From: Hans Åberg
>> Date: Tue, 11 Sep 2018 20:14:30 +0200
>> Cc: hsivo...@hsivonen.fi,
>> unicode@unicode.org
>>
>> If one encounters a file with mixed encodings, it is good to be able to view
>> its contents and then convert it, as I see
> On 11 Sep 2018, at 19:21, Eli Zaretskii wrote:
>
>> From: Hans Åberg
>> Date: Tue, 11 Sep 2018 19:13:28 +0200
>> Cc: Henri Sivonen ,
>> unicode@unicode.org
>>
>>> In Emacs, each raw byte belonging
>>> to a byte sequence which is invalid under UTF-8 is represented as a
>>> special multibyte
> On 11 Sep 2018, at 13:13, Eli Zaretskii via Unicode
> wrote:
>
> In Emacs, each raw byte belonging
> to a byte sequence which is invalid under UTF-8 is represented as a
> special multibyte sequence. IOW, Emacs's internal representation
> extends UTF-8 with multibyte sequences it uses to rep
> On 9 Sep 2018, at 21:20, Eli Zaretskii via Unicode
> wrote:
>
> In Emacs, the gap is always where the text is inserted or deleted, be
> it in the middle of text or at its end.
>
>> All editors I have seen treat the text as ordered collections of small
>> buffers (these small buffers may st
> On 8 Jun 2018, at 11:07, Henri Sivonen via Unicode
> wrote:
>
> My question is:
>
> When designing a syntax where tokens with the user-chosen characters
> can't occur next to each other without some syntax-reserved characters
> between them, what advantages are there from limiting the user-
Now that the distinction is possible, it is recommended to do that.
My original question was directed to the OP, whether it is deliberate.
And they are confusables only to those not accustomed to it.
> On 7 Jun 2018, at 12:05, Philippe Verdy wrote:
>
> In my opinion the usual constant is most
> On 7 Jun 2018, at 03:56, Asmus Freytag via Unicode
> wrote:
>
> On 6/6/2018 2:25 PM, Hans Åberg via Unicode wrote:
>>> On 4 Jun 2018, at 21:49, Manish Goregaokar via Unicode
>>> wrote:
>>>
>>> The Rust community is considering adding
> On 4 Jun 2018, at 21:49, Manish Goregaokar via Unicode
> wrote:
>
> The Rust community is considering adding non-ascii identifiers, which follow
> UAX #31 (XID_Start XID_Continue*, with tweaks). The proposal also asks for
> identifiers to be treated as equivalent under NFKC.
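To make "equivalent under NFKC" concrete, here is a rough sketch in Python using the standard unicodedata module. The helper name same_identifier is hypothetical, and it only illustrates the comparison itself, not the Rust proposal's actual rules or its tweaks.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Compare two identifiers after NFKC normalization (illustrative helper)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# U+FB01 LATIN SMALL LIGATURE FI folds to the two letters "fi".
print(same_identifier("\ufb01le", "file"))

# U+212B ANGSTROM SIGN folds to U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE.
print(same_identifier("\u212b", "\u00c5"))

# Cross-script look-alikes are NOT unified: Latin "a" vs Cyrillic U+0430.
print(same_identifier("a", "\u0430"))
```

Normalization removes compatibility variants of the same letter, but it does nothing about confusables across scripts, which is a separate question.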
So, in this l
> On 29 May 2018, at 14:47, Arthur Reutenauer wrote:
>
>> The main point is what users of ẞ and ß would think, and Unicode to adjust
>> accordingly.
>
> Since users of ß would think that in the vast majority of cases, it
> ought to be uppercased to SS, I think you’re missing the main point.
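For reference, the default Unicode case mappings can be checked with a short Python sketch. This shows what Python's str methods do with the standard mappings; it says nothing about what users prefer.

```python
small = "\u00df"    # LATIN SMALL LETTER SHARP S
capital = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

# Full uppercasing of the small letter expands to two letters.
upper_of_small = small.upper()

# The capital letter lowercases to the small one, so the mapping
# does not round-trip: lower -> upper gives "SS", not the capital.
lower_of_capital = capital.lower()

# Case folding, used for caseless matching, also expands.
folded = small.casefold()

print(upper_of_small, lower_of_capital, folded)
print("stra\u00dfe".upper())
```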
> On 29 May 2018, at 12:55, Arthur Reutenauer wrote:
>
>> If uppercasing is not common, one would think that setting it to ẞ would
>> pose no problems, now that it is available.
>
> It would, for reasons of stability.
The main point is what users of ẞ and ß would think, and Unicode to adjus
> On 29 May 2018, at 11:17, Werner LEMBERG wrote:
>
>> When looking for the lowercase ß LATIN SMALL LETTER SHARP S U+00DF
>> in a MacOS Character Viewer, it does not give the uppercase version,
>> for some reason.
>
> Yes, and it will stay so, AFAIK. The uppercase variant of `ß' is
> `SS'. `
> On 29 May 2018, at 10:54, Martin J. Dürst wrote:
>
> On 2018/05/29 17:15, Hans Åberg via Unicode wrote:
>>> On 29 May 2018, at 07:30, Asmus Freytag via Unicode
>>> wrote:
>
>>> An uppercase exists and it has formally been ruled as acceptable way to
> On 29 May 2018, at 07:30, Asmus Freytag via Unicode
> wrote:
>
> On 5/28/2018 6:30 AM, Hans Åberg via Unicode wrote:
>>> Unifying these would make a real mess of lower casing!
>>>
>> German has a special sign ß for "ss", without upper capital
> On 28 May 2018, at 21:38, Richard Wordingham
> wrote:
>
> On Mon, 28 May 2018 21:14:58 +0200
> Hans Åberg via Unicode wrote:
>
>>> On 28 May 2018, at 21:01, Richard Wordingham via Unicode
>>> wrote:
>>>
>>> On Mon, 28 May
> On 28 May 2018, at 21:01, Richard Wordingham via Unicode
> wrote:
>
> On Mon, 28 May 2018 20:19:09 +0200
> Hans Åberg via Unicode wrote:
>
>> Indistinguishable math styles Latin and Greek uppercase letters have
>> been added, even though that was not so in f
> On 28 May 2018, at 19:18, Richard Wordingham via Unicode
> wrote:
>
> On Mon, 28 May 2018 17:54:47 +0200
> Hans Åberg via Unicode wrote:
>
>>> On 28 May 2018, at 17:00, Richard Wordingham via Unicode
>>> wrote:
>>>
>>> On Mon, 28 May
> On 28 May 2018, at 17:00, Richard Wordingham via Unicode
> wrote:
>
> On Mon, 28 May 2018 15:30:55 +0200
> Hans Åberg via Unicode wrote:
>
>>> On 28 May 2018, at 15:10, Richard Wordingham via Unicode
>>> wrote:
>>>
>>> On Mon, 28 May
> On 28 May 2018, at 15:10, Richard Wordingham via Unicode
> wrote:
>
> On Mon, 28 May 2018 10:08:30 +0200
> Hans Åberg via Unicode wrote:
>
>> It is not about precision, but concepts. Like B, Β, and В, which
>> could have been unified, but are not.
>
> U
> On 28 May 2018, at 11:05, Julian Bradfield via Unicode
> wrote:
>
> On 2018-05-28, Hans Åberg via Unicode wrote:
>>> On 28 May 2018, at 03:39, Garth Wallace wrote:
>>>> On Sun, May 27, 2018 at 3:36 PM, Hans Åberg wrote:
>>>> The flats and sha
> On 28 May 2018, at 03:39, Garth Wallace wrote:
>
>> On Sun, May 27, 2018 at 3:36 PM, Hans Åberg wrote:
>> The flats and sharps of Arabic music are semantically the same as in Western
>> music, departing from Pythagorean tuning, then, but the microtonal
>> accidentals are different: they sim
rks taken in isolation mean absolutely nothing in both systems outside
> the keyed scores in which they are inserted, except that they are just
> glyphs, which may be used to mean something else (e.g. a note in a comics
> artwork could be used to denote someone whistling, without actually encod
> On 17 May 2018, at 16:47, Garth Wallace via Unicode
> wrote:
>
> On Thu, May 17, 2018 at 12:41 AM Hans Åberg wrote:
>
> > On 17 May 2018, at 08:47, Garth Wallace via Unicode
> > wrote:
> >
> >> On Wed, May 16, 2018 at 12:42 AM, Hans Åberg via
> On 17 May 2018, at 08:47, Garth Wallace via Unicode
> wrote:
>
>> On Wed, May 16, 2018 at 12:42 AM, Hans Åberg via Unicode
>> wrote:
>>
>> It would be best to encode the SMuFL symbols, which are rather comprehensive
>> and include those:
>>
> On 16 May 2018, at 09:42, Hans Åberg via Unicode wrote:
>
>> On 16 May 2018, at 00:48, Ken Whistler via Unicode
>> wrote:
>>
>>> A proposal should also show evidence of usage and glyph variations.
>>
>> And should probably refer to the relation
> On 16 May 2018, at 00:48, Ken Whistler via Unicode
> wrote:
>
> On 5/15/2018 2:46 PM, Markus Scherer via Unicode wrote:
>> I am proposing the addition of 2 new characters to the Musical Symbols table:
>>
>> - the half-flat sign (lowers a note by a quarter tone)
>> - the half-sharp sign (rai
> On 16 May 2017, at 15:21, Richard Wordingham via Unicode
> wrote:
>
> On Tue, 16 May 2017 14:44:44 +0200
> Hans Åberg via Unicode wrote:
>
>>> On 15 May 2017, at 12:21, Henri Sivonen via Unicode
>>> wrote:
>> ...
>>> I think Unicod
> On 17 May 2017, at 23:18, Doug Ewell wrote:
>
> Hans Åberg wrote:
>
>>> Far from solving the stated problem, it would introduce a new one:
>>> conversion from the "bad data" Unicode code points, currently
>>> well-defined, would become ambiguous.
>>
>> Actually not: just translate the invali
> On 17 May 2017, at 22:36, Doug Ewell via Unicode wrote:
>
> Hans Åberg wrote:
>
>> It would be useful, for use with filesystems, to have Unicode
>> codepoint markers that indicate how UTF-8, including non-valid
>> sequences, is translated into UTF-32 in a way that the original
>> octet sequen
> On 16 May 2017, at 20:01, Philippe Verdy wrote:
>
> On Windows NTFS (and LFN extension of FAT32 and exFAT) at least, random
> sequences of 16-bit code units are not permitted. There's visibly a
> validation step that returns an error if you attempt to create files with
> invalid sequences (
> On 16 May 2017, at 18:38, Alastair Houghton
> wrote:
>
> On 16 May 2017, at 17:23, Hans Åberg wrote:
>>
>> HFS implements case insensitivity in a layer above the filesystem raw
>> functions. So it is perfectly possible to have files that differ by case
>> only in the same directory by usi
> On 16 May 2017, at 18:13, Alastair Houghton
> wrote:
>
> On 16 May 2017, at 17:07, Hans Åberg wrote:
>>
> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on
> UCS-2/UTF-16. ...
The filesystem directory is using octet sequences and does not bother
> On 16 May 2017, at 17:52, Alastair Houghton
> wrote:
>
> On 16 May 2017, at 16:44, Hans Åberg wrote:
>>
>> On 16 May 2017, at 17:30, Alastair Houghton via Unicode
>> wrote:
>>>
>>> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on
>>> UCS-2/UTF-16. ...
>>
>> The
> On 16 May 2017, at 17:30, Alastair Houghton via Unicode
> wrote:
>
> On 16 May 2017, at 14:23, Hans Åberg via Unicode wrote:
>>
>> You don't. You have a filename, which is an octet sequence of unknown
>> encoding, and want to deal with it. Therefore, v
> On 16 May 2017, at 15:00, Philippe Verdy wrote:
>
> 2017-05-16 14:44 GMT+02:00 Hans Åberg via Unicode :
>
> > On 15 May 2017, at 12:21, Henri Sivonen via Unicode
> > wrote:
> ...
> > I think Unicode should not adopt the proposed change.
>
> It would
> On 15 May 2017, at 12:21, Henri Sivonen via Unicode
> wrote:
...
> I think Unicode should not adopt the proposed change.
It would be useful, for use with filesystems, to have Unicode codepoint markers
that indicate how UTF-8, including non-valid sequences, is translated into
UTF-32 in a way
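One existing mechanism in this spirit, offered only as an illustration and not as the markers proposed here, is Python's "surrogateescape" error handler (PEP 383). It maps each undecodable byte 0x80..0xFF to a lone surrogate U+DC80..U+DCFF, so the original octets can be recovered exactly. The sample file name below is made up.

```python
# A file name with a Latin-1 "é" byte (0xE9), which is invalid as UTF-8 here.
raw = b"caf\xe9.txt"

# Decode: valid UTF-8 is decoded normally, the stray byte becomes U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")
code_points = [hex(ord(c)) for c in text]

# Encode with the same handler: the lone surrogate turns back into 0xE9.
restored = text.encode("utf-8", errors="surrogateescape")

print(code_points)
print(restored == raw)
```

The intermediate string contains a lone surrogate, so it is not a valid sequence of Unicode scalar values. Dedicated marker code points would presumably be meant to avoid exactly that.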
> On 1 May 2017, at 21:12, Michael Bear via Unicode wrote:
>
> I am trying to make a music notation font. It will use the Musical Symbols
> block in Unicode (1D100-1D1FF), but, since that block has a bad rep for not
> being very complete, I added some extra characters...
SMuFL has a rather co