Murray's work comes from the desire to represent mathematical equations
faithfully, based nearly entirely on the semantics of the operators and
having those operators be represented as Unicode characters.
One solution he uses is redundant parens. Parens can
be supplied to
On 9/14/2011 11:14 AM, Michael Everson wrote:
At this point, I think I have to make a plea: Sarasvati, spare us.
+1
On 9/13/2011 6:01 AM, Philippe Verdy wrote:
Unfortunately, adding controls would imply the creation of new Bidi
classes for them (and forgetting the stability policy about them,
which was published too soon before solving evident problems).
The first part is correct, and giving up stability to
On 9/9/2011 8:12 PM, Stephan Stiller wrote:
Dear Martin,
Thanks for alerting me to the issue of causal direction of aesthetic
preference - it's been on my mind, but your reply helps me sort out
some details.
When I first encountered text (outside of the German language locale)
with ample
On 8/31/2011 11:25 PM, Philippe Verdy wrote:
2011/9/1 Karl Williamson <pub...@khwilliamson.com>:
But now that I'm an UTC member, I hope I will hear these cases earlier...
Congratulations!
Does it justify so many new aliases at the same time?
No. I'm firmly with you, I support the
On 8/28/2011 9:46 PM, Doug Ewell wrote:
Philippe Verdy wrote:
If there are other mappings to do with other standards, and those
standards must be only informative, we already have the /MAPPINGS
directory beside the /UNIDATA directory where the UCD belongs too.
But in general, with the
On 8/28/2011 6:43 PM, Philippe Verdy wrote:
2011/8/27 Asmus Freytag <asm...@ix.netcom.com>:
I also think that the status field iso6429 is badly named. It should be
control, and what is named control should be control-alternate, or
perhaps, both of these groups should become simply control. I think
On 8/26/2011 10:09 PM, Philippe Verdy wrote:
2011/8/27 Asmus Freytag <asm...@ix.netcom.com>:
I agree with Ken that Philippe's suggestion of conflating the annotations
for mathematical use with formal Unicode name aliases is a non-starter.
Yes, but why then add ISO 6429 alias names? What makes
On 8/26/2011 7:52 PM, Benjamin M Scarborough wrote:
Are name aliases exempted from the normal character naming conventions? I ask
because four of the entries have words that begin with numbers.
008E;SINGLE-SHIFT 2;control
008F;SINGLE-SHIFT 3;control
0091;PRIVATE USE 1;control
0092;PRIVATE USE
On 8/27/2011 1:31 AM, Andrew West wrote:
On 27 August 2011 09:25, Andrew West <andrewcw...@gmail.com> wrote:
On 27 August 2011 03:52, Benjamin M Scarborough
<benjamin.scarboro...@utdallas.edu> wrote:
Are name aliases exempted from the normal character naming conventions? I ask
because four of
I agree with Ken that Philippe's suggestion of conflating the
annotations for mathematical use with formal Unicode name aliases is a
non-starter. The former exist to help mathematicians identify symbols in
Unicode, when they know their name from entity lists. The latter are
designed to allow
On 8/24/2011 7:45 PM, Richard Wordingham wrote:
Which earlier coding system supported Welsh? (I'm thinking of 'W WITH
CIRCUMFLEX', U+0174 and U+0175.) How was the use of the canonical
decompositions incompatible with the character encodings of legacy
systems? Latin-1 has the same codes as
On 8/23/2011 7:22 AM, Doug Ewell wrote:
Of all applications, a word processor or DTP application would want to
know more about the properties of characters than just whether they are
RTL. Line breaking, word breaking, and case mapping come to mind.
I would think the format used by standard UCD
On 8/23/2011 12:00 PM, Richard Wordingham wrote:
On Mon, 22 Aug 2011 16:18:56 -0700
Ken Whistler <k...@sybase.com> wrote:
How about Clause 12.5 of ISO/IEC 10646:
001B, 0025, 0040
You escape out of UTF-16 to ISO 2022, and then you can do whatever
the heck you want, including exchange and
On 8/21/2011 7:34 PM, Doug Ewell wrote:
So what you are asking about is a directional control character that would
assign subsequent characters a BC of 'AL', right?
You don't want to call this a LANGUAGE MARK or anything else that implies language
identification, because of the existence of
Huh? What context is this in?
On 8/22/2011 11:18 AM, CE Whitehead wrote:
Hi.
I think many line breaks within paragraphs are soft line breaks but
that embedding levels have to be taken into account when deciding the
width of the glyphs; that's as near as I can tell.
Here is the description
On 8/21/2011 3:31 PM, Richard Wordingham wrote:
On Sun, 21 Aug 2011 11:00:26 -0600
Doug Ewell <d...@ewellic.org> wrote:
I think as soon as we start talking about this many scenarios, we are
no longer talking about what the *default* bidi class of the PUA (or
some part of it) should be. Instead,
On 8/20/2011 6:44 PM, Doug Ewell wrote:
Would that really be a better default? I thought the main RTL needs for the PUA
would be for unencoded scripts, not for even more Arabic letters. (How many
more are there anyway?)
In any case, either 'R' or 'AL' as the Plane 16 default would be an
On 8/19/2011 2:35 PM, Jukka K. Korpela wrote:
20.8.2011 0:07, Doug Ewell wrote:
Of course, 2.1 billion characters is also overkill, but the advent of
UTF-16 was how we ended up with 17 planes.
And now we think that a little over a million is enough for everyone,
just as they thought in the
On 8/19/2011 3:24 PM, Ken Whistler wrote:
On 8/19/2011 2:07 PM, Doug Ewell wrote:
Technically, I think 10646 was always limited to 32,768 planes so that
one could always address a code point with a 32-bit signed integer (a
nod to the Java fans).
Well, yes, but it didn't really have anything
On 8/18/2011 7:29 AM, Doug Ewell wrote:
Karl Pentzlin <karl dash pentzlin at acssoft dot de> wrote:
The quoted indicators for benefit were part of a concern of the German
NB regarding the Wingding/Webding proposals. The concern expressed in
WG2 N4085 is that some characters proposed there
On 8/16/2011 1:57 AM, Andrew West wrote:
On 16 August 2011 02:59, Richard Wordingham
<richard.wording...@ntlworld.com> wrote:
All I've got to go on is the penultimate sentence in TUS 6.0 Section
10.2 - 'Rarely, stacks are seen that contain more than one such
consonant-vowel combination in a
On 8/16/2011 3:32 PM, Andrew West wrote:
On 16 August 2011 18:19, Asmus Freytag <asm...@ix.netcom.com> wrote:
These stacks are highly unusual and are considered beyond the scope
of plain text rendering. They may be handled by higher-level
mechanisms.
The question is: have any such mechanisms
On 8/14/2011 1:39 PM, Richard Wordingham wrote:
U+00B5 MICRO SIGN is an ISO-8859-1 character, and was therefore
included as U+00B5. It normally precedes a Latin-script letter, and
therefore it actually makes sense to treat it as a Latin-script
character, and possibly give it a different shape
On 8/14/2011 12:51 PM, Jukka K. Korpela wrote:
14.8.2011 17:51, Doug Ewell wrote:
This sounds like Jukka expects browsers to analyze the glyph assigned in
the font to the code position for 'a' and decline to display it if it
doesn't look enough like an 'a' (rejecting, for example, Greek 'α').
The ambiguity of an initial FEFF was not desirable, but this discussion shows
that certain things can't be so easily fixed by adding characters at a later
stage.
The more time elapsed between encoding of the ambiguous character and the later
fix the more software, the more data, and the more
On 7/17/2011 2:47 AM, Petr Tomasek wrote:
On Sun, Jul 17, 2011 at 10:14:55AM +0100, Julian Bradfield wrote:
Wouldn't it be more economical to encode a single UNICODE ESCAPE
CHARACTER which forces the following character to be interpreted as a
printable glyph rather than any control function?
On 7/17/2011 12:19 PM, Doug Ewell wrote:
Asmus wrote:
The reason is, of course, because these codes would *reinterpret* existing
characters. You could argue that Variation Selectors do the same, but they are
carefully constructed so that they can be safely ignored.
Variation selectors
On 7/17/2011 12:19 PM, Philippe Verdy wrote:
2011/7/17 Asmus Freytag <asm...@ix.netcom.com>:
On 7/17/2011 2:35 AM, Michael Everson wrote:
... invisible and stateful control characters are more expensive than
ordinary graphic symbols.
In this case, the expense is so much higher as to rule out
On 7/15/2011 10:48 PM, Doug Ewell wrote:
I apologize for the unintended content-free post. It's my phone's fault.
--
My dog ate the homework - 2011?
:)
A./
On 7/16/2011 1:53 AM, Michael Everson wrote:
On 16 Jul 2011, at 04:37, Asmus Freytag wrote:
It's not a matter of competing views. There's a well-defined process for
adding characters to the standard. It starts by documenting usage.
Yes, Asmus, and when one wants to do that, one writes
Karl,
I've published similar surveys in the past, where the object was to
get feedback on the desirability of further action. I stick by my
recommendation in favor of keeping raw data out of the document
registry and of doing the committee a favor by adding value in form of
a sifting or
On 7/15/2011 1:08 AM, Karl Pentzlin wrote:
In WG2 N4085 Further proposed additions to ISO/IEC 10646 and comments to other
proposals (2011‐05‐25), the German NB had requested re WG2 N4022 Proposal to add
Wingdings and Webdings Symbols besides other points:
Also, in doing this work, other
On 7/15/2011 9:03 AM, Doug Ewell wrote:
Andrew West <andrewcwest at gmail dot com> replied to Michael Everson:
I think that having encoded symbols for control characters (which we
already have for some of them) is no bad thing, and the argument
about too many characters is not compelling, as
On 7/15/2011 2:23 AM, Karl Pentzlin wrote:
Am Freitag, 15. Juli 2011 um 10:58 schrieb Asmus Freytag:
AF ... There appear to be a large number of symbols for which a
AF Unicode equivalent can be identified with great certainty -
AF and beyond that there seem to be characters for which such
AF
On 7/15/2011 10:26 AM, Michael Everson wrote:
What I see is a certain unreasonability reflecting a certain conservatism. Text
about the Standard is important, and should be representable in an
interchangeable way. Here { } is a Right to left override character.
I want to talk about
On 7/15/2011 11:05 AM, Doug Ewell wrote:
What I see is a certain unreasonability reflecting a certain conservatism. Text
about the Standard is important, and should be representable in an
interchangeable way. Here { } is a Right to left override character.
I want to talk about it
On 7/15/2011 11:36 AM, Michael Everson wrote:
However, I agree with Asmus that in the context of the Wingdings-type symbols
these characters should not be considered. They should be considered as a whole
on their own.
Thank you Michael.
To reiterate and restate (so it can be read out of
Jukka,
reminding everyone of the definition of a technical term as opposed to a
word in everyday language isn't helping address the underlying issue.
Everyone is familiar with this distinction.
You note that there's a bit of a truism that underlies the definition of
character and character
On 7/11/2011 11:57 AM, Ken Whistler wrote:
On 7/10/2011 4:58 PM, Ernest van den Boogaard wrote:
For the long term, I suggest Unicode should aim for this:
That kind of terminological purity isn't going to occur.
...
The Unicode Consortium has a glossary of terms:
...
But the Unicode
On 7/7/2011 8:42 PM, Karl Williamson wrote:
On 07/07/2011 02:33 PM, announceme...@unicode.org wrote:
Proposed updates for most Unicode Standard Annexes for Version 6.1 of
the Unicode Standard have been posted for public review.
Many of the documents appear to have no current modifications to
On 7/3/2011 6:31 AM, Philippe Verdy wrote:
Regarding the previous comment about the Danish aa,
Sorry, most of that discussion missed the mark.
Modern Danish can have AA for two reasons. Accidental occurrence, as
in dataanalyse which is composed of two words which just happens to
put two A
On 7/6/2011 12:16 AM, Jukka K. Korpela wrote:
Allowing word division just to say that some characters do not
constitute a digraph (or trigraph…) is not practical e.g. when the
text has otherwise no word divisions, for one reason or another, or
when the particular word division point is
On 7/2/2011 8:59 AM, Philippe Verdy wrote:
2011/7/2 Andrew Miller <a.j.mil...@bcs.org.uk>:
The ng in Llangollen is not the digram ng but two separate letters
(unlike the ll in the name which is the digram).
Why not simply use a soft hyphen between n and g in this case?
Soft hyphens are
On 7/1/2011 12:06 AM, Peter Krefting wrote:
Hi!
On line 65 of
http://www.unicode.org/Public/PROGRAMS/BidiReferenceCpp/bidi.cpp
(version 26) the word utility is spelled as uitlity (line 80 has
the correct spelling).
Not that it matters much, just something we noticed.
If it's in a comment,
On 6/28/2011 1:51 AM, Michael Everson wrote:
On 28 Jun 2011, at 09:28, Jean-François Colson wrote:
In Times New Roman, which is the default font for MS Word (probably the best
known word processor), the letters “a” and “ɑ” are indistinguishable in italics.
That is a fault of the font.
No,
On 6/28/2011 1:40 AM, Andreas Stötzner wrote:
Am 28.06.2011 um 09:43 schrieb Jean-François Colson:
I’m interested in Unifon (http://www.unifon.org). That’s a phonemic
alphabet for English which is used to teach reading.
Although it has been encoded in the ConScript Unicode Registry as a
new
On 11/23/2010 1:58 AM, sowmya satyanarayana wrote:
This is what I am actually looking for. My ODBC application supports
UTF-16, which is 2 byte width characters. This application is
completely oriented around using _T(x) macro as Asmus Freytag figured out.
Yeah, it's nice when you can do
On 11/22/2010 4:15 AM, Michael Everson wrote:
It boils down to this: just as there aren’t technical or usability reasons that
make it problematic to represent IPA text using two Greek characters in an
otherwise-Latin system,
Yes there are. Sorting multilingual text including Greek and IPA
On 11/22/2010 10:18 AM, Phillips, Addison wrote:
sowmya satyanarayana <sowmya underscore satyanarayana at yahoo dot com>
wrote:
Taking this, what is the best way to define the _T(x) macro of the
UNICODE version, so that my strings will always be 2-byte wide
characters?
Unicode characters aren't always
On 11/22/2010 11:08 AM, Asmus Freytag wrote:
depending on whether some global compile time flat (usually UNICODE or
_UNICODE) is set or not.
recte: flag.
On 11/18/2010 11:15 PM, Peter Constable wrote:
If you'd like a precedent, here's one:
Yes, I think discussion of precedents is important - it leads to the
formulation of encoding principles that can then (hopefully) result in
more consistency in future encoding efforts.
Let me add the
On 11/18/2010 8:04 AM, Peter Constable wrote:
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf
Of André Szabolcs Szelp
AFAIR the reservations of WG2 concerning the encoding of Jangalif
Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but
rather in
On 11/15/2010 2:24 PM, Kenneth Whistler wrote:
FA47 is a compatibility character, and would have a compatibility mapping.
Faulty syllogism.
Formally correct answer but only because of something of a design flaw
in Unicode. When the type of mapping was decided on, people didn't fully
expect
On 11/15/2010 5:43 PM, Kenneth Whistler wrote:
Perhaps someone would like to make a detailed proposal to
the UTC for how to fix the text and charts?;-)
Ken,
having shown yourself the master of detail in your reply, I think you've
appointed yourself.
A round of applause for Ken!
See how
On 11/14/2010 12:57 PM, Doug Ewell wrote:
Jim Monty <jim dot monty at yahoo dot com> wrote:
Japanese kana (the J in CJK) and Korean syllables (the K in
CJK) both have different normalization forms. What do ideographs
have to do with anything? I didn't mention ideographs; you did.
The term CJK
If you want to get that point across to a general audience, you could
use a more colloquial term, albeit one that itself derives from mathematics.
Text that can be completely expressed in ASCII fits into something
(ASCII) that works as a lowest common denominator of a large number of
On 11/4/2010 5:46 PM, Doug Ewell wrote:
Markus Scherer wrote:
While processing 16-bit Unicode text which is not assumed to be
well-formed UTF-16, you can treat (decode) an unpaired surrogate as a
mostly-inert surrogate code point. However, you cannot unambiguously
encode a surrogate code
On 11/5/2010 7:02 AM, Doug Ewell wrote:
Asmus Freytag <asmusf at ix dot netcom dot com> wrote:
I'm probably missing something here, but I don't agree that it's OK
for a consumer of UTF-16 to accept an unpaired surrogate without
throwing an error, or converting it to U+FFFD, or otherwise raising
On 10/17/2010 7:01 AM, Michael D. Adams wrote:
This is something that not even the C++ and Java reference
implementations do (though it appears that the C++ implementation of
the W rules was originally derived from a regular expression as it
uses state tables, but if so it is undocumented).
On 10/17/2010 10:59 AM, Michael D. Adams wrote:
The biggest challenge was not in creating those tables, but in
understanding the nuances of the rules, by the way.
Two questions so I can understand better.
First, by nuances do you mean the nuances of how the rules interact
(which I think would
On 10/16/2010 10:38 AM, suzuki toshiya wrote:
Hi,
I've never heard any comments, positive or negative, about
reserving code points to make the code chart structure
similar across multiple scripts.
So your comment is interesting. Could you tell me more
about what kind of
On 10/11/2010 9:49 PM, Janusz S. Bień wrote:
On Mon, 11 Oct 2010 announceme...@unicode.org wrote:
The newly finalized Unicode Version 6.0 adds 2,088 characters,
What is the current total? Are other statistics available
somewhere?
The announcement gives a link to click
Ken,
some comments, and a few suggestions near the end.
On 10/12/2010 4:56 PM, Kenneth Whistler wrote:
Karl Williamson asked:
The Unicode standard only gives numeric values to rational numbers. Is
the reason for this merely because of the difficulty of representing
irrational ones?
No.
On 9/18/2010 8:36 AM, abysta wrote:
Hello.
I need a dot to separate words into syllables. What should I use, 00B7 or 2027,
and why?
2027 is explicitly intended to be used to show syllables as is done in
dictionaries. You don't make it explicit in your query, but it sounds
like that is
On 9/18/2010 10:56 AM, Lorna Priest wrote:
U+00B7 MIDDLE DOT is semantically ambiguous and has (partly
therefore) varying renderings, and it might be used as a replacement
for U+2027 if the latter cannot be used reliably.
What about using U+02D1 - half triangular colon?
Why not use
The first discussions that led to the current formulation of the bidi
algorithm easily go back 20 years by now. There's some value in not
re-stating a specification - even if a new formulation could be found to
be 100% equivalent. That value lies in the fact that any reader can
tell, by
On 8/6/2010 2:03 AM, William_J_G Overington wrote:
On Thursday, 5 August 2010, Kenneth Whistler <k...@sybase.com> wrote:
I am thinking of where a poet might specify an ending version of a glyph at the
end of the last word on some lines, yet not on others, for poetic effect. I
think that it
On 8/5/2010 3:47 AM, William_J_G Overington wrote:
On Wednesday 4 August 2010, Asmus Freytag <asm...@ix.netcom.com> wrote:
However, there's no need to add variation sequences to
select an *ambiguous* form. Those sequences should be
removed from the proposal.
Are you here talking about
On 8/2/2010 5:04 PM, Karl Pentzlin wrote:
I have compiled a draft proposal:
Proposal to add Variation Sequences for Latin and Cyrillic letters
The draft can be downloaded at:
http://www.pentzlin.com/Variation-Sequences-Latin-Cyrillic2.pdf (4.3 MB).
The final proposal is intended to be submitted
On 8/4/2010 1:30 PM, verdy_p wrote:
Asmus Freytag wrote:
The Fraktur problem is one where one typestyle requires additional
information (e.g. when to select long s) that is not required for
rendering the same text in another typestyle. If it is indeed desirable
(and possible) to create
Philippe,
Text typeset in Fraktur contains more information than text typeset in
Antiqua. That means, there are some places where there are some (mild)
ambiguities in representation in the Antiqua version. Not enough to
bother a human reader who can use deep context to read the text
correctly,
On 7/28/2010 9:32 PM, Doug Ewell wrote:
Murray Sargent <murrays at exchange dot microsoft dot com> wrote:
It's worth remembering that plain text is a format that was
introduced due to the limitations of early computers. Books have
always been rendered with at least some degree of rich text. And
On 7/28/2010 2:02 AM, Kent Karlsson wrote:
Den 2010-07-28 09.50, skrev Jukka K. Korpela <jkorp...@cs.tut.fi>:
André Szabolcs Szelp wrote:
Generally, for the decimal point . (U+002E FULLSTOP) and , (U+002C
COMMA) is used in the SI world. However, earlier conventions could use
different
On 7/28/2010 10:09 AM, Murray Sargent wrote:
Contextual rendering is getting to be more common thanks to adoption of OpenType features. For example, both MS Publisher 2010 and MS Word 2010 support various contextually dependent OpenType features at the user's discretion. The choice of glyph for
On 7/28/2010 10:13 PM, Martin J. Dürst wrote:
Sequences of numeric Kanji are also used in names and word-plays, and
as sequences of individual small numbers.
But the same applies to our digits. A very simple example is to use
them as a ruler in plain text:
1 2 3
On 7/27/2010 3:02 PM, Kenneth Whistler wrote:
Karl Williamson asked:
Subject: Why does EULER CONSTANT not have math property and PLANCK CONSTANT
does?
They are U+2107 and U+210E respectively.
Because U+210E PLANCK CONSTANT is, to quote the standard,
simply a mathematical
On 7/26/2010 12:13 PM, Mark Davis ☕ wrote:
I agree that having it stated at point of use is useful - and we do
that in other cases covered by stability clauses; but we can only
state it IF we have the corresponding stability policy.
Mark,
The statement in your but clause really isn't correct.
The short answer to Karl's question is that there will not be an
absolute guarantee.
The long answer is that, partly for the reasons he's mentioned, this
won't be a practical problem.
A. Most of the living scripts that are in wide use have been encoded,
including whatever digits are in use.
On 7/25/2010 6:05 PM, Martin J. Dürst wrote:
On 2010/07/26 4:37, Asmus Freytag wrote:
PPS: a very hypothetical tough case would be a script where letters
serve both as letters and as decimal place-value digits, and with modern
living practice.
Well, there actually is such a script, namely
On 7/24/2010 3:00 PM, Bill Poser wrote:
On Sat, Jul 24, 2010 at 1:00 PM, Michael Everson <ever...@evertype.com> wrote:
Digits can be scattered randomly about the code space and it wouldn't make any
difference.
Having written a library for performing conversions between Unicode
strings
Andreas,
I think we all realize your frustration with well-meaning software.
Because tags can be wrong through no fault of the human originating the
document,
I fully understand that Google might want to attempt to improve the user
experience in such situations.
The problem is that doing so
On 6/28/2010 11:38 AM, Mark Davis ☕ wrote:
The problem with slavishly following the charset parameter is that it
is often incorrect. However, the charset parameter is a signal into
the character detection module, so if the charset is correctly supplied
from the message then the results of the
I'd like to second Mark.
There is a lot of information in the Standard, including the UAXs, and
the Unicode Character Database that would help answer your questions.
The volunteers associated with the Unicode effort have worked hard
putting all that information together - so use it, instead
The one argument that I find convincing is that too many implementations
seem set to disallow generic combination, relying instead on fixed
tables of known/permissible combinations.
In that situation, a formally adopted character with the clearly stated
semantic of is expected to actually
On 6/26/2010 5:41 PM, Doug Ewell wrote:
Regarding the inability to distinguish 8859-15 heuristically from
8859-1, I understand the problem when there are no tags or other
hints, or for cases like Windows-1252 text declared to be 8859-1, but
it seems unlikely to me that there is much text
On 6/17/2010 7:24 PM, Tulasi wrote:
What is equivalent ISO/IEC
ISO/IEC what?
There are hundreds of ISO/IEC standards, of which dozens are character
encoding standards.
for U+0278 LATIN SMALL LETTER PHI (ɸ)?
Or do Unicode ISO/IEC use different number name for same letter/symbol?
On 6/14/2010 1:18 PM, Mark E. Shoulson wrote:
On 06/14/2010 02:15 PM, Asmus Freytag wrote:
On 6/14/2010 9:21 AM, Stephen Slevinski wrote:
Plain text SignWriting should be able to write actual sign language,
such as hello world.
You could equally well insist that it should be possible
Can we stop double posting on the Unicode and Unicore lists?
People on the unicode list cannot reply to people on the other list,
and vice versa (unless they happen to be members of both lists).
Thanks.
A./
On 6/7/2010 4:26 PM, Masaaki Shibata wrote:
I'm studying the UAX #14 (5.2.0) and testing my code against
LineBreakTest.txt. And I found some test cases on this text file seem
to be contradictory to the rules on the document.
For example, LB25 explicitly prohibits breaking between CP and PO,
On 6/4/2010 8:34 AM, Mark Davis ☕ wrote:
In a compression format, that doesn't matter; you can't expect random
access, nor many of the other features of UTF-8.
The minimal expectation for these kinds of simple compression is that
when you write a string with a particular /write/ method, and
On 6/1/2010 6:04 PM, Mark Crispin wrote:
I don't think that the unicode list should be used for the type of
questions that have polluted it recently.
That list unicode@unicode.org is open for general questions.
It has no formal standing as far as the business of the Consortium
is concerned, and
On 6/1/2010 8:04 PM, Kannan Goundan wrote:
I'm trying to come up with a compact encoding for Unicode strings for
data serialization purposes. The goals are fast read/write and small
size.
Why not use SCSU?
You get the small size and the encoder/decoder aren't that complicated.
You get the
On 6/2/2010 11:46 AM, Jonathan Rosenne wrote:
Although this mail was not addressed to me, I did read it. Sue me.
The terms of use for the Unicode mail list essentially state that these
types of boilerplate are null and void as far as Unicode is concerned.
You will find the following in
On 6/2/2010 3:28 PM, John Dlugosz wrote:
If anyone can “null and void” it, I wonder why companies bother to put
such things in people’s outgoing mail. I would have thought they could
come up with a proper netiquette version, but they just don’t care.
These things are bogus, because they
SCSU is a pass-through for ASCII, plus it handles the common mix of
ASCII plus 96 local characters (Latin-1, Greek, Cyrillic, Thai, etc)
really fast. Go look at the sample code. If you take that as starting
point for optimization, I think you'll be fine.
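As a hedged illustration of the pass-through claim, here is a minimal sketch of the part of SCSU that needs no tag bytes at all. It assumes, per my reading of UTS #6, that a stream starts in single-byte mode with the active window at U+0080. The function name scsu_encode_initial_state is made up for this example; this is not the sample code mentioned above, and it simply rejects anything that would need a tag or a window switch.

```python
def scsu_encode_initial_state(text: str) -> bytes:
    """Encode text that fits SCSU's initial state; raise otherwise.

    Assumption: single-byte mode, active dynamic window at U+0080.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in (0x09, 0x0A, 0x0D) or 0x20 <= cp <= 0x7E:
            # Tab, LF, CR and printable ASCII pass through unchanged.
            out.append(cp)
        elif 0x80 <= cp <= 0xFF:
            # Active window starts at U+0080: byte = 0x80 + (cp - 0x0080).
            out.append(0x80 + (cp - 0x0080))
        else:
            # Other controls collide with SCSU tag bytes, and anything
            # above U+00FF needs a window switch; both are out of scope.
            raise ValueError(f"U+{cp:04X} is not handled in this sketch")
    return bytes(out)


print(scsu_encode_initial_state("plain ASCII"))
print(scsu_encode_initial_state("café"))
```

For this restricted input the bytes are the same ones Python's latin-1 codec produces, which is the pass-through behaviour in miniature. A real encoder would go on to switch windows for Greek, Cyrillic, Thai and so on.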
On 6/1/2010 1:37 PM, John Dlugosz wrote:
Why does the code chart call the plain Greek letter (upper and lower
case) “LAMDA” rather than “LAMBDA”? The latter is used in other places
where a glyph is based on the lambda, e.g. “U+019B LATIN SMALL LETTER
LAMBDA WITH STROKE”
Names sometimes
On 6/1/2010 4:14 PM, Mark Crispin wrote:
Is it really necessary to have this sort of pedagogical discussion on the
Unicode list?
Is this character name misspelled?
Is Unicode a for-profit company?
Who owns the Unicode font?
etc. etc.
Perhaps we need to have a
On 5/31/2010 12:33 PM, Tulasi wrote:
Thanks Mark for posting the links!
My posting was based on
http://www.unicode.org/consortium/directors.html
where at the bottom it said Unicode Inc.
Looks like the elected members from consortium
http://www.unicode.org/consortium/consort.html
forms Unicode
On 5/31/2010 2:12 PM, V. M. Kumaraswamy wrote:
Hello all,
Just a clarification on UNICODE.
Is UNICODE a STANDARD
Yes, Unicode (The Unicode Standard), is indeed a standard.
And no, the use of ALL CAPS is discouraged. The
proper spelling is Unicode.
that needs to be followed by all