Peter_Constable at sil dot org wrote:
A couple of corrections. First, if an app supports only WM_CHAR and
not also WM_UNICHAR, that does not imply that it uses a legacy
encoding. If running on NT/2K/XP and registered as a wide (Unicode)
app, the WM_CHAR messages will supply UTF-16 code
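Each WM_CHAR from a Unicode window carries a single UTF-16 code unit. A character outside the BMP therefore arrives as two messages (a surrogate pair), and the application has to join them itself. Below is a minimal Python sketch of that pairing. It assumes the wParam values have already been collected into a list. `combine_utf16_units` is a hypothetical helper, not a Win32 call, and substituting U+FFFD for an unpaired surrogate is this sketch's own policy.

```python
def combine_utf16_units(units):
    """Join UTF-16 code units (e.g. successive WM_CHAR wParam values from a
    Unicode window) into a string, pairing surrogates into one code point."""
    out = []
    high = None  # pending high (leading) surrogate, if any
    for u in units:
        if 0xD800 <= u <= 0xDBFF:
            if high is not None:
                out.append("\uFFFD")  # previous high surrogate was unpaired
            high = u
        elif 0xDC00 <= u <= 0xDFFF:
            if high is None:
                out.append("\uFFFD")  # low surrogate with no high before it
            else:
                out.append(chr(0x10000 + ((high - 0xD800) << 10) + (u - 0xDC00)))
                high = None
        else:
            if high is not None:
                out.append("\uFFFD")  # high surrogate followed by a non-surrogate
                high = None
            out.append(chr(u))
    if high is not None:
        out.append("\uFFFD")  # input ended in the middle of a pair
    return "".join(out)
```

For example, the units 0xD801, 0xDC00 combine into the single character U+10400.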
Michael (michka) Kaplan wrote:
Not sure how this could be generally possible to restrict, since
WinNT/2K/XP/.Net all will transparently map CF_TEXT and CF_UNICODETEXT so
that if one is put on the clipboard and the other is asked for, you will get
it.
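The synthesized CF_TEXT/CF_UNICODETEXT conversion goes through a legacy 8-bit code page, so it is lossless only for characters that code page contains. The Python sketch below imitates that round trip with Python's own codecs rather than the clipboard API. It assumes cp1252 as the code page and `?` as the substitute for unmappable characters. `unicode_to_ansi` and `ansi_to_unicode` are hypothetical names.

```python
ANSI_CODE_PAGE = "cp1252"  # assumption: a Western European Windows system


def unicode_to_ansi(text):
    """Unicode -> 8-bit bytes; characters the code page lacks become b'?'."""
    return text.encode(ANSI_CODE_PAGE, errors="replace")


def ansi_to_unicode(data):
    """8-bit bytes -> Unicode text in the same code page."""
    return data.decode(ANSI_CODE_PAGE, errors="replace")
```

A round trip of "café" survives intact. A round trip of Greek text comes back as question marks, because cp1252 has no Greek letters.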
Peter_Constable at sil dot org wrote:
Something that wouldn't be difficult would be an item that copied data
to the clipboard, and then displayed character info based on the
clipboard content.
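Once the text is available, the "character info" part of such a tool is straightforward with a Unicode database. Below is a minimal Python sketch using the standard `unicodedata` module. `char_info` is a hypothetical name. Reading the clipboard (for example via Win32 `GetClipboardData` or a GUI toolkit) is left out, so the text is passed in directly. Names and categories come from whatever Unicode version the Python build bundles.

```python
import unicodedata


def char_info(text):
    """One line per code point: U+XXXX, General Category, character name."""
    lines = []
    for ch in text:
        # Controls and unassigned code points have no name in the database.
        name = unicodedata.name(ch, "<no name: control or unassigned>")
        lines.append(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {name}")
    return lines
```

Calling `char_info("A\u25a1")` gives one line for LATIN CAPITAL LETTER A (Lu) and one for WHITE SQUARE (So).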
Hmm, an interesting thought. I would be willing to write a mini-tool
like this, if enough people
On 08/28/2002 05:38:05 PM Doug Ewell wrote:
Edit controls (edit boxes, text widgets) in Windows already come
equipped with a right-click menu...
It's not hard to imagine that menu being extended with a "Character
Info" or "What's This Glyph?" item...
Of course, I have no idea if such a thing will
Kenneth Whistler wrote the following at 2:01 PM on Mon, Aug 26, 2002:
And an approach which strikes me as a much more useful and extensible
way to deal with this would be the concept of a "What's This?"
text accessory. Essentially a small tool that a user could select
a piece of text with (think
Dean Snyder dean dot snyder at jhu dot edu wrote:
Good idea - the big attraction being extensibility. But a detraction
is that it would typically mean multiple, or at least explicit,
deployment at the application level on any given platform. (I'm
presuming such a system service would present
Doug Ewell wrote the following at 8:38 AM on Wed, Aug 28, 2002:
But the advantage would be the same as what Dean
envisions for a font-based solution -- applications would get the
support for free, instead of having to re-implement it in multiple,
slightly different ways.
I don't believe so.
[Resend of a response which got eaten by the Unicode email
during the system maintenance last week. Carl already responded
to me on this, but others may not have seen what he was
responding to. --Ken]
William,
-----Original Message-----
From: William Overington
Sent: Friday, August 23, 2002 12:55 AM
To: James Kass; Carl W. Brown; Unicode List
Subject: Re: Revised proposal for Missing character glyph
Kenneth Whistler scripsit:
Things will be better-behaved when applications finally get past the
related but worse problem of screwing up the character encodings --
which results in the more typical misdisplay: lots of recognizable
glyphs, but randomly arranged into nonsensical junk. (Ah,
At 09:49 PM 8/26/2002 -0400, John Cowan wrote:
Nowadays, experts can detect mismatched character sets from the
nature of the byte barf that appears on their screen.
And super-experts can read languages in byte barf as it is not random!
Barry Caplan
http://www.i18n.com
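Byte barf is readable because it is deterministic. Text encoded in one charset and decoded as another always garbles the same way, so the damage can be undone. Below is a minimal Python sketch that assumes UTF-8 data displayed as cp1252. `garble` and `ungarble` are hypothetical names.

```python
def garble(text, right="utf-8", wrong="cp1252"):
    """Encode with the correct charset, then misread the bytes as another."""
    return text.encode(right).decode(wrong)


def ungarble(barf, right="utf-8", wrong="cp1252"):
    """Undo garble(): recover the original bytes, then decode them correctly."""
    return barf.encode(wrong).decode(right)
```

`garble("café")` gives "cafÃ©", the familiar signature of UTF-8 read as Windows Latin-1, and `ungarble` reverses it. The sketch raises `UnicodeDecodeError` when the UTF-8 bytes include one of the few values cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).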
Proposed unknown and missing character representation. This would be an
alternate to method currently described in 5.3.
The missing or unknown character would be represented as a series of
vertical hex digit pairs for each byte of the character. BMP characters
would be represented with 4 hex
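The proposal is cut off above, but its core idea (one pair of hex digits per byte of the code point) is easy to sketch. This Python version makes two layout guesses. It reads "vertical pairs" as each byte's two digits stacked in one column, and it assumes six digits (three columns) outside the BMP. `hex_box` is a hypothetical name.

```python
def hex_box(cp):
    """Render a code point as a text box with one column per byte; each
    column holds that byte's two hex digits, high digit on top."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    width = 4 if cp <= 0xFFFF else 6  # BMP: 2 bytes; beyond: 3 bytes (assumed)
    digits = f"{cp:0{width}X}"
    pairs = [digits[i:i + 2] for i in range(0, width, 2)]  # one pair per byte
    top = "".join(p[0] for p in pairs)
    bottom = "".join(p[1] for p in pairs)
    border = "+" + "-" * len(pairs) + "+"
    return [border, f"|{top}|", f"|{bottom}|", border]
```

U+03A2 renders as a two-column box, with 0 over 3 in the first column and A over 2 in the second.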
I presume that the user has to know that the character cannot be displayed.
However using a special glyph has a number of problems:
1) You do not know if the character is missing and the glyph is substituted
or if the text really encodes the glyph.
2) If you see multiple missing characters,
Carl W. Brown wrote:
I presume that the user has to know that the character cannot be displayed.
I don't see how the user can know this. Depending on the usage, an odd
glyph can look like a bullet or other marker. In some cases therefore,
the user might presume it is just a unique way of
With a bit more thought we might reduce the minimum point size of an
unrenderable character as follows:
The numbers represent a dot position if that bit is a one. It is blank if
the bit is 0.
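Here is a sketch of the dot-matrix idea in Python. The 16 bits of a BMP code point are drawn in a 4-by-4 grid, most significant bit first, so each row happens to spell one hex digit. The original diagram is missing, so the grid size, the bit order, and the `#` used for a dot are all assumptions. `bit_matrix` is a hypothetical name.

```python
def bit_matrix(cp, rows=4, cols=4):
    """Draw cp as a rows x cols dot matrix, most significant bit first:
    '#' where the bit is 1, blank where it is 0."""
    nbits = rows * cols
    if not 0 <= cp < (1 << nbits):
        raise ValueError("code point does not fit in the grid")
    bits = f"{cp:0{nbits}b}"
    return [
        "".join("#" if b == "1" else " " for b in bits[r * cols:(r + 1) * cols])
        for r in range(rows)
    ]
```

U+03A2 (binary 0000 0011 1010 0010) gives a blank first row, then rows for the digits 3, A, and 2. A supplementary character needs at least 21 bits, for example `rows=6`.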
The XX characters are lines with an inverted wide squared U at the top with
the edges coming down to
John H. Jenkins jenkins at apple dot com wrote:
There has been considerable uproar in the font development community
lately about Unicode making unwarranted assumptions about how fonts
work. I think it would be improper for us to add a character to the
standard on the basis of font
On 08/02/2002 12:19:20 AM James Kass wrote:
Now I get it. Still think Kenneth Whistler's suggestions for covering
all kinds of display problems would be better than encoding a new
character for this limited purpose, though.
It's about as useful as the control pictures (2400..2426 -- which is
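For reference, the Control Pictures block provides visible stand-ins for the C0 controls (U+2400..U+241F), SPACE (U+2420), and DELETE (U+2421). Below is a minimal Python sketch of how a display tool might substitute them. `control_picture` is a hypothetical name.

```python
def control_picture(ch):
    """Map a C0 control, SPACE, or DEL to its Control Pictures symbol;
    return any other character unchanged."""
    cp = ord(ch)
    if cp <= 0x1F:
        return chr(0x2400 + cp)  # U+2400 SYMBOL FOR NULL .. U+241F
    if cp == 0x20:
        return "\u2420"  # SYMBOL FOR SPACE
    if cp == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    return ch
```

For example, ESC (U+001B) maps to U+241B SYMBOL FOR ESCAPE.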
Periphrasis is always possible, of course; but that doesn't mean that it is desirable.
1. Periphrasis is by definition longer. In a page where you want to present a lot of
information and not have it squeezed out by meta-information, the first paragraph in
my example could read Seeing things
It would be a nice way to address the issue.
In an ideal world, every computer would have a last resort font so that it can
*always* find a glyph for a particular codepoint, and there would then be no need for
any glyph that says "sorry, can't display."
I think you will probably agree that an
Well...
1. Since all existing fonts already display the new character correctly, there would
be no overwhelming need for any font designer to alter any font at all. If they
choose, despite this, to copy their own interpretation of 'missing character' from
Glyph ID zero into the new slot, this
At 11:11 +0100 2002-08-01, Martin Kochanski wrote:
Otto Stolz suggested U+03A2, which would be equally valid. However,
U+03A2 is quite obviously the code for GREEK CAPITAL LETTER FINAL
SIGMA.
Nope. Can't encode a nonexistent letter. Can encode ANYTHING WE WANT
anywhere that's free. Them's
If anyone wants to represent fontless characters and uses anything
other than a kind of Last Resort font they are being very silly in my
view. OmniWeb and TextEdit handle these elegantly and helpfully.
--
Michael Everson *** Everson Typography *** http://www.evertype.com
At 11:13 +0200 2002-08-01, Otto Stolz wrote:
I have selected U+03A2 with care: this code point covers the place
of a non-existing Greek capital letter final sigma. I think that
this code-point -- while, admittedly, unsafe as any other unassigned
one -- is rather unlikely to get assigned a
Asmus Freytag wrote:
At 08:40 PM 7/30/02 -0700, Doug Ewell wrote:
a code-point that has no
character assigned to it (and is not likely to get one), e. g. U+03A2
Doug is not the culprit; it was me who wrote that sentence.
No code point is safe.
The very reason why I chose the wording
The responses from this mailing list have made me re-think the problem and propose a
possible solution.
The point about missing characters (more accurately, unrendered characters) is that
different fonts (more accurately, different combinations of font plus rendering
system) display them in
Missing character glyphs in fonts have a wide variance of appearances,
just like LATIN CAPITAL LETTER A glyphs.
If it will help to understand this issue, here is how it works from
a font perspective.
The missing glyph is the first glyph in any font. This is mapped to
U+ and the system
On 08/01/2002 05:11:08 AM Martin Kochanski wrote:
I am wondering whether it would be worth submitting a proposal for such a
character. For example:
U+024F UNRENDERED CHARACTER
Go ahead and write up a proposal. This is one of those meta-descriptive
things that somebody inevitably finds a
Martin Kochanski unicode at cardbox dot net wrote:
To look at it another way, virtually the only action that the Unicode
Consortium needs to take to define UNRENDERED CHARACTER is to promise
never to define a character at that code point.
I think this is exactly what they have done by
I wrote:
You can't use U+FFEF,
because some process might actually filter out noncharacters.
Of course I meant U+FFFE. There is nothing special about U+FFEF;
although that code point is currently unassigned, there's no reason it
couldn't be in the future. (It would be the perfect spot for
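The noncharacters are a fixed set of 66 code points. They are U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF). Below is a minimal Python predicate for that set. `is_noncharacter` is a hypothetical name.

```python
def is_noncharacter(cp):
    """True for the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: xxFFFE and xxFFFF.
    return (cp & 0xFFFF) in (0xFFFE, 0xFFFF)
```

It returns True for U+FFFE and False for U+FFEF, which is an ordinary code point.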
At 05:18 AM 8/1/02 -0700, James Kass wrote:
If it will help to understand this issue, here is how it works from
a font perspective.
The missing glyph is the first glyph in any font. This is mapped to
U+ and the system correctly substitutes the glyph mapped to
U+ any time a font being
__
http://www.macchiato.com
► “Eppur si muove” ◄
----- Original Message -----
From: Martin Kochanski
To: Otto Stolz
Sent: Thursday, August 01, 2002 03:11
Subject: Re: Missing character glyph
The responses from
----- Original Message -----
From: Martin Kochanski
Cc: Asmus Freytag
Sent: Thursday, August 01, 2002 09:38
Subject: Re: Missing character glyph
At 08:42 01/08/02
As a clarification, here is a sample web page:
http://www.cardbox.com/missing.htm
The requirement is to be able to display the first paragraph of the page in such a way
that it makes sense in its reference to the text on the rest of the page.
The character after the word this: in the first
I can't seem to find a mention of this earlier (my apologies if I missed
it) but the 'missing character' glyph is not necessarily system specific -
in fact at least TrueType and OpenType support a font specific missing
glyph. Any unassigned code point you choose will get the missing glyph
On 08/01/2002 02:34:17 PM Kenneth Whistler wrote:
But if you insist on having a code point to stick directly in
a sentence like that above, I'd take the cue from James Kass:
The missing glyph is the first glyph in any font. This is mapped to
U+ and the system correctly substitutes the
At 01:42 PM 01-08-02, [EMAIL PROTECTED] wrote:
I think James is mistaken on this point: the missing glyph *is* the first
glyph in any TTF, but it is *not* necessarily (probably not typically)
mapped from U+. For instance, in Times New Roman, Arial, Tahoma and
even James' own Code2000, the
Any unassigned code point you choose will get the missing glyph
display for the font that happens to be selected for that character (on
many systems that's no longer easily predictable, due to font
substitutions). In short, there's no guarantee that *any* specific code
point will give you the
John Hudson wrote:
but it should *not* be encoded as U+ or as any other codepoint.
.notdef should be unencoded.
Almost. OpenType specifies that there is no functional difference
between a code point that is not mapped and a code point that is
explicitly mapped to GID 0, so there is
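In other words, the missing glyph is reached by failing the cmap lookup, not by encoding it. Below is a toy Python model. It assumes the cmap is a plain dict from code point to glyph ID, whereas real fonts store it in binary subtables. The glyph IDs and the name `glyph_for` are made up for illustration.

```python
NOTDEF_GID = 0  # TrueType/OpenType reserve glyph ID 0 for the missing glyph (.notdef)

# Toy cmap with made-up glyph IDs; GIDs 0-2 are deliberately not mapped.
TOY_CMAP = {0x0020: 3, 0x0041: 36, 0x03A3: 180}


def glyph_for(cmap, cp):
    """Look up cp in cmap. An unmapped code point gets GID 0, which is
    indistinguishable from an explicit mapping to GID 0."""
    return cmap.get(cp, NOTDEF_GID)
```

Any code point absent from the table, U+03A2 for instance, lands on GID 0.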
Peter Constable wrote,
... For instance, in Times New Roman, Arial, Tahoma and even
James' own Code2000, the first entry in the cmap is for U+0020:
Please note that the first entry in the cmap covers Glyph ID 3.
Glyph IDs 0, 1, and 2 don't need to be covered by cmap, as they
are constants
James Kass scripsit:
Even if a new character is proposed and accepted, font developers
will probably just copy their own interpretation of 'missing
character' from Glyph ID Zero into the new slot. What would
be gained?
It would be guaranteed that every font would encode this character
Doug Ewell wrote:
(Sent: Tuesday, July 30, 2002)
Have Last Resort symbols been devised for all the blocks in Unicode,
including the new ones like Tagalog? Neither Mark Leisher's page nor
the Apple typography page contains a complete list.
(As a Windows user who seldom finds reason to
At 17:15 -0400 2002-07-30, Tom Gewecke wrote:
Apple's Last Resort font. :-)
Which I believe uses the various symbols shown at
http://www.unicode.org/charts/
so you can easily tell from which code range your font is missing the
character.
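A Last Resort-style fallback first needs to know which block a code point belongs to. Below is a minimal Python sketch using `bisect` over a hand-picked handful of block ranges. The full list lives in the Unicode Character Database file Blocks.txt. This is not a description of how the Apple font itself is built, and `block_of` and `BLOCKS` are hypothetical names.

```python
import bisect
from typing import Optional

# A few block ranges (first, last, name); a real tool would load Blocks.txt.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x1700, 0x171F, "Tagalog"),
    (0x2400, 0x243F, "Control Pictures"),
    (0x25A0, 0x25FF, "Geometric Shapes"),
]
_FIRSTS = [first for first, _, _ in BLOCKS]  # sorted block start points


def block_of(cp: int) -> Optional[str]:
    """Return the name of the block containing cp, or None if cp falls
    outside every range in the table."""
    i = bisect.bisect_right(_FIRSTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None
```

With this table, U+03A2 falls in "Greek and Coptic" even though it is unassigned, which is exactly the case a per-block fallback glyph is meant to cover.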
I think those glyphs are from the older version of
On Tuesday, July 30, 2002, at 08:58 PM, Doug Ewell wrote:
Have Last Resort symbols been devised for all the blocks in Unicode,
including the new ones like Tagalog? Neither Mark Leisher's page nor
the Apple typography page contains a complete list.
Yes. It covers all of Unicode 3.2; but
At 08:40 PM 7/30/02 -0700, Doug Ewell wrote:
a code-point that has no
character assigned to it (and is not likely to get one), e. g. U+03A2
No code point is safe.
A./
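One way to check whether a code point is currently unassigned is its General Category, since unassigned code points (and noncharacters) have category Cn. Below is a minimal Python sketch. Its answer depends on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), which illustrates that safety is only ever relative to a version. `is_unassigned` is a hypothetical name.

```python
import unicodedata


def is_unassigned(cp):
    """True if cp has General_Category Cn in this Python's Unicode data."""
    return unicodedata.category(chr(cp)) == "Cn"
```

U+03A2 is still Cn. U+024F, suggested elsewhere in this thread for UNRENDERED CHARACTER, has since been assigned as LATIN SMALL LETTER Y WITH STROKE.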
Asmus wrote:
At 08:40 PM 7/30/02 -0700, Doug Ewell wrote:
a code-point that has no
character assigned to it (and is not likely to get one), e. g. U+03A2
No code point is safe.
True enough. But then I figure Plane 13 characters like
U+DEAD1 are pretty unlikely to be assigned to a
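The plane of a code point is its value divided by 0x10000 (a 16-bit right shift), so U+DEAD1 sits in Plane 13. Below is a minimal Python sketch. `plane_of` is a hypothetical name.

```python
def plane_of(cp):
    """Unicode plane number (0-16); each plane holds 0x10000 code points."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return cp >> 16
```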
Asmus Freytag asmusf at ix dot netcom dot com wrote:
No code point is safe.
Indeed, but some are less unsafe than others. You can't use U+FFEF,
because some process might actually filter out noncharacters. You can't
use U+FFFD, because some process might generate a special glyph for it
(SC
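Decoders also produce U+FFFD themselves when they meet malformed input. As a result, a U+FFFD in displayed text cannot be told apart from one that was in the data. Below is a minimal Python illustration using the built-in `errors="replace"` handler. `decode_lenient` is a hypothetical name.

```python
def decode_lenient(data):
    """Decode UTF-8, replacing each malformed sequence with U+FFFD."""
    return data.decode("utf-8", errors="replace")
```

A Latin-1 "café" read as UTF-8 comes out as "caf" followed by U+FFFD. That result is indistinguishable from text that really contained U+FFFD.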
In writing a manual, I want to show examples of what a display looks like when a font
doesn't have a particular character.
What Unicode character would best represent the missing character symbol?
I have looked through the Unicode Standard but not found anything that immediately
springs to
At 18:27 +0100 2002-07-30, Martin Kochanski wrote:
In writing a manual, I want to show examples of what a display looks
like when a font doesn't have a particular character.
What Unicode character would best represent the missing character symbol?
Apple's Last Resort font. :-)
--
Michael
Actually - thinking about it - wouldn't U+FFFE work?
Martin Kochanski wrote:
In writing a manual, I want to show examples of what a display looks like
when a font doesn't have a particular character.
I suggest you also show examples of what a display looks like when a
data-stream is not encoded properly,
e. g.
question: you want a character that provides a
specific glyph shape, but then you observe that different fonts have
different shapes from one another wrt a given glyph. Note that the missing
character glyph varies from font to font. In fact, it's not always a
hollow square!
Just use U+25A1 and call
Otto Stolz Otto dot Stolz at uni dash konstanz dot de wrote:
Or you could use a code-point that has no
character assigned to it (and is not likely to get one), e. g. U+03A2:
most systems will use their respective missing-character glyphs to
display it.
I like this last suggestion the best:
What Unicode character would best represent the missing character
symbol?
Apple's Last Resort font. :-)
Which I believe uses the various symbols shown at
http://www.unicode.org/charts/
so you can easily tell from which code range your font is missing the
character.
Have Last Resort