On 14/09/2013 6:42, Michael Everson wrote:
On 14 Sep 2013, at 02:30, Stephan Stiller wrote:
This means that this dot will then need to be followed by two spaces when it is
used as a sentence-ending period.
This tradition is no longer current in the US. Though it's obvious there are
still pl
positional variants.
Jim Allan
symbol and the colon sign (U+20A1) ₡ are identified.
Jim Allan
ew of current users or
expected users.
Unicode should do what is most useful.
Honest debate does arise, because what is useful in one sphere or from
one point of view may cause problems in another sphere or from another
point of view. Sometimes there is no definite correct answer.
Jim Allan
acters with which it would be reasonable
to unify them) were *never* used before 1923 in *any* published work,
attempting to prove a negative.
Why prescribe a closed subset?
Jim Allan
* exactly when a character was invented.
Characters adopted into new standardized Latin-based alphabets or
new standardized Latin-based transcription systems often have a previous
history.
Jim Allan
some scholars have fought over
rather futilely as to whether it is written in Phoenician or Hebrew or
some other related language/dialect.
See
http://lists.ibiblio.org/pipermail/b-hebrew/2000-February/006723.html
for a summary that I think fits recent tendencies to let such matters lie.
Jim
istopher Tolkien's editing of his father J.R.R.
Tolkien's unpublished papers) should one quote by spelling and display
the Unicode EZH character or silently substitute the Unicode YOGH character?
There is no obviously right answer for all cases or even for many
individual cases.
When in doubt use the number three. ;-)
Jim Allan
ters
that most consider standard modern Hebrew letters.
Jim Allan
tion selector as well as ZWJ and variant spaces already encoded.
Jim Allan
sual distinction count when it is the *sole*
difference? It appears to me that this is where dispute lies mostly,
despite the precedent of the Unicode encoding of runic "scripts".
There may also be some thinking of HTML/XML/XHTML web display of
characters where forcing of font is not reliable. One would not want a
discussion of ancient Phoenician characters to display modern Hebrew
forms! But this same problem currently applies to runes, medieval Latin
characters, Han characters and so forth. One shouldn't let the current
shortcomings of one display method among many dictate Unicode encodings.
Jim Allan
ough markup when required, just as one
does Chinese and Japanese or Uncial scripts and Roman scripts or various
Runic styles.
Jim Allan
original EBCDIC in 1964. See
http://homepages.cwi.nl/~dik/english/codes/stand.html#ebcdic
It appeared on IBM mainframe terminal keyboards. It still appears on
terminals in an EBCDIC environment.
Jim Allan
ed apostrophe because the
terminal font glyph for ASCII apostrophe in the original PC DOS terminal
character sets was an angled glyph (which was correct for the ASCII of
that time). Indeed the angled apostrophe still appears in DOS fonts.
Jim Allan
where the two symbols are normally equated. For example, from
http://www.printek.com/products/autoforms.html
<< The following commands use the logical not (¬) sign or a caret (^).
IBM terminals generally have the logical not sign. PC's running a
terminal emulation program have a caret. In either case, both characters
are a shift 6 on the keyboard. >>
Jim Allan
names with obvious errors corrected were to be created.
Jim Allan
ld be
more free than otherwise to render the uppercase glottal stop to match
more closely other uppercase characters in a particular lettering style.
Jim Allan
t it is up to the user to determine the specialized uses. >>
See also 5.5 for further discussion.
Jim Allan
binary format.
Checking for decimal numbers is also useful in parsing addresses which
is a necessity for address validation and address correction software.
Jim Allan
"'number positional'" for digits in any radix, it might be useful
to add a property "possible positional digit for any radix up to 36" for
the normal ASCII digits and the uppercase and lowercase characters of
the normal twenty-six letters in the ASCII character set.
But is this generally useful enough to warrant it being part of the
Unicode specification?
And is that not also culturally biased? But then all scripts are to
some degree bound to a particular culture or to particular cultures.
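As a rough illustration of the "possible positional digit for any radix up to 36" idea, here is a minimal Python sketch. The function name digit_value is hypothetical; it relies only on Python's built-in int(), which accepts the ASCII digits and the twenty-six ASCII letters in either case for bases up to 36.

```python
def digit_value(ch, radix):
    """Return the value of ch as a digit in the given radix, or None.

    Hypothetical helper: it deliberately accepts only ASCII digits and
    ASCII letters, the set the suggested property would cover.
    """
    if len(ch) != 1 or not ch.isascii() or not ch.isalnum():
        return None
    value = int(ch, 36)  # '0'-'9' give 0-9; letters of either case give 10-35
    return value if value < radix else None
```

Under these assumptions "f" and "F" are both digits in radix 16, while "g" is a digit only from radix 17 upward.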
Jim Allan
then
enter the Unicode character that looks like the ligature regardless of
its alphabetic status in the language.
Jim Allan
cter like ASCII 0x01 (probably in a data file in error or
by mistransliteration) will be rendered by a printer as a missing glyph
symbol or space seems to depend on which printer driver is used and
often on settings within that driver.
Jim Allan
y a special glyph with the
meaning "character not supported".
Jim Allan
Please see
http://www.microsoft.com/typography/otspec/recom.htm
... the section about "Shape of .notdef glyph"
Best regards,
James Kass
.
[EMAIL PROTECTED] wrote:
Now as I review this thread (and find one of my very own typos),
I wonder if Jim Allan and I are "on the same page" when we
speak of "missing glyph"? It means something very specific
in the font jargon.
I understand "missing glyph"
s any such requirement.
Jim Allan
uld say that Unicode does not encode
separately scripts or systems intended solely as transliterations of
other scripts. Ciphers are a common example of such scripts and systems.
Jim Allan
Philippe Verdy wrote:
From: "Jim Allan" <[EMAIL PROTECTED]>
But if, in your opinion Theban and 3of9 bar code is on the cipher side
of a line and Ewellic is on the other side I would like to know the
logic on which this line is drawn.
With that definition,
BarCodes are
Peter Kirk wrote:
On 12/11/2003 12:55, Jim Allan wrote:
...
As far as I can see, for example, Unifon people favor an ASCII cipher
encoding over the conscript registry coding of Unifon. An IPA-based
cipher encoding might be better.
Many biblical Hebrew users still prefer one of many ASCII or
favor an ASCII cipher
encoding over the conscript registry coding of Unifon. An IPA-based
cipher encoding might be better.
Jim Allan
Michael Everson posted in answer to Philippe Verdy's query "Is Ewellic a
script?":
Of course it is.
IPA is a subset of the Latin script.
Accordingly both Ewellic and Theban could be treated as ciphers of
subsets of the Latin script.
Jim Allan
that look identical
to human beings but have radically different meanings. Unicode has enough
of those by necessity and for backward compatibility.
Jim Allan
ters.
But if, as in current computer languages, there is a marker to tell
whether a number is decimal or hex then you don't need separate coding
for the letters representing digits. The marker indicates whether the
number is decimal or hex.
The proposal also ignores the fact that lower case letters are indeed
often used in hex representation.
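A minimal Python sketch of the marker approach, assuming two prefix conventions ("0x" and "&H") purely for illustration; the function name parse_number is hypothetical. The same letters serve as hex digits in either case, and only the marker decides the radix.

```python
def parse_number(text):
    """Parse a literal whose prefix marks its radix.

    Assumed convention for this sketch: '0x' or '&H' means hexadecimal,
    no prefix means decimal.
    """
    lowered = text.lower()
    for prefix in ("0x", "&h"):
        if lowered.startswith(prefix):
            # Upper and lower case hex letters are treated alike.
            return int(lowered[len(prefix):], 16)
    return int(text, 10)
```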
Jim Allan
a shape.
It is really only with _s_ that there are two conflicting usages.
There are actually three conflicting uses, since Gagauz traditionally
uses a cedilla shape under _c_, an undercomma beneath _t_, and a symbol
halfway between the two under _s_. See
http://www.unicode.org/mail-arch/unicode-ml/y2002-m09/0199.html
Jim Allan
folding doesn't matter.
I am talking about searches not replacements. For global or regional
replacement of one by the other you don't want any folding to occur.
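A minimal Python sketch of folding for searches only, leaving the stored text untouched. The folding table and the names fold_for_search and find_folded are assumptions made for illustration, not part of any Unicode specification.

```python
import unicodedata

# Assumed folding table: comma-below letters map to their cedilla look-alikes.
_FOLD = str.maketrans({
    "\u0218": "\u015E",  # S WITH COMMA BELOW -> S WITH CEDILLA
    "\u0219": "\u015F",  # s with comma below -> s with cedilla
    "\u021A": "\u0162",  # T WITH COMMA BELOW -> T WITH CEDILLA
    "\u021B": "\u0163",  # t with comma below -> t with cedilla
})

def fold_for_search(text):
    # Compose first so that base letter + combining mark is folded too.
    return unicodedata.normalize("NFC", text).translate(_FOLD)

def find_folded(haystack, needle):
    # Both sides are folded only for the comparison.
    return fold_for_search(needle) in fold_for_search(haystack)
```

Because the fold is applied only inside the comparison, a global replacement of one form by the other still sees the original, unfolded characters.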
Jim Allan
particular application.
Jim Allan
following combining comma below.
I wonder if there's call for some sort of table of Unicode sequences
that aren't canonically equivalent but render the same.
It seems to me that Cedilla/undercomma folding would be a useful
addition to "Character Foldings" at http://www.unicode.or
ould not in all such cases be turned.
For example turning U+031E COMBINING DOWN TACK BELOW if placing it above
a base character instead would turn it into its opposite in appearance,
into U+031D COMBINING UP TACK BELOW.
Jim Allan
adjustments are required only
between adjacent characters in canonical order, despite what the
"interact typographically" rule might suggest.
Therefore the standard should so indicate. Currently it really says the
opposite, though that obviously isn't right and not what is intended.
Jim Allan
't altogether sure about the
meaning of some of the diacritics in the citation.
To have combining characters in generally changing position depending on
the font doesn't seem to me to be desirable, especially in technical
work where the position of the diacritic is sometimes as important as
its shape.
Jim Allan
AA1 DOUBLE
NESTED LESS-THAN and U+2AA2 DOUBLE NESTED GREATER-THAN.
Jim Allan
Kent Karlson wrote:
Jim Allan wrote:
...
One may note the common use of the greater-than and less-than signs as
angle brackets in many publications
Just because < and > are in ASCII, they have been used as approximations.
That was the origin of this practice.
However the practice is found
appearance are not usually considered to be style variants that should
be selected by changing a font.
Jim Allan
it isn't the right character.
I'm not at all sure what "general-purpose corner brackets" are.
Jim Allan
The code chart menu page at http://www.unicode.org/charts/ does not
contain a link to the Ugaritic characters.
However the Ugaritic chart exists and can be obtained by using the
direct url http://www.unicode.org/charts/PDF/U10380.pdf.
Jim Allan
individual effect on a single routine.
To write routines that depend on properties that Unicode has announced
as changeable may be bad coding. But I don't see that applications in
the future will be any less afflicted with bad coding than current
applications.
Jim Allan
digits in numbers. >>
Jim Allan
at did not make them
white space.
Of course under Unicode specifications NBSP is expected to expand like
SPACE for justification and so assumes some of the attributes of SPACE.
For compatibility I think it best not to include any of the non-breaking
spaces as white space.
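A minimal Python sketch of a white-space test that leaves out the non-breaking spaces. The set NO_BREAK_SPACES and the function name are assumptions for illustration; note that Python's own str.isspace() does count U+00A0 as white space.

```python
# Assumed exclusion list: NO-BREAK SPACE, FIGURE SPACE, NARROW NO-BREAK SPACE.
NO_BREAK_SPACES = {"\u00A0", "\u2007", "\u202F"}

def is_breaking_white_space(ch):
    """True for white space at which a break is allowed (hypothetical helper)."""
    return ch.isspace() and ch not in NO_BREAK_SPACES
```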
Jim Allan
James P Cowrie posted:
the sign used for aleph (looks like a 3, but isn't, obviously)
Actually the sign similar to 3 is used for `ain, `ayan, not aleph.
Normally U+02BE is used for aleph, though sometimes slightly extended.
3_ as that is the traditional fallback used in transliterations when
the proper character is not available.
Jim Allan
a
lot of mainframe systems still using EBCDIC encodings.
Jim Allan
Michael Everson wrote:
The Last
Resort Font has glyphs for all the characters, so it's the last one
looked at.
I hope that it is not just for that reason that it is the last one
looked at.
Jim Allan
by East Asian fonts
or top-of-the-line publishing software that handles east Indian scripts
impeccably.
Government software for various governments may purposely support only a
particular subset of the Unicode character set.
Jim Allan
odes (and even some
of the non-deprecated control codes) and does not support particular
characters (perhaps only because there are no fonts on the system with
those characters) can still be conformant to Unicode in what it supports.
A text editor that supports only fixed width fonts will probably not
support the special-width spaces properly but may still be Unicode
conformant.
Jim Allan
ty in unifying compatibility
characters for presentation.
If it is not deprecated a character should be usable.
But some more obvious graphic indication would be nice, to show
that perhaps a user should think carefully about using that
particular encoded character.
Jim Allan
r closed or open loop. But a font for
phonetic use should always display U+0067 with a closed loop.
Fonts like Arial Unicode MS lose the distinction.
For non-technical use people need not and mostly quite rightly will not
use the more technical symbols to make fine distinctions that don't
apply in their particular usage.
Jim Allan
ard
value for this character in a particular font will display properly.
The character U+212A within Unicode is useless.
Maybe it is time to deprecate some of these characters.
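One way to see how little U+212A carries on its own: it has a singleton canonical decomposition to U+004B, so normalization replaces it with the ordinary letter. A minimal Python check using the standard unicodedata module (the variable names are illustrative only):

```python
import unicodedata

kelvin = "\u212A"  # KELVIN SIGN

# Canonical normalization maps the Kelvin sign to LATIN CAPITAL LETTER K.
folded = unicodedata.normalize("NFC", kelvin)
same_as_k = folded == "K"
```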
Jim Allan
t at
http://www.gov.nu.ca/font.htm and
http://www.gov.nu.ca/Nunavut/English/font.
(The small capitals are missing from the light, bold, and heavy fonts in
the Pigiarniq family.)
These fonts are attributed to Tiro Typeworks, so perhaps John Hudson can
explain this lucky happenstance.
Jim Allan
ubscripted base indicator or a leading "&H"
or the word "hex" or some other indicator of meaning is far more useful
to humans than a double encoding of the same characters according to
meaning.
If you can't normally see the difference in text then Unicode normally
shouldn't encode any difference.
Jim Allan
e a spelling confusion with traditional coding if the holam-vav
method is introduced. Introducing yet another way of writing either holam male
or the holam to specially indicate a center dot holam male creates further
differences in spelling without appreciable benefit.
Jim Allan
critic such as a
single overdot as exact formatting behavior is not defined in such cases.
Jim Allan
text.
The characters, small capital or others, are displayed with no problems.
Jim Allan
and should
presumably change the width of NBSP when appropriate.
Such changes of width and shapes are what one finds with ligatures in
fonts that support ligatures.
Jim Allan
n any case, I see nothing in the Unicode specifications that suggests
replacing either U+0020 or U+00A0 by U+20CC when followed by a combining
character or applying the combining character to any inserted
U+20CC when it is part of a defective combining character sequence.
Jim Allan
_D15
zero width no-break space
U+2060 WORD JOINER should be used instead of U+FEFF if one's font and
application support it.
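A minimal Python sketch of such a substitution. It assumes that a U+FEFF at the very start of the text is a byte order mark and should be left alone; the function name prefer_word_joiner is hypothetical.

```python
BOM = "\uFEFF"          # ZERO WIDTH NO-BREAK SPACE, also the byte order mark
WORD_JOINER = "\u2060"  # WORD JOINER

def prefer_word_joiner(text):
    """Replace interior U+FEFF with U+2060.

    Assumption: a leading U+FEFF is a byte order mark and is kept as is.
    """
    if text.startswith(BOM):
        return BOM + text[1:].replace(BOM, WORD_JOINER)
    return text.replace(BOM, WORD_JOINER)
```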
Jim Allan
time still customary to spell _þe_ as
_ye_ instead of _the_.
As to _gh_, corresponding Middle English words normally used the letter
yogh (_ȝ_). The difference is in spelling. Both spellings are available
through Unicode.
Jim Allan
h two different pronunciations then
I would expect Unicode to encode this, especially if the distinction
for forms were found to have been practised for over a thousand years
and to still be observed in careful typography today.
Jim Allan
Peter Kirk posted:
... if we are to encode separately the dot in holam male, what would
you call that dot? We can't call it holam male because that is the name
of the combined vav and holam.
Would HEBREW POINT HOLAM MALE INDICATOR do?
Jim Allan
diacritics, ask if you can check the labels and
name/address blocks on some non-personal mail they receive.
Jim Allan
.
This case is very similar to the Hebrew case in that in both we have a
typographical variation which indicates a pronunciation difference, but
this typographical difference is not noticed by many native speakers of
the language even though they read texts that observe the difference.
Jim Allan
is considered acceptable
uppercase with diacritics is still considered *more* correct.
Jim Allan
cluded as part of the character set.
Jim Allan
of data.
I suppose if one were translating to Unicode and came across this
radicalex followed by a character X one could replace it by U+00A0
NO-BREAK SPACE followed by X followed by U+0305 COMBINING OVERLINE.
Jim Allan
creators of fonts.
Jim Allan
&_ as a letter.
The author of the article explained this by noting that _&_ was used
occasionally in manuscripts to spell _et_ in Icelandic words.
Jim Allan
ligature (which is really
representing the French word "et" with its two letters) is quite
common even in recent books and publications, and it looks
pretty good typographically, notably for its titlecase version
at the beginning of sentences.
Possibly a capital ampersand is needed?
Jim Allan
or right-justification and hide them when they would
otherwise appear at column right position.
Jim Allan
François Yergeau posted:
Jim Allan wrote:
U+202F which is always a wide space would be generally less
desirable than ordinary non-breaking U+00A0.
Didn't you confuse U+2007 and U+202F here? U+202F is the *NARROW* NBSP.
Yes. I certainly did paste in the wrong Unicode value. It is U+2007
ing U+00A0.
Jim Allan
enerally be welcome. >>
For discussion of when double spacing after a period might still be good
practice see
http://desktoppub.about.com/library/weekly/blrules-spaces.htm and
http://www.evolt.org/article/Two_Spaces_After_a_Period_Isn_t_Dead_Yet/25/213/?format=print.
Jim Allan
or leading spaces for numbers in columns
(sometimes along with U+2008 PUNCTUATION SPACE) to enable right
justification of numbers in such columns.
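A minimal Python sketch of padding numbers with U+2007 FIGURE SPACE so that they line up on the right. It assumes a font in which the figure space has the same width as a digit; the function name right_justify is hypothetical.

```python
FIGURE_SPACE = "\u2007"  # intended to have the width of a digit

def right_justify(numbers, width=None):
    """Left-pad each number with FIGURE SPACE to a common width."""
    strings = [str(n) for n in numbers]
    if width is None:
        # Widest entry decides the column width; 0 for an empty column.
        width = max((len(s) for s in strings), default=0)
    return [s.rjust(width, FIGURE_SPACE) for s in strings]
```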
Jim Allan
e
no break is allowed, not U+2009 THIN SPACE or any other spacing character.
Jim Allan
vides
identical glyphs that represent characters with very different
properties such as "!" for punctuation and "!" for a Zulu click in the
hope, probably vain, that people in general will recognize the difference.
Jim Allan
s not improper in any language that makes use of these characters
to simply choose to always use the forms with haceks. >>
This would also avoid the oddity of suggesting that the languages
themselves may choose.
Of course no matter what one says about what is proper or what is
"preferred", someone is likely to be found who will dispute it.
Jim Allan
valid use of "special" in some particular context,
from http://www.unicode.org/mail-arch/unicode-ml/y2002-m11/0575.html:
Would that change the
above quoted non-need for special MORSE DOT and MORSE DASH characters?
Jim Allan
dictionary.oed.com if one had an OED on-line subscription.
Jim Allan
access to
the site through an institution that has such a subscription and could
help Asmus out by looking up the OED use of these symbols or its
citations of sources in which the symbols occur.
Jim Allan
ith a bar
through the descender (with corresponding uppercase) to indicate both
the medieval _per_ sign and modern phonetic usage of barred _p_ and a
separate character for the more modern swirly descendant of the medieval
_per_ sign.
Jim Allan
the proper
characters are available in Unicode, editors may wish to substitute more
familiar variations for ease of readability.
Jim Allan
heet. >>
Those seem to be the facts of the matter.
The symbol is one, and to be encoded as U+2205 pending indication that
distinctions have been generally made between the glyphs or new
standards requesting that in the future a distinction be made between
the glyphs.
Jim Allan
recommendation is quite clear about the tentative nature of its
presentation.
Jim Allan
21 February 2001 >>
I would think that anyone can properly make up their own
variation-selector sequences for anything in a recommendation.
I have no idea what has since happened in respect to this recommendation.
Jim Allan
the foreseeable
future.
Jim Allan
.
Jim Allan
pt show them as they find them.)
A wordprocessing or desktop publishing application could use the forms
and sizes of the dots in these characters in the current font as the
basis for creating its own leaders (going instead to the full stop if
these characters are empty).
Jim Allan
OT LEADER?
Are there any other characters in Unicode that are *expected* to stretch
in size and produce multiple images?
Jim Allan
sume meaning "empty variant") is applied to slashed
circle would seem to indicate that to the creators of mathml, as well as
to Donald Knuth, the slashed zero form is felt to be the more normal
glyph for empty set (and for other indications of emptiness, nullity, etc.)
Jim Allan
iant of the round
empty set symbol through a variation selector ... if it is *asked for*.
But that is for those who use such notation regularly to decide.
But I doubt you will find any linguist who would consider the Norwegian
capital slashed O as anything other than a kludge replacement for
either the standard round empty set symbol or the slashed zero symbol.
Jim Allan
a list of characters at
http://mercury.ccil.org/~cowan/elsie/elsie.txt.
Jim Allan
a right hook in some other standard
and remained when the other hook was withdrawn from consideration.
Jim Allan
'lower case nasal
"a"' mentioned in the text can be seen on the charts as _a_ with the hook.
This fits a normal convention in American linguistics to use ogonek to
signify a nasal.
Jim Allan