On 15 Sep 2013, at 22:52, Stephan Stiller wrote:
On 9/15/2013 1:04 PM, Doug Ewell wrote:
André Schappo wrote:
U+2026 is useful for microblogs when one is looking to save characters
Not if the microblog is in UTF-8, as almost all are.
That's an astute observation, but André was talking
Twitter - Until recently, characters outside the BMP resulted in a Counter
decrement of 2 and BMP characters gave a decrement of 1. Not sure when the change
happened but now both BMP and non-BMP characters result in a decrement of 1
Yes!! How might that have happened? ;-)
And the date line of
① Twitter - [...]
② Sina Weibo - [...]
About a year ago I blogged about it
http://schappo.blogspot.co.uk/2012/10/weibo-character-count.html
And your post on Twitter is this one:
http://schappo.blogspot.co.uk/2012/10/twitter-character-count.html
Stephan
2013/9/16 Stephan Stiller stephan.stil...@gmail.com
That's exactly what happens when people confuse code point with scalar
value ;-) Hmm, whom might we blame? :-)
Actually you never count scalar values. You are confusing them with code
units. Twitter was originally counting UTF-16 code units, but
You haven't been following the thread, have you. When you count code
points you can: either count the original code points, which is the
same as counting scalar values, /because that's what an encoding form
encodes/; or count code points corresponding to code units because,
well, you can match
* Philippe Verdy wrote:
2013/9/16 Stephan Stiller stephan.stil...@gmail.com
That's exactly what happens when people confuse code point with scalar
value ;-) Hmm, whom might we blame? :-)
Actually you never count scalar values. You are confusing them with code
units. Twitter was originally
On 9/16/2013 7:48 AM, Stephan Stiller wrote:
or count code points corresponding to code units because, well, you
can match them up
= or count code points corresponding to UTF-16 code units; those
happen to be BMP code points.
Twitter has been claiming since /at least/ April 2012 that they're
Nah!!! STRICTLY NOBODY counts scalar values.
Everyone counts either
- (a) code units (most often 8-bit bytes, more rarely 16-bit bytes e.g.
with basic JavaScript code), or
- (b) code points (independently of code units used in the storage or
communication message format).
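
The distinction between (a) and (b) can be illustrated with a minimal Python sketch. The helper names below are hypothetical and are not taken from the thread or from any microblog's code; the sketch only assumes that a Python 3 str is a sequence of code points.

```python
def utf8_code_units(text: str) -> int:
    """Count 8-bit code units (bytes) in the UTF-8 encoding form."""
    return len(text.encode("utf-8"))


def utf16_code_units(text: str) -> int:
    """Count 16-bit code units in the UTF-16 encoding form (no BOM)."""
    return len(text.encode("utf-16-le")) // 2


def code_points(text: str) -> int:
    """Count code points; len() on a Python 3 str does exactly this."""
    return len(text)


# Illustrative samples: a BMP character and a supplementary-plane character.
bmp_sample = "\u2026"      # HORIZONTAL ELLIPSIS
non_bmp_sample = "\U0001D11E"  # MUSICAL SYMBOL G CLEF

for label, sample in (("BMP", bmp_sample), ("non-BMP", non_bmp_sample)):
    print(label, utf8_code_units(sample), utf16_code_units(sample), code_points(sample))
```

A counter based on UTF-16 code units charges 2 for the non-BMP sample, while a counter based on code points charges 1 for both samples.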
The application *may*
On 9/16/2013 1:41 PM, Doug Ewell wrote:
This has nothing to do with UTF-Anything or Normalization Form Anything.
But all with keeping the discussion alive for any reason, however
insignificant :)
A./
Oh, for heaven's sake:
Code Point. (1) Any value in the Unicode codespace; that is, the range
of integers from 0 to 10FFFF₁₆. (See definition D10 in Section 3.4,
Characters and Encoding.) Not all code points are assigned to encoded
characters. See code point type. (2) A value, or position, for a
Asmus Freytag asmusf at ix dot netcom dot com wrote:
On 9/16/2013 1:41 PM, Doug Ewell wrote:
This has nothing to do with UTF-Anything or Normalization Form
Anything.
But all with keeping the discussion alive for any reason, however
insignificant :)
I guess it was too soon to try to come
On 9/16/2013 2:18 PM, Doug Ewell wrote:
Asmus Freytag asmusf at ix dot netcom dot com wrote:
On 9/16/2013 1:41 PM, Doug Ewell wrote:
This has nothing to do with UTF-Anything or Normalization Form
Anything.
But all with keeping the discussion alive for any reason, however
insignificant :)
I
On 15 Sep 2013, at 02:32, Asmus Freytag asm...@ix.netcom.com wrote:
On 9/14/2013 12:19 PM, Michael Everson wrote:
And as a book designer and publisher, I think that having large spaces after
a full stop is both unnecessary and vulgar.
Quote from the blog:
This does not change my view.
On 9/14/2013 6:24 AM, Michael Everson wrote:
It facilitates comment by those who are reviewing the text.
If you add proofreaders' marks to an especially difficult manuscript,
maybe. I've barely seen annotated papers with comments that would not
have fit into the margins, and there's still the
On 13 Sep 2013, at 20:02, Whistler, Ken wrote:
The *interesting* question, in my opinion, is why folks feel impelled to use
U+2026 to render a baseline ellipsis in Latin typography at all, rather than
just using U+002E ad libitum...
--Ken
U+2026 is useful for microblogs when one is looking to
Do you mean saving two characters for posting to Twitter? Well, maybe, but
Twitter clearly does not promote correct typography and not even correct
orthography. It is clearly not a good model for publishing.
But given the history of this character, I just wonder why it was not
mapped along with
Andre Schappo wrote:
U+2026 is useful for microblogs when one is looking to save characters
Not if the microblog is in UTF-8, as almost all are.
--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell
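
The byte arithmetic behind this remark can be checked with a minimal Python sketch (variable names are illustrative only): in UTF-8, U+2026 occupies three bytes, the same as three U+002E periods, so it saves nothing when a limit is counted in bytes. It only saves anything when the limit is counted in characters.

```python
ellipsis = "\u2026"   # HORIZONTAL ELLIPSIS, one code point
three_dots = "..."    # three U+002E FULL STOP characters

# Size in bytes once encoded as UTF-8.
ellipsis_bytes = len(ellipsis.encode("utf-8"))
dots_bytes = len(three_dots.encode("utf-8"))

# Size in code points (what len() reports for a Python 3 str).
ellipsis_points = len(ellipsis)
dots_points = len(three_dots)

print("UTF-8 bytes:", ellipsis_bytes, "vs", dots_bytes)
print("code points:", ellipsis_points, "vs", dots_points)
```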
On 9/15/2013 1:04 PM, Doug Ewell wrote:
André Schappo wrote:
U+2026 is useful for microblogs when one is looking to save characters
Not if the microblog is in UTF-8, as almost all are.
That's an astute observation, but André was talking about input limits
Not if the limit is counted in characters and not in bytes. Twitter, for
example, counts code points in the NFC representation of a tweet.
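
As a rough sketch of what counting code points in the NFC representation means, assuming nothing about Twitter's actual implementation: the function name below is hypothetical, and only Python's standard unicodedata.normalize is used.

```python
import unicodedata


def nfc_code_point_count(text: str) -> int:
    """Normalize to NFC, then count the resulting code points."""
    return len(unicodedata.normalize("NFC", text))


# "e" followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9 under NFC,
# so the decomposed spelling costs the same as the precomposed one.
decomposed = "Andre\u0301"
precomposed = "Andr\u00e9"

print(len(decomposed), nfc_code_point_count(decomposed))
print(len(precomposed), nfc_code_point_count(precomposed))

# NFC does not apply compatibility decompositions, so U+2026 stays a single
# code point, while three periods count as three.
print(nfc_code_point_count("\u2026"), nfc_code_point_count("..."))
```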
Doug Ewell d...@ewellic.org wrote:
Andre Schappo wrote:
U+2026 is useful for microblogs when one is looking to save characters
Not if the microblog is in
On 9/15/2013 3:07 PM, Phillips, Addison wrote:
Not if the limit is counted in characters and not in bytes. Twitter,
for example, counts code points in the NFC representation of a tweet.
character, code point – these are confusing words :-)
From the link it isn't entirely clear whether they
(a)
Actually, that's my bad: I meant to type scalar value.
Stephan Stiller stephan.stil...@gmail.com wrote:
On 9/15/2013 3:07 PM, Phillips, Addison wrote:
Not if the limit is counted in characters and not in bytes. Twitter, for
example, counts code points in the NFC representation of a tweet.
Addison Phillips wrote:
Not if the limit is counted in characters and not in bytes. Twitter,
for example, counts code points in the NFC representation of a tweet.
You're right. I take that back, about Twitter at least.
Stephan Stiller wrote:
From the link it isn't entirely clear whether
On Sun, Sep 15, 2013 at 09:21:47PM +0200, Philippe Verdy wrote:
If there's something to do now (given it is no longer used in CJK
contexts), it's to strongly recommend that fonts map them to exactly the
same glyph as the one obtained by aligning three periods in a row without
any additional
Stephan Stiller wrote:
From the link it isn't entirely clear whether they
(a) count scalar values of NFC or
(b) count code points of NFC.
Are they not the same thing, except for surrogates?
Conceptually no, but numerically yes – you are right in that regard, and
I wasn't precise in my
Doug wrote me:
You're not confusing code point with code unit, are you?
Thanks for the note.
I think what you say is that I thought (or meant to write) by first
representing the sequence of scalar values in an encoding form and then
counting [code points typecast from] code _units_. I think
2013/9/14 Stephan Stiller stephan.stil...@gmail.com
This tradition is persistent.
Persistent where?
This is already replied within my message you quote here.
Lots of people
Lots of people who
Same remark.
So there are many contributors, on the English Wikipedia. What does
You've quoted the sentence out of its context (note the then word
which indicates this context). I do not support this practice.
Philippe, within my message you quote here isn't exactly precise about
context, is it :-)
I think there's a misunderstanding. My annoyance isn't in principle with
On 14 Sep 2013, at 02:30, Stephan Stiller stephan.stil...@gmail.com wrote:
This means that this dot will then need to be followed by two spaces when it
is used as a sentence-ending period.
This tradition is no longer current in the US. Though it's obvious there are
still plenty of middle
Quote/Cytat - Michael Everson ever...@evertype.com (Sat 14 Sep 2013
12:42:50 PM CEST):
On 14 Sep 2013, at 02:30, Stephan Stiller stephan.stil...@gmail.com wrote:
This means that this dot will then need to be followed by two
spaces when it is used as a sentence-ending period.
This
[ME:]
Books never used it. The tradition in typing was developed to assist
typesetters to navigate the typewritten text they were setting. The
typesetters never put two spaces after a full stop.
I'm looking at what looks like a US edition/printing (1902) of the
US-American novel Moby-Dick:
On 9/14/2013 3:42 AM, Michael Everson wrote:
On 14 Sep 2013, at 02:30, Stephan Stiller stephan.stil...@gmail.com wrote:
This means that this dot will then need to be followed by two spaces when it is
used as a sentence-ending period.
This tradition is no longer current in the US. Though it's
On 14 Sep 2013, at 14:16, Stephan Stiller stephan.stil...@gmail.com wrote:
Books never used it. The tradition in typing was developed to assist
typesetters to navigate the typewritten text they were setting. The
typesetters never put two spaces after a full stop.
I see. I think you were
On 14/09/2013 6:42, Michael Everson wrote:
On 14 Sep 2013, at 02:30, Stephan Stiller stephan.stil...@gmail.com wrote:
This means that this dot will then need to be followed by two spaces when it is
used as a sentence-ending period.
This tradition is no longer current in the US. Though it's
On 14 Sep 2013, at 19:11, Jim Allan jallan...@rogers.com wrote:
See http://www.heracliteanriver.com/?p=324 which claims with numerous
examples that Michael Everson is totally wrong.
It's what I was taught.
Michael Everson * http://www.evertype.com/
On 9/14/2013 6:24 AM, Michael Everson wrote:
On 14 Sep 2013, at 14:16, Stephan Stiller stephan.stil...@gmail.com wrote:
Books never used it. The tradition in typing was developed to assist
typesetters to navigate the typewritten text they were setting. The typesetters
never put two spaces
And, FWIW, so also was I taught, in a typing class in 1952.
Peter
On 2013-09-14 14:44, Michael Everson wrote:
On 14 Sep 2013, at 19:11, Jim Allan jallan...@rogers.com wrote:
See http://www.heracliteanriver.com/?p=324 which claims with numerous examples
that Michael Everson is totally
And as a book designer and publisher, I think that having large spaces after a
full stop is both unnecessary and vulgar.
On 14 Sep 2013, at 20:12, Peter Zilahy Ingerman, PhD p...@ingerman.org
wrote:
And, FWIW, so also was I taught, in a typing class in 1952.
Peter
On 2013-09-14 14:44,
On Sat, Sep 14, 2013 at 08:19:54PM +0100, Michael Everson wrote:
And as a book designer and publisher, I think that having large spaces after
a full stop is both unnecessary and vulgar.
As a book consumer, I know that having somewhat larger space after
end-of-sentence is a MUST (at least for
But this article is excellent. Even if it also contains opinions of the
author about aberrant French practices, some of them are still prevalent
such as the persistent use of an extra spacing before colon, semi-colon,
exclamation and question marks, and within guillemets:
- the practice is still
For sure, large spaces will look ugly with texts written with very short
sentences like yours, because that will create ugly rivers everywhere.
Consider that it is a matter of style, which must be adapted to the nature
of texts and the author's own linguistic style.
- If you had to typeset the Bible
Reviewing hardcopy is still a very common practice when preparing drafts
for discussions in meetings. Even the UTC meetings may want draft documents
prepared with wide line spacing to facilitate the annotations during
discussions.
This will help the review, simply because it is faster to annotate a
On 9/14/2013 12:19 PM, Michael Everson wrote:
And as a book designer and publisher, I think that having large spaces after a
full stop is both unnecessary and vulgar.
Quote from the blog:
While the modern convention is the single space, it is no less
arbitrary than any other, and if
2013/9/15 Asmus Freytag asm...@ix.netcom.com
On 9/14/2013 1:24 PM, Philippe Verdy wrote:
Reviewing hardcopy is still a very common practice when preparing drafts
for discussions in meetings. Even the UTC meetings may want draft documents
prepared with wide line spacing to facilitate the
On 9/13/2013 10:54 AM, Whistler, Ken wrote:
Stephan Stiller noted:
Maybe ... and the origin of the single-glyph ellipsis remains a mystery
to me.
As Philippe surmised, it is a compatibility character, originally included
in the Unicode 1.0 repertoire for cross-mapping to existing legacy
Stephan Stiller noted:
Maybe ... and the origin of the single-glyph ellipsis remains a mystery
to me.
As Philippe surmised, it is a compatibility character, originally included
in the Unicode 1.0 repertoire for cross-mapping to existing legacy
encodings:
Code Page 932: 0x81 0x64
Code Page
I wrote:
As Philippe surmised, it is a compatibility character, originally included
in the Unicode 1.0 repertoire for cross-mapping to existing legacy
encodings:
Code Page 932: 0x81 0x64
Code Page 949: 0xA1 0xA6
Asmus responded:
which just pushes that question forward in time...
2013-09-13 22:02, Whistler, Ken wrote:
The *interesting* question, in my opinion, is why folks feel impelled to use
U+2026 to render a baseline ellipsis in Latin typography at all, rather than
just using U+002E ad libitum...
In traditional typography, an ellipsis usually has dots set apart
Exactly my thoughts:
In fonts commonly used for word processing and desktop publishing,
HORIZONTAL ELLIPSIS is usually not that well designed.
To me the dots appear too close in plenty of fonts.
But I think that the most common cause of the appearance of HORIZONTAL
ELLIPSIS is that Microsoft
2013/9/13 Jukka K. Korpela jkorp...@cs.tut.fi
2013-09-13 22:02, Whistler, Ken wrote:
The *interesting* question, in my opinion, is why folks feel impelled to
use
U+2026 to render a baseline ellipsis in Latin typography at all, rather
than
just using U+002E ad libitum...
In traditional
Hi Philippe,
This means that this dot will then need to be followed by two spaces
when it is used as a sentence-ending period.
This tradition is no longer current in the US. Though it's obvious there
are still plenty of middle and high school–level teachers and
college-level writing
Lots of people still do this. I did until a year or two ago.
--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell
-Original Message-
From: Stephan Stiller stephan.stil...@gmail.com
Sent: 9/13/2013 19:30
To: unicode@unicode.org unicode@unicode.org
Subject: Re: Origin
This tradition is persistent. You can still easily see many contributors in
English Wikipedia that are continuously autocorrecting wiki pages to
insert these double spaces after periods, with comments like fix
typography... These edits are massive, sometimes performed by bots.
Well I will never
:-)
Lots of people still do this. I did until a year or two ago.
I also use non-standard punctuation, but I tend to know what majority
practice is, and when I deviate it's intentional. I don't know about
you, but nearly everyone who tells me that you should use two spaces
(should? says who?)
This tradition is persistent.
Persistent where?
Lots of people
Lots of people who and how many?
Go to a bookstore or library, pick 100 items randomly, and report. If
you want to make a case that it's majority or significant usage in
personal correspondence or outside of professional
2013/9/14 Stephan Stiller stephan.stil...@gmail.com
This tradition is persistent.
Persistent where?
This is already replied within my message you quote here. Just look forward.
Lots of people
Lots of people who
Same remark.
and how many?
Go to a bookstore or library, pick 100
This tradition is persistent.
Persistent where?
This is already replied within my message you quote here.
Lots of people
Lots of people who
Same remark.
So there are many contributors, on the English Wikipedia. What does
many mean? I doubt double spacing of sentences is