Re: In defense of Plane 14 language tags

2002-11-04 Thread Doug Ewell
My paper on Plane 14 language tags is now available in PDF format: http://home.adelphia.net/~dewell/Plane14.pdf Thanks to everyone who has commented so far. -Doug Ewell Fullerton, California

Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Doug Ewell
Joseph Boyle wrote: > Newline problems are a good analogy. They still require bookkeeping of > different formats and attention in any new coding and cause new bugs, > even though the problem has been around for decades. Nobody is holding > their breath for any of the platforms to change their new

Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Tex Texin
Joseph Boyle wrote: > > Yes, the software business is largely about dealing with the BADLY WRITTEN, > the TRIVIAL, and the BRAIN-DEAD. Your point? I see we are still working on naming utf-8 formats with and without the bom. I find these quite acceptable, assuming you mean: utf8-badly-written-

RE: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Joseph Boyle
Yes, the software business is largely about dealing with the BADLY WRITTEN, the TRIVIAL, and the BRAIN-DEAD. Your point? Newline problems are a good analogy. They still require bookkeeping of different formats and attention in any new coding and cause new bugs, even though the problem has been aro

Re: In defense of Plane 14 language tags (long)

2002-11-04 Thread Michael Everson
At 22:25 + 2002-11-04, Thomas M. Widmann wrote: Or what about Coptic? Unicode encodes most Coptic letters as Greek, which means that the same font cannot be used for displaying Greek and Coptic. (TUC 3.0, p. 168: "Texts that mix Greek and Coptic languages together must employ appropriate fo

Re: In defense of Plane 14 language tags (long)

2002-11-04 Thread David Starner
On Mon, Nov 04, 2002 at 10:25:18PM +, Thomas M. Widmann wrote: > Or what about Coptic? Unicode encodes most Coptic letters as Greek, > which means that the same font cannot be used for displaying Greek and > Coptic. This is going to change, though. See the top two papers here

Re: [OT] Re: `` ", ` '

2002-11-04 Thread John Cowan
Mark Davis scripsit: > A good algorithm has to be even smarter, so that, for example, > ...--("this")--... works. Yes, a left context of ( seems to force an initial quote also, and likewise with [, but < does not; the sequence <" generates a right quote, at least in Word 97. -- Business before

Re: In defense of Plane 14 language tags (long)

2002-11-04 Thread Thomas M. Widmann
"Doug Ewell" <[EMAIL PROTECTED]> writes: > OK, so my "mini-essay" against deprecating the Plane 14 language tags > didn't turn out quite so "mini" after all. It was very interesting. > [...] > Other scripts besides Han can benefit from plain-text language tagging > as well. A common Latin-script

Re: ct, fj and blackletter ligatures

2002-11-04 Thread Peter_Constable
On 11/04/2002 06:11:35 AM Thomas Lotze wrote: >So far the theory is very clear, and as far as plain text is concerned, >seems to be directly applicable. However, if I have a typeset document, >say in PDF format... If you've got a PDF document, it is encoded entirely in terms of glyphs. There is

Re: History of character codes

2002-11-04 Thread David Starner
On Sat, Mar 02, 2002 at 03:49:46PM -0800, Doug Ewell wrote: > Be very, very skeptical of anything you read in the TRON article. It is heavily >biased against Unicode and anything perceived as American in origin, and makes some >false and misleading statements about Unicode. For example, it stat

Re: In defense of Plane 14 language tags (long)

2002-11-04 Thread Otto Stolz
John Hudson wrote: the OpenType 'language system' tags are better understood as typographic system tags, and it is not clear to me that it would always be possible or desirable to link a particular OT typographic tag to a particular Plane 14 language tag -- or, indeed, to any language t

Re: In defense of Plane 14 language tags (long)

2002-11-04 Thread John Hudson
At 10:21 11/4/2002, Otto Stolz wrote: A common Cyrillic example is the difference in the italic forms for, e. g., Russian and Serbian, cf. "Rendering Serbian italics" (used to be at -- John, can we have it back?). It's back. I have not read

Re: [OT] Re: `` ", ` '

2002-11-04 Thread Michael Everson
At 10:08 -0800 2002-11-04, Mark Davis wrote: A good algorithm has to be even smarter, so that, for example, ...--("this")--... works. Do you *use* commercial software, Mark? -- Michael Everson * * Everson Typography * * http://www.evertype.com

Re: [OT] Re: `` ", ` '

2002-11-04 Thread Mark Davis
A good algorithm has to be even smarter, so that, for example, ...--("this")--... works. Mark __ http://www.macchiato.com ► “Eppur si muove” ◄ - Original Message - From: "John Cowan" <[EMAIL PROTECTED]> To: "Doug Ewell" <[EMAIL PROTECTED]> Cc: "Unicode Mai

Re: In defense of Plane 14 language tags (long)

2002-11-04 Thread Otto Stolz
Dominikus Scherkl wrote: I found the arguments quite convincing So do I. But I think, they should be stated as clearly, and conclusively, as possible. Thence my recent comments. Best wishes, Otto Stolz

Re: In defense of Plane 14 language tags (long)

2002-11-04 Thread David Starner
On Mon, Nov 04, 2002 at 06:45:26PM +0100, Dominikus Scherkl wrote: > I found the arguments quite convincing - why deprecate the tags? > No one has until now brought an argument to deprecate them... Because it's been a long-standing discussion on this list. The argument against them is that they're s

/ as comma

2002-11-04 Thread Stefan Persson
> First, the ` is not a quote mark: it is a grave accent/ Second, it > also doesn/t say that you can/t use a slash/ say/ instead of a comma/ > apostrophe/ or period/ But that doesn/t mean it/s a good idea/ Using slash as comma is not as bad an idea as you think; in fact it *was* used as comma in

RE: In defense of Plane 14 language tags (long)

2002-11-04 Thread Dominikus Scherkl
Hi. > > 1. Language tags may be useful for display issues. > The "user" viewing the text (and preferring 'Japanese-style' glyphs) > may be another person than the "user" authoring the text Hrrr. It's quite clear, that only the author has inserted the tags - thus the text will appear to any other

Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Michael \(michka\) Kaplan
From: "Joseph Boyle" <[EMAIL PROTECTED]> > No, the notation to say "BOM required (report any files without BOM)", "BOM > not allowed (report any files with BOM)", or "BOM optional (only report > files if they are not valid UTF-8 at all)", for a given file type. Well, yes. If you wanted to avoid m
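The three-way notation quoted above (BOM required, BOM not allowed, BOM optional) could be modeled roughly as follows. This is only a sketch under stated assumptions: the names `BomPolicy` and `check_bom` are hypothetical and do not come from any tool mentioned in the thread.

```python
import codecs
from enum import Enum


class BomPolicy(Enum):
    """Hypothetical per-file-type policy, mirroring the three cases in the post."""
    REQUIRED = "required"    # report any file without a BOM
    FORBIDDEN = "forbidden"  # report any file with a BOM
    OPTIONAL = "optional"    # report only files that are not valid UTF-8


def check_bom(data: bytes, policy: BomPolicy) -> list:
    """Return a list of problem descriptions (empty if the file passes)."""
    problems = []
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        body.decode("utf-8")  # validity is checked under every policy
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8: {exc.reason} at byte {exc.start}")
    if policy is BomPolicy.REQUIRED and not has_bom:
        problems.append("BOM required but missing")
    if policy is BomPolicy.FORBIDDEN and has_bom:
        problems.append("BOM not allowed but present")
    return problems
```

A checker built this way would still need some external table mapping each file type to a policy, which is the missing "notation" the post is asking about.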

RE: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Joseph Boyle
>> Yes, it's trivial to check. What's missing is the notation to tell the >> checker what to check for. >Sorry, but that is incorrect. If they know it's UTF-8, then it's either a BOM or it's not. It is three specific bytes. No, the notation to say "BOM required (report any files without BOM)", "BO

Re: In defense of Plane 14 language tags (long)

2002-11-04 Thread Otto Stolz
Doug Ewell wrote: 1. Language tags may be useful for display issues. ... For example, it is often said that Japanese users prefer “Japanese-style” glyphs universally, even for Chinese text. The Plane 14 tagging approach is not perfect, but it is sufficient to solve this problem. Japanese us
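For readers unfamiliar with the mechanism being debated: a Plane 14 language tag is the character U+E0001 LANGUAGE TAG followed by tag characters in the range U+E0020..U+E007E, each of which mirrors the ASCII character 0xE0000 below it. A minimal sketch of forming such a tag follows; the function name `plane14_tag` is hypothetical, and the sketch does not validate that the identifier is a well-formed language code.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000         # tag characters mirror ASCII at this offset


def plane14_tag(lang: str) -> str:
    """Encode an ASCII language identifier such as 'ja' as a Plane 14 tag string."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language identifier must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)
```

Under this sketch, text meant to be shown with Japanese-style glyphs would be preceded by `plane14_tag("ja")`, and a renderer that ignores tags would simply skip the invisible characters.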

Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Edward H Trager
Hi, everyone, It's almost unbelievable to me how many email postings are wasted on discussions such as this UTF-8 BOM issue ... I guess it means that there is a lot of BADLY WRITTEN software out there in the world ;-) With regard to READING incoming UTF-8 text streams, surely any good software d

Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread John Cowan
Joseph Boyle scripsit: > I haven't encountered UTF-32, SCSU, UTF-7, or BOCU-1 as transfer encodings. Alas, a member of one of the mailing lists I'm on is using an old version of Netscape, and he ends up sending UTF-7 (unless he is very careful not to) whenever he does anything non-ASCII. The tro

Re: [OT] Re: `` ", ` '

2002-11-04 Thread John Cowan
Doug Ewell scripsit: > Plus, as Michael said, it's not that hard to write software that keeps > track of matching U+0022's and converts them to U+201C and U+201D as > appropriate (and likewise for single quotes). AFAIK it's not matching that is used, but whether there's whitespace to the left (in
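The left-context rule described here can be sketched as below. This is an illustration only, not the algorithm of Word or any other product: it assumes that whitespace, the start of the text, "(" and "[" count as opening contexts (the bracket cases are the ones reported elsewhere in this thread), and the function name `curl_quotes` is hypothetical.

```python
# Characters that, when found immediately to the left, make a quote an opening quote.
OPENING_CONTEXT = set(" \t\n([")


def curl_quotes(text: str) -> str:
    """Replace U+0022 and U+0027 with curly quotes based on the preceding character."""
    out = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else " "  # treat start of text like whitespace
        opening = prev in OPENING_CONTEXT
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
        elif ch == "'":
            out.append("\u2018" if opening else "\u2019")
        else:
            out.append(ch)
    return "".join(out)
```

Because no matching is tracked, an apostrophe inside a word (as in "don't") falls out naturally as U+2019, but a quote following a dash or "<" is treated as closing, which is the weakness the thread goes on to discuss.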

RE: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Joseph Boyle
>INCOMING TEXT: Trivial to simply check. I say (once again) it's THREE BYTES. If they are there then there is a BOM. Simple. Yes, it's trivial to check. What's missing is the notation to tell the checker what to check for. >> The inability to update to one standard all possible consuming >> softw

RE: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Joseph Boyle
I haven't encountered UTF-32, SCSU, UTF-7, or BOCU-1 as transfer encodings. If so, they potentially have the same BOM/signature question, unless all uses are established as BOM or agnostic, or non-BOM and agnostic. I do not expect it to come up much as the formats/protocols that insist on non-BOM g

Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Michael \(michka\) Kaplan
From: "Joseph Boyle" <[EMAIL PROTECTED]> > Yes, it's trivial to check. What's missing is the notation to tell the > checker what to check for. Sorry, but that is incorrect. If they know it's UTF-8, then it's either a BOM or it's not. It is three specific bytes. > Yes, this is a good description of
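The "three specific bytes" are EF BB BF, the UTF-8 encoding of U+FEFF. A minimal sketch of the check, assuming the data is already known to be UTF-8 (the function name `has_utf8_bom` is hypothetical):

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the byte string starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)
```

The check itself says nothing about whether a BOM is wanted for a given file type, which is the separate question raised in the quoted text.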

[OT] Re: `` ", ` '

2002-11-04 Thread Doug Ewell
John Delacour wrote: >> It's a terrible idea. I hate ``this quoting convention" (or >> alternatively ``this one'') as much as anyone. > > It's a very good idea if you consider that it enables us-ascii text > to be easily converted to nicely formatted text containing curly > quotes pointing in th

Re: Header Reply-To

2002-11-04 Thread Mark Davis
You were caused by the ASCII standard? An interesting family history you must have. Do tell! Mark __ http://www.macchiato.com ► “Eppur si muove” ◄ - Original Message - From: "Michael Everson" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Monday, Novemb

Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Doug Ewell
Joseph Boyle wrote: > Software currently under development could use the identifiers for > choosing whether to require or emit BOM, like the file requirements > checker I have to write, and ICU/uconv. Alternatively, software could use a completely separate flag to indicate whether a BOM is to be
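The alternative suggested here, a flag kept separate from the encoding name, might look like the following when emitting text. This is a sketch only; `encode_utf8` and its `emit_bom` parameter are hypothetical names, and the sketch relies on Python's standard "utf-8-sig" codec, which prepends the signature on encoding.

```python
def encode_utf8(text: str, emit_bom: bool = False) -> bytes:
    """Encode text as UTF-8, with the BOM controlled by a separate flag."""
    # "utf-8-sig" writes EF BB BF before the data; plain "utf-8" does not.
    return text.encode("utf-8-sig" if emit_bom else "utf-8")
```

With this design the encoding is always just "UTF-8" and the BOM decision travels as ordinary configuration, so no second charset identifier is needed.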

Re: Header Reply-To

2002-11-04 Thread Michael Everson
At 07:21 -0800 2002-11-04, Mark Davis wrote: I don't think that usage is described in the ASCII standard; as far as I can tell it is only in that RFC. I was *caused* by the ASCII standard surely. -- Michael Everson * * Everson Typography * * http://www.evertype.com

Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Michael \(michka\) Kaplan
From: "Joseph Boyle" <[EMAIL PROTECTED]> Joseph, > Software currently under development could use the identifiers for choosing > whether to require or emit BOM, like the file requirements checker I have to > write, and ICU/uconv. Let's separate that into the two issues it represents: EMITTING: T

Re: `` ", ` '

2002-11-04 Thread Otto Stolz
Mark Davis had written: > First, the ` is not a quote mark: it is a grave accent/ Second, it > also doesn/t say that you can/t use a slash/ say/ instead of a comma/ > apostrophe/ or period/ But that doesn/t mean it/s a good idea/ Doug Ewell had written: It's a terrible idea. I hate ``this quoti

RE: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Joseph Boyle
Michael, Software currently under development could use the identifiers for choosing whether to require or emit BOM, like the file requirements checker I have to write, and ICU/uconv. The inability to update to one standard all possible consuming software one might encounter (or for that matter h

Re: Header Reply-To

2002-11-04 Thread Mark Davis
I don't think that usage is described in the ASCII standard; as far as I can tell it is only in that RFC. And it leads to really ugly text nowadays, such as in http://www.bayarea.com/mld/mercurynews/news/4439727.htm Mark __ http://www.macchiato.com ► “Eppur si muo

Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Michael \(michka\) Kaplan
From: "Joseph Boyle" <[EMAIL PROTECTED]> > Thanks for the dozens of responses discussing consumers' behavior on UTF-8 > BOM. This is actually not what I'm concerned with, as I have to take it as a > given that there is both software that wants UTF-8 BOM and software that > doesn't want it. > > Cou

Re: `` ", ` '

2002-11-04 Thread Michael Everson
At 09:26 + 2002-11-04, John Delacour wrote: It's a terrible idea. I hate ``this quoting convention" (or alternatively ``this one'') as much as anyone. It's a very good idea if you consider that it enables us-ascii text to be easily converted to nicely formatted text containing curly quot

Re: `` ", ` '

2002-11-04 Thread John Delacour
At 1:29 pm -0800 3/11/02, Doug Ewell wrote: Mark Davis wrote: First, the ` is not a quote mark: it is a grave accent/ Second, it also doesn/t say that you can/t use a slash/ say/ instead of a comma/ apostrophe/ or period/ But that doesn/t mean it/s a good idea/ It's a terrible idea. I h

PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Joseph Boyle
Thanks for the dozens of responses discussing consumers' behavior on UTF-8 BOM. This is actually not what I'm concerned with, as I have to take it as a given that there is both software that wants UTF-8 BOM and software that doesn't want it. Could we evaluate the need for separate identifiers for

Re: ct, fj and blackletter ligatures

2002-11-04 Thread Thomas Lotze
William Overington wrote: > I don't know for certain but I suspect that it is that font designers > do this so that people can use an application such as Microsoft Paint > to produce an illustration using the font. In the absence of regular > Unicode code points for the ligatures, a font designer

Re: ct, fj and blackletter ligatures

2002-11-04 Thread William Overington
Thomas Lotze asked. >Why below 255? I don't know for certain but I suspect that it is that font designers do this so that people can use an application such as Microsoft Paint to produce an illustration using the font. In the absence of regular Unicode code points for the ligatures, a font desig

RE: Header Reply-To

2002-11-04 Thread Marco Cimarosti
Stefan Persson wrote: > > > Why doesn't that page follow the ASCII standard and/or > any ASCII-based > > > standard? > > > > What? As far as I can tell, it's 100% ASCII. > > It doesn't follow the ASCII standard as far as quotation marks are > concerned. Using ` and ' as quotation marks is a long