Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread Doug Ewell
Tex Texin wrote: > 6) "UTF-8 signatures are not evil" ok. In and of themselves, they are > not. Mandating their use everywhere is evil. Notepad is broken in > always outputting it, since notepad is used for files that are also > not plain text. The rest of the world should not change because > no

Re: Everson Mono

2003-02-17 Thread Doug Ewell
wrote: > Have you tried the CharacterMap program that comes with Windows 2000 > and WinXP? Yes, of course. I mentioned originally that I had to plug each font into Character Map, one by one, to find out what characters were supported by each. This was tedious and labor-intensive, and so I look

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread Tex Texin
Doug, Hi. 1) Please note I said the "UTF-8 BOM" was relatively recent, not the BOM. 2) Yes, the more forms of encoding declaration that exist the more likely for conflicts to occur, or for there to be increasingly complex precedence rules. And yes, I am sure others have said it, but I also harp

Re: Bidi overrides and chocolate paper

2003-02-17 Thread Roozbeh Pournader
On Mon, 17 Feb 2003, Doug Ewell wrote: > >http://www.farsiweb.info/unicode/chocolatepaper.jpg > > The number "100" isn't supposed to be RTL, is it? Or is that what you > meant by "override"? Exactly. If one puts the correct sequence of characters in a Right-to-Left override mode (like putt

Re: Bidi overrides and chocolate paper

2003-02-17 Thread Doug Ewell
Roozbeh Pournader wrote: > Look how a Turkish chocolate-making company writes all Arabic in Bidi > override mode: > >http://www.farsiweb.info/unicode/chocolatepaper.jpg The number "100" isn't supposed to be RTL, is it? Or is that what you meant by "override"? -Doug Ewell Fullerton, Califo

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread John Burger
From: [EMAIL PROTECTED] The lack of the BOM in the 'white space' section of the specs may just be an oversight. As one of the authors of that particular passage, I can attest that we considered fairly carefully which of ISO10646's many space characters should count as whitespace in that partic

Re: DBCS and Unicode 3.1

2003-02-17 Thread Jungshik Shin
On Mon, 17 Feb 2003, Markus Scherer wrote: > Michael (michka) Kaplan wrote: > > There are standards like the Chinese GB18030 which supports characters > > of 1, 2, or 4 bytes -- definitely MBCS again. > > Other examples: There are EUC-JP (1/2/3 bytes per character) and > EUC-CN (1/2/4 BpC) which
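The 1/2/4-byte behaviour of GB18030 quoted above can be checked with a short sketch. It assumes Python's built-in "gb18030" codec; the three sample characters are illustrative choices, not taken from the thread.

```python
# Sample characters are illustrative assumptions, not from the original posts.
SAMPLES = {
    "A": "ASCII letter",
    "\u4e2d": "common CJK ideograph (U+4E2D)",
    "\U00010000": "first supplementary-plane code point (U+10000)",
}


def gb18030_length(ch: str) -> int:
    """Return the number of bytes GB18030 uses to encode one character."""
    return len(ch.encode("gb18030"))


for ch, description in SAMPLES.items():
    print(f"U+{ord(ch):04X} {description}: {gb18030_length(ch)} byte(s)")
```

A character set with this behaviour is multibyte (MBCS) in the sense used in this thread, since the byte count varies per character.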

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread Martin Duerst
Some comments: - If you can avoid it, don't use a BOM at the start of an UTF-8 HTML file. It will display nicely on more browsers. - The W3C Validator http://validator.w3.org/ accepts the BOM for HTML 4.01, and also XHTML. It probably should produce a warning. It did when I originally added

Re: DBCS and Unicode 3.1

2003-02-17 Thread Doug Ewell
wrote: > Now that Unicode 3.1 has broken the two-byte barrier, is there a > corresponding update for DBCS? The preferred migration path should be to upgrade *from* DBCS *to* Unicode. Only the People's Republic of China seems interested in large-scale expansions of non-Unicode character encoding

Re: DBCS and Unicode 3.1

2003-02-17 Thread Markus Scherer
Michael (michka) Kaplan wrote: Well, DBCS means "double byte character set" and thus it is always two bytes. But it's a theoretical definition since there are no actual DBCS code pages -- all of the ones that exist are MBCS (multibyte character set) since they support both one-byte and two-byte cha

Re: DBCS and Unicode 3.1

2003-02-17 Thread Michael \(michka\) Kaplan
Well, DBCS means "double byte character set" and thus it is always two bytes. But it's a theoretical definition since there are no actual DBCS code pages -- all of the ones that exist are MBCS (multibyte character set) since they support both one-byte and two-byte characters. There are standards li

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread Markus Scherer
I would like to add some information here without getting myself into the core of the discussion: HTML recognizes a lot fewer "whitespace" characters than Java or Unicode. Different people have different sets of "whitespace" characters. Unicode's White_Space property (PropList.txt) contains 24 c
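As a rough illustration of how "whitespace" depends on who is defining it, the sketch below prints the Unicode general category of three characters using Python's standard unicodedata module. The general category is not the same thing as the White_Space property in PropList.txt, so this is only an approximation; the choice of characters is an assumption made for the example.

```python
import unicodedata

# Characters chosen for illustration; the thread itself does not list them.
CANDIDATES = ["\u0020", "\u00a0", "\ufeff"]


def general_category(ch: str) -> str:
    """Return the Unicode general category (e.g. 'Zs', 'Cf') of a character."""
    return unicodedata.category(ch)


for ch in CANDIDATES:
    print(f"U+{ord(ch):04X}: category {general_category(ch)}")
```

SPACE and NO-BREAK SPACE are space separators (Zs), while U+FEFF is a format character (Cf), which is consistent with it being left out of whitespace lists.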

DBCS and Unicode 3.1

2003-02-17 Thread Erik.Ostermueller
Hello all, In the past, DBCS could support characters no larger than 2 bytes. Correct? Now that Unicode 3.1 has broken the two-byte barrier, is there a corresponding update for DBCS? I've been getting most of my DBCS info from these url's: http://oss.software.ibm.com/icu/userguide/conversion-d

Re: TrueType Explorer (was RE: Everson Mono)

2003-02-17 Thread jameskass
Font-related accessory developers please note: What would really be handy is a feature that allowed the user to enter a Unicode character or string of characters into an input field which would result in a display listing all installed fonts and showing the character or string in each listed fa

Re: TrueType Explorer (was RE: Everson Mono)

2003-02-17 Thread Adam Twardoch
> From: Doug Ewell [mailto:[EMAIL PROTECTED]] > On a somewhat related note, here's a utility I'd like: something that could > look inside a TrueType or OpenType font and tell me what Unicode code points > it covers (i.e. has one or more glyphs for). Try FontExpert 2003 for Windows: http://www.pro

Re: BOM's at Beginning of Web Pages? Mac IE's Euro

2003-02-17 Thread Deborah Goldsmith
I can't explain why IE 5.2.2 is displaying the UTF-8 BOM as a Euro. It's important to understand that IE 5.2.x does not use Unicode for rendering. It takes the following approach: 1. Convert the text from the specified character set to runs of text in various Mac OS encodings. 2. Draw each text

TrueType Explorer (was RE: Everson Mono)

2003-02-17 Thread Rick Cameron
TrueType Explorer can do this and more. It will display the Unicode ranges supported by a font, and all the glyphs for a given range. It also displays Panose classification, Name strings, Kerning pairs and supported code pages. Windows only. Freeware from http://www3.sympatico.ca/chris.lamoureux2

RE: Finding a font that contains a particular character

2003-02-17 Thread Carl W. Brown
Alan, IE uses mlang to determine if you have the right fonts for the characters. http://msdn.microsoft.com/library/default.asp?url=/workshop/misc/mlang/overview/overview.asp Carl > -----Original Message----- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On > Behalf Of Alan Wood > Sent: M

Re: Everson Mono

2003-02-17 Thread Peter_Constable
On 02/15/2003 06:03:13 PM "Doug Ewell" wrote: >On a somewhat related note, here's a utility I'd like: something that >could look inside a TrueType or OpenType font and tell me what Unicode >code points it covers (i.e. has one or more glyphs for). Have you tried the CharacterMap program that come

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread jameskass
Roozbeh Pournader wrote, > And some people find it annoying and dangerous. A BOM-ed UTF-8 file breaks > the Unix text file model to some degree. I can post a link if anyone's > interested. One report seen recently during various searches was that the BOM caused a core dump in certain cases. Se

Re: Plane 14 Tag Deprecation Issue

2003-02-17 Thread Peter_Constable
On 02/14/2003 09:00:16 AM Michael Everson wrote: >At 13:38 + 2003-02-14, William Overington wrote: > >>Books in libraries are often classified with a code consisting of digits and >>a full stop character. For example, the number 515.53 is on a label which >>is still on the spine of a book wh

Re: Character display problem in browser

2003-02-17 Thread Markus Scherer
SRIDHARAN Aravind wrote: My database is Oracle and its character set is WE8ISO8859P1. In database, I have stored special Polish characters. First of all, the database character set is ISO 8859-1 which cannot represent "special Polish characters". In all likelihood, you have taken a byte stream
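The point that ISO 8859-1 has no slot for Polish letters can be verified with a minimal sketch. It assumes Python's built-in "iso8859_1" and "iso8859_2" codecs; the helper name can_encode is hypothetical.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


A_OGONEK = "\u0105"  # LATIN SMALL LETTER A WITH OGONEK

print("ISO 8859-1:", can_encode(A_OGONEK, "iso8859_1"))
print("ISO 8859-2:", can_encode(A_OGONEK, "iso8859_2"))
```

If the database really is ISO 8859-1, any Polish letters in it must have been stored as bytes from some other encoding and merely labelled as Latin-1.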

Re: Finding a font that contains a particular character

2003-02-17 Thread John H. Jenkins
On Monday, February 17, 2003, at 09:36 AM, Alan Wood wrote: Someone recently asked how to find a font that contains a particular Unicode character. I don't have an easy answer, but TrueType Explorer (for Windows) may help: On the Mac, BTW, Mac OS X 10.2 or later, you can either use the cha

Finding a font that contains a particular character

2003-02-17 Thread Alan Wood
Someone recently asked how to find a font that contains a particular Unicode character. I don't have an easy answer, but TrueType Explorer (for Windows) may help: http://www3.sympatico.ca/chris.lamoureux2/ It reads the tables in your installed fonts (or a drag-and-dropped font that is not instal

Re: Character display problem in browser

2003-02-17 Thread Otto Stolz
SRIDHARAN Aravind wrote: When the character set in browser is Central European(Windows-1250), then small a with ogonek(\u0105) comes fine. When the character set in browser is Central European(ISO-8859-2), then small a with ogonek(\u0105) comes like s with caron(\u0161). Cf.
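The symptom described here is what one would expect if the stored byte is 0xB9. That value is an inference from the two characters named in the report, not something the poster states. A short sketch with Python's built-in codecs:

```python
# Assumption: the byte reaching the browser is 0xB9 (inferred, not stated).
STORED_BYTE = b"\xb9"


def decode_as(data: bytes, encoding: str) -> str:
    """Decode the same bytes under a caller-chosen encoding."""
    return data.decode(encoding)


for encoding in ("cp1250", "iso8859_2"):
    ch = decode_as(STORED_BYTE, encoding)
    print(f"{encoding}: U+{ord(ch):04X}")
```

The same byte yields U+0105 under Windows-1250 and U+0161 under ISO-8859-2, which suggests the data was written as Windows-1250.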

Bidi overrides and chocolate paper

2003-02-17 Thread Roozbeh Pournader
Look how a Turkish chocolate-making company writes all Arabic in Bidi override mode: http://www.farsiweb.info/unicode/chocolatepaper.jpg roozbeh

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread Frank da Cruz
On Mon, 17 Feb 2003 08:13:51 -0500 (EST), Jungshik Shin <[EMAIL PROTECTED]> wrote: > Incidentally, it just occurred to me that ftp/ssh clients may offer an > user-configurable option for the automatic removal of 'UTF-8 BOM' at > the beginning of a text file in UTF-8 when moving files from Wind

Re: BOM's at Beginning of Web Pages? Mac IE's Euro

2003-02-17 Thread Tom Gewecke
>The first looks like Courier New, I have been able to confirm that it is indeed the font Courier New which is being used by Mac OS X IE 5.2 to display the bytes 0xEF 0xBB 0xBF as the Euro sign.
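For reference, the three bytes 0xEF 0xBB 0xBF are the UTF-8 encoding of U+FEFF. The sketch below, using Python's standard codecs module, shows the difference between a decoder that keeps the signature as a character and one that consumes it. The sample page content is a made-up placeholder.

```python
import codecs

# Hypothetical start of a web page saved with a UTF-8 signature.
RAW_PAGE = codecs.BOM_UTF8 + "<html>".encode("utf-8")


def decode_keeping_bom(data: bytes) -> str:
    """Plain UTF-8 decoding: U+FEFF survives as the first character."""
    return data.decode("utf-8")


def decode_dropping_bom(data: bytes) -> str:
    """'utf-8-sig' decoding: a leading signature is silently removed."""
    return data.decode("utf-8-sig")


print(repr(decode_keeping_bom(RAW_PAGE)))
print(repr(decode_dropping_bom(RAW_PAGE)))
```

A renderer that does not treat the leading U+FEFF specially has to draw something for it, which is where font-dependent results such as the one reported here can arise.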

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread Roozbeh Pournader
On Mon, 17 Feb 2003, Jungshik Shin wrote: > Incidentally, it just occurred to me that ftp/ssh clients may offer an > user-configurable option for the automatic removal of 'UTF-8 BOM' at > the beginning of a text file in UTF-8 when moving files from Windows to > non-Windows platforms (Unix/Uni

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread Tom Gewecke
>If this is true -- that U+FEFF is a kind of meta-character that doesn't >really belong to the text per se -- then it should be equally true for >UTF-8, whether its role is as a true Byte Order Mark (needed in UTF-16 >and UTF-32 but not UTF-8) or as a signature (potentially useful in all >Unicode

Character display problem in browser

2003-02-17 Thread SRIDHARAN Aravind
Hi All, I have a problem. My database is Oracle and its character set is WE8ISO8859P1. In database, I have stored special Polish characters. When I display these data in browser using J2EE technology(servlets and jsp's), I face the following problem. When the character set in browser is Central

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread Jungshik Shin
On Mon, 17 Feb 2003, Michael Everson wrote: > X browsers, and the keepers of that home page should delete the first > character before the HTML begins right away. I am cc:ing the keepers I agree that they should. Incidentally, it just occurred to me that ftp/ssh clients may offer an user-c
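The kind of automatic removal suggested here could look roughly like the following sketch. It is not taken from any actual ftp/ssh client; the function name strip_utf8_bom is hypothetical, and it only handles a signature at the very start of the data.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file contents as they might leave a Windows editor.
uploaded = b"\xef\xbb\xbf#!/bin/sh\necho hello\n"
print(strip_utf8_bom(uploaded))
```

Working on bytes rather than decoded text means the rest of the file is passed through untouched, which matters for a transfer tool that should not re-encode content.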

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread Roozbeh Pournader
On Mon, 2003-02-17 at 15:42, Michael Everson wrote: > I would like to repeat, all of this BOM discussion is all very well > and good, but the Unicode home page displays incorrectly on three OS > X browsers, and the keepers of that home page should delete the first > character before the HTML beg

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread Michael Everson
I would like to repeat, all of this BOM discussion is all very well and good, but the Unicode home page displays incorrectly on three OS X browsers, and the keepers of that home page should delete the first character before the HTML begins right away. I am cc:ing the keepers here. -- Michael Ev

Re: BOM's at Beginning of Web Pages? Mac IE's Euro

2003-02-17 Thread Tex Texin
http://www.w3.org/TR/REC-html40/charset.html#spec-char-encoding Says: For example, to specify that the character encoding of the current document is "EUC-JP", a document should include the following META declaration: <META http-equiv="Content-Type" content="text/html; charset=EUC-JP"> The META declaration must only be used when the character encoding is org

Re: BOM's at Beginning of Web Pages? Mac IE's Euro

2003-02-17 Thread Roozbeh Pournader
On Mon, 2003-02-17 at 14:27, Tex Texin wrote: > > AFAICR, there is supposed to be no single non-ASCII character before that > > tag. > > I don't believe the standard says that. However, it is recommended that the > META content-type statement is placed as early as possible, [...] Yes, it was som

Re: BOM's at Beginning of Web Pages? Mac IE's Euro

2003-02-17 Thread Tex Texin
Hi, > AFAICR, there is supposed to be no single non-ASCII character before that > tag. I don't believe the standard says that. However, it is recommended that the META content-type statement is placed as early as possible, for exactly the reason that any non-ascii characters that appear earlier w

XML and tags (LONG) (derives from Re: Plane 14 Tag Deprecation Issue)

2003-02-17 Thread William Overington
Two posts in the Unicode list in the last few days advocate using XML rather than using plane 14 tags. I knew very little XML so I started to learn some more so as to assess the matter of whether there is any good reason for using XML rather than plane 14 tags. Certainly no reasons were stated in

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread Tex Texin
Dudes and Dudettes, Not sure I read all of the thread, but: 1) BOM is not only allowed but recommended in HTML UTF-16 documents. see section 5.1 http://www.w3.org/TR/REC-html40/charset.html I am not sure what the comment about removing BOM is referring to. Is that someone's explanation or is it

Re: BOM's at Beginning of Web Pages?

2003-02-17 Thread Jungshik Shin
DE> Michka is probably right that Notepad is one of the more popular HTML DE> editors out there, The above statement is fair and balanced. MK> the best tool for quick fixes to HTML pages *is* notepad, which is The above is not. 'One of the best tools' would have been much better. MK> wh