On 09/26/2002 07:24:08 PM "Murray Sargent" wrote:
>I don't think the idea is that codepage equals language. Rather codepage
>equals a writing system, which consists of one or more scripts (e.g., 6
>scripts for ShiftJIS). As such the codepage is a useful cue in choosing
>an appropriate font for rendering text.

(Murray and I talked about this some at dinner a couple of weeks ago, so
there's some history here.)

I don't think things are quite that simple. A codepage *can* be a useful
cue in choosing an appropriate font (or in choosing typographic
preferences by whatever means). This certainly may be the case in some
instances, such as Shift JIS. But it's not always the case. For instance,
cp1251 doesn't tell you what language is involved, and isn't sufficient
to determine which italic variants of certain Cyrillic characters are
needed. Similarly, cp1250 doesn't tell you what cultural preferences
should apply in relation to design and alignment of the ogonek diacritic
(e.g. Polish and Lithuanian differ in this regard), or other diacritics
(e.g. caron should have a distinct form for Czech); and cp1252 doesn't
tell you about cultural preferences regarding cedilla (three different
forms can be used for French, but only one is acceptable for Portuguese
or Catalan).

That's why I maintain that a codepage is a character set, but not a
writing system. In general, a codepage does not determine a set of rules
for writing; it just provides a vocabulary with which to work.

>The bottom line is that if text was generated using a particular
>codepage it's likely that the creator of that text intended the text to
>be rendered with a font that supports that codepage.

Of course, fonts can support multiple codepages. Arial, Tahoma and
Verdana, for example, all support codepages 1250, 1251, 1252, 1253, 1254,
1257 and 1258. That doesn't tell you whether they're appropriate for
Polish or Lithuanian or Czech or whatever. Even the fact that they
support cp1258 doesn't imply that they are appropriate for Vietnamese:
e.g. the default glyphs in Arial for U+1EA5 and U+1EA7 do not have the
diacritics stacked in the way needed for Vietnamese.

I'm not saying that codepage information isn't ever useful. Obviously,
you have found it very useful. But the usefulness has limits.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>