Dialects and orthographies in BCP 47 (was: Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)

Doug Ewell Wed, 04 Aug 2010 12:48:03 -0700

verdy_p <verdy underscore p at wanadoo dot fr> wrote:

> Really, "Hans", "Hant", "Latf", "Latg" could have been avoided in ISO 15924, 
> if orthographic variants of the same 
> languages had been encoded in the IANA database for BCP 47, independently of 
> the effective font style.


Actually it was the opposite; the ability to use standardized ISO 15924
code elements to express concepts like "Simplified Han" was one of the
driving forces behind RFC 4646 and its shift in focus from whole tags to
subtags.

In any case, the bibliographers and others who use ISO 15924 but not BCP
47 might need to make these distinctions as well.

> But for now there's still no formal model for encoding language dialects, so 
> BCP 47 language tags still need to use 
> tags for ISO 3166-1 region codes and for the script variant, when it should 
> just qualify the generic script code (or 
> it could even drop this ISO 15924 code if there was a formal code for the 
> dialect written in a specific orthography: 
> we would also deprecate "Jpan", "Hrkt" in ISO 15924).

There is no "formal model" in the sense of a standard N-letter subtag
for dialects, because the concept of a dialect is too open-ended and
unsystematic.  The word means different things to different people. 
What may be a dialect to one person might be a full-blown National
Language to another, or just a funny accent to a third.

BCP 47 tags never *need* to use either the region subtag or the script
subtag, unless they are necessary to convey the intended meaning.  A tag
like "ja-Jpan-JP" is almost never needed, because almost all written
Japanese is "using the Japanese writing system" ('Jpan') and "as used in
Japan" ('JP').

I'm not sure what dialect is being posited here that would make the
difference between having to specify a script subtag and not having to.

> Orthographic variants would include also:
> - the various romanization systems (for example Pinyin) and phonetic 
> transcriptions (IPA phonetic, simplified IPA 
> phonology),

Already registered as variant subtags: 'pinyin', 'fonipa'

> - the simplified orthographies (e.g. orthographic reforms in French and 
> German),

Already registered as variant subtags: '1606nict', '1694acad', '1901',
'1996'

> - and some other minor variants (like the vertical presentation for 
> East-Asian scripts, or Boustrophedon 
> presentation for Ancient Greek, if this alters the orientation of characters 
> that had to be encoded differently, and 
> the default mirroring properties are not applicable to the encoded characters 
> in the basic language).
> 
> For now these dialectal/orthographic variants of written languages can be 
> registered in the IANA database for BCP 
> 47, using codes with at least 5 letters (or with at least 4 letters or digits 
> if there's at least one digit),

A 4-character variant subtag must *begin* with a digit.
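
A minimal Python sketch of that rule, assuming the variant production
from the RFC 5646 ABNF (5 to 8 alphanumerics, or a digit followed by 3
alphanumerics). It checks well-formedness only, not whether a subtag is
actually registered, and the name is_wellformed_variant is made up for
illustration:

```python
import re

# RFC 5646 ABNF: variant = 5*8alphanum / (DIGIT 3alphanum)
_VARIANT = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")

def is_wellformed_variant(subtag):
    """True if subtag has the syntactic shape of a BCP 47 variant subtag."""
    return _VARIANT.fullmatch(subtag) is not None

for candidate in ("pinyin", "fonipa", "1606nict", "1901", "a123", "ab12"):
    print(candidate, is_wellformed_variant(candidate))
```

Under this pattern 'a123' and 'ab12' are rejected even though each
contains a digit, because a 4-character variant has to start with one.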

> but 
> ideally the dialectal variant should be encoded as a tag BEFORE the 
> orthographic variant.

Why is this important?

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
