RE: [OT] Re: the Ethnologue

2000-11-30 Thread Elliotte Rusty Harold
At 7:18 AM -0800 11/23/00, Christopher John Fynn wrote: Spoken language is not necessarily at all the same thing as written language. There are e.g. plenty of mutually incomprehensible forms of spoken English which might each deserve a code in a standard for spoken languages but

RE: [OT] Re: the Ethnologue

2000-11-30 Thread Doug Ewell
Elliotte Rusty Harold [EMAIL PROTECTED] wrote: At 7:18 AM -0800 11/23/00, Christopher John Fynn wrote: Spoken language is not necessarily at all the same thing as written language. There are e.g. plenty of mutually incomprehensible forms of spoken English which might each deserve

Re: [OT] Re: the Ethnologue

2000-11-30 Thread John Cowan
Elliotte Rusty Harold wrote: I've yet to encounter a spoken version of English that I couldn't understand, after at most a couple of minutes of accustoming myself to the accent. You live in a country where dialect differentiation is a feeble thing, consisting mainly in pronunciation, and

Re: [OT] Re: the Ethnologue

2000-11-30 Thread Kenneth Whistler
John Cowan noted: In general, Geordie (the traditional dialect spoken around the Tyne River in England) is considered to be the English dialect most difficult for North Americans. To that I would add Glaswegian. When watching the Scots-produced mystery shows that show up on PBS in the

Re: [OT] Re: the Ethnologue

2000-11-30 Thread John Cowan
Kenneth Whistler wrote: To that I would add Glaswegian. When watching the Scots-produced mystery shows that show up on PBS in the United States on occasion, my wife and I often turn to each other in bafflement and say, "Subtitles, please." Scots is a separate language! If you understand

Re: [OT] Re: the Ethnologue

2000-11-30 Thread Kenneth Whistler
John Cowan replied: Kenneth Whistler wrote: To that I would add Glaswegian. When watching the Scots-produced mystery shows that show up on PBS in the United States on occasion, my wife and I often turn to each other in bafflement and say, "Subtitles, please." Scots is a separate

Re: [OT] Re: the Ethnologue

2000-11-30 Thread John Cowan
On Thu, 30 Nov 2000, Kenneth Whistler wrote: Scots is a separate language! If you understand anything at all it's by a happy accident. (There is of course Scots-flavored English as well, which is another matter.) I was, of course, referring to Scots (alleged) English, and not to

RE: [OT] Re: the Ethnologue

2000-11-23 Thread Christopher John Fynn
Peter Constable wrote: This is a good example of why an enumeration of "languages" based only on written forms (as found in ISO 639) is insufficient for all user needs. Of course ISO 639 is insufficient for *all* user needs - no standard is. And is there actually a remit for ISO 639 to

Re: [OT] Re: the Ethnologue

2000-09-22 Thread Edward Cherlin
At 6:24 AM -0800 9/21/00, Marion Gunn wrote: Antoine Leca wrote: CITE Hindi, Hindustani, Urdu could be considered co-dialects, but have important sociolinguistic differences. Hindi uses the Devanagari writing system, and formal vocabulary is borrowed from Sanskrit, de-Persianized,

Re: [OT] Re: the Ethnologue

2000-09-21 Thread Antoine Leca
Peter Constable wrote: SRC is the code for 'Bosnian', 'Croatian', and 'Serbo-Croatian', which means that there is a many-to-one mapping from ISO 639-1 'bs', 'hr', 'sr' to Ethnologue 'SRC'. By Ethnologue standards of mutual intelligibility, there is only one language here. Well,
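The many-to-one relationship described here can be made concrete with a small crosswalk. The sketch below uses a hypothetical table holding only the three ISO 639-1 codes named in the message, plus an illustrative helper (`invert`) that groups source codes by target code. Any group with more than one member marks a many-to-one mapping, which is exactly why a reverse lookup from 'SRC' cannot recover a single ISO code.

```python
from collections import defaultdict

# Hypothetical crosswalk: only the codes named in the message, not a full table.
ISO639_1_TO_ETHNOLOGUE = {
    "bs": "SRC",  # Bosnian
    "hr": "SRC",  # Croatian
    "sr": "SRC",  # Serbian
}

def invert(mapping):
    """Group source codes by target code; groups larger than one are many-to-one."""
    groups = defaultdict(list)
    for source, target in mapping.items():
        groups[target].append(source)
    return {target: sorted(sources) for target, sources in groups.items()}

print(invert(ISO639_1_TO_ETHNOLOGUE))  # {'SRC': ['bs', 'hr', 'sr']}
```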

Re: [OT] Re: the Ethnologue

2000-09-21 Thread Marion Gunn
Antoine Leca wrote: CITE Hindi, Hindustani, Urdu could be considered co-dialects, but have important sociolinguistic differences. Hindi uses the Devanagari writing system, and formal vocabulary is borrowed from Sanskrit, de-Persianized, de-Arabicized. Literary Hindi, or Hindi-Urdu,

Re: [OT] Re: the Ethnologue

2000-09-21 Thread Doug Ewell
Marion Gunn [EMAIL PROTECTED] wrote: Hindi, Hindustani, Urdu could be considered co-dialects... Mm. Maybe a more polite (more PC) turn of phrase might be found than "could be considered co-dialects", which more than implies, it postulates the existence of a standard language referent of

Re: the Ethnologue

2000-09-21 Thread Doug Ewell
Hi Peter, The records in the text file you looked at are language-countries. It is important to understand that the categorization is not reflected by the records in that file, but by the three-letter codes. The reason for codes being duplicated is because the languages in question are

Re: [OT] Re: the Ethnologue

2000-09-21 Thread Kevin Bracey
In message [EMAIL PROTECTED] Doug Ewell [EMAIL PROTECTED] wrote: Marion Gunn [EMAIL PROTECTED] wrote: Mm. Maybe a more polite (more PC) turn of phrase might be found than "could be considered co-dialects", which more than implies, it postulates the existence of a standard

Re: [OT] Re: the Ethnologue

2000-09-21 Thread Marion Gunn
Kevin Bracey wrote: As far as I'm aware the co- prefix does mean an equal grouping. Examples that spring to mind are co-worker, co-conspirator, co-exist, coincidence and co-operative. I thought co-dialects was a cunningly concise way of saying that they could all be considered dialects of

Re: the Ethnologue

2000-09-21 Thread Peter_Constable
[Apologies if you already got this. It seems to be bouncing, and so am sending it again.] On 09/21/2000 10:52:22 AM Doug Ewell wrote: [snip] Agreed. This is a refreshing departure from the position I perceived earlier, that ISO 639 was severely broken and the Ethnologue approach was

Re: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable
On 09/16/2000 04:27:45 PM Doug Ewell wrote: All I am asking in this particular case is for the Ethnologue editor to assign *one* primary name (and spelling) to each three-letter language code, and to relegate the other names to alternate status in a consistent way. That is the first necessary

Re: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable
On 09/17/2000 03:19:32 PM Doug Ewell wrote: Well, perhaps this is another, unintended example of a problem with incorporating the Ethnologue linguistic distinctions into other standards without serious review. If Spaniards consider their language sufficiently different from the Spanish spoken

Re: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable
On 09/17/2000 11:39:14 AM Doug Ewell wrote: What names am I supposed to associate with codes like SHU, MKJ, and SRC in my (possibly hypothetical) application that deals with language tags? Such associations are normally expected to be one-to-one. If Ethnologue codes are going to be regarded

RE: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable
On 09/17/2000 07:22:05 PM "Carl W. Brown" wrote: You are right the Ethnologue is not appropriate as a standard. If we're assuming a single standard, in the sense of a single "tiling of the plane" of languages, we're not proposing that the Ethnologue be the standard. We are suggesting, though,

RE: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable
On 09/17/2000 08:02:20 PM John Cowan wrote: Where I see using the SIL is as an extension of the ISO standard. RFC 1766 exists to allow flexible extension to the ISO standard. If there is no ISO code then use the SIL code. There are already collisions, so simply using one or the other gets
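A minimal sketch of the "use the ISO code, else the SIL code" policy, assuming hypothetical lookup tables keyed by language name. Because the message notes that ISO and SIL three-letter codes already collide, the sketch places SIL codes under a private-use "x-sil-" prefix. That prefix is an assumption chosen for illustration (RFC 1766 reserves the primary tag "x" for private use); it is not the form actually proposed in this thread. The SHU entry assumes the Ethnologue name "Chadian Spoken Arabic" discussed elsewhere in the thread.

```python
# Hypothetical tables: a real application would load the full ISO 639 and
# Ethnologue lists instead of hard-coding a few entries.
ISO_639 = {"English": "en", "German": "de"}
SIL = {"Chadian Spoken Arabic": "SHU"}  # code assumed from the Ethnologue

def language_tag(name: str) -> str:
    """Return the ISO 639 code if one exists, else a private-use SIL-based tag."""
    if name in ISO_639:
        return ISO_639[name]
    if name in SIL:
        # "x-" is RFC 1766's private-use prefix; the "sil" subtag is an
        # illustrative namespace so SIL codes cannot collide with ISO ones.
        return "x-sil-" + SIL[name].lower()
    raise KeyError(f"no code known for {name!r}")

print(language_tag("German"))                 # de
print(language_tag("Chadian Spoken Arabic"))  # x-sil-shu
```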

Re: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable
On 09/17/2000 10:37:42 PM Doug Ewell wrote: Since I have spent this whole, *very* OT discussion as the contrarian It hasn't been all that off-topic. This has come up on numerous occasions on this list, and I think is of interest to many of the participants, even though it isn't strictly about

Re: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable
On 09/17/2000 11:13:36 PM John Cowan wrote: Exactly so. And BTW "my proposal" is also Harald Alvestrand's proposal. I wasn't aware of that until Harald mentioned something not too many days ago. - Peter

RE: [OT] Re: the Ethnologue

2000-09-20 Thread Carl W. Brown
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Wednesday, September 20, 2000 11:06 AM What is important here is that, where ISO doesn't provide a code, that users do have some other source of codes for internal and, more importantly, interchange purposes. Many independent agencies and

RE: [OT] Re: the Ethnologue

2000-09-20 Thread Nick Nicholas
From: "Carl W. Brown" [EMAIL PROTECTED] From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Wednesday, September 20, 2000 11:06 AM I agree. For example, when it was brought up that other Turkic languages might be using the dotless i, I noticed that the SIL confirmed that Azerbaijan uses

RE: [OT] Re: the Ethnologue

2000-09-20 Thread Carl W. Brown
From: Nick Nicholas [mailto:[EMAIL PROTECTED]] Sent: Wednesday, September 20, 2000 4:48 PM Apart from cohabiting in Anatolia for a millennium. :-) In any case, the Ethnologue is correct about Urum; Urum and Mariupolitan Greek are the two languages spoken by an ethnically Greek population, which

Re: [OT] Re: the Ethnologue

2000-09-17 Thread Doug Ewell
John Cowan [EMAIL PROTECTED] wrote: Doug wants the Ethnologue to give each of its languages (uniquely tagged) a single unique worldwide authoritative name. That's not reasonable in all cases, though it is in 99.5%. What names am I supposed to associate with codes like SHU, MKJ, and SRC in

Re: [OT] Re: the Ethnologue

2000-09-17 Thread Doug Ewell
Michael Kaplan [EMAIL PROTECTED] wrote: Spaniards generally refer to their national language as "castellano," not "español," FWIW, I do not know of any Spaniards who object to "español" for the generic language spoken by everyone around the world Castilian they reserve for their own

Re: [OT] Re: the Ethnologue

2000-09-17 Thread Michael \(michka\) Kaplan
://www.i18nWithVB.com/ - Original Message - From: "Doug Ewell" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Sunday, September 17, 2000 1:19 PM Subject: Re: [OT] Re: the Ethnologue Michael Kaplan [EMAIL PROTECTED] wrote: Spaniards generally refer to the

RE: [OT] Re: the Ethnologue

2000-09-17 Thread Carl W. Brown
Michka wrote: Most seem to be okay with the addition of the country/region tag from ISO-3166 for determining the difference between languages spoken in several places -- this is usually what is done for English, Arabic, Portuguese, French, and Chinese, as well. I don't see how one can use

Re: [OT] Re: the Ethnologue

2000-09-17 Thread Michael \(michka\) Kaplan
arl W. Brown" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Sunday, September 17, 2000 3:41 PM Subject: RE: [OT] Re: the Ethnologue Michka wrote: Most seem to be okay with the addition of the country/region tag from ISO-3166 for determining the difference betw

RE: [OT] Re: the Ethnologue

2000-09-17 Thread Carl W. Brown
John Cowan wrote: I see the problem: the same language (with the same code) may be preferentially known by one name in one country and another name in another. Because the Ethnologue names languages by country, conflicts like this can appear. The entry on "Chadian Spoken Arabic" (in Chad) lists

RE: [OT] Re: the Ethnologue

2000-09-17 Thread John Cowan
On Sun, 17 Sep 2000, Carl W. Brown wrote: I can understand your point of view as a standards person. You are right the Ethnologue is not appropriate as a standard. But that does not make it useless. I am not a "standards person", and I think you have my stand mixed up. I am in favor of

Re: [even more OT] Re: the Ethnologue

2000-09-17 Thread Michael \(michka\) Kaplan
From: "John Cowan" [EMAIL PROTECTED] Besides I can not take any standard that implements i-klingon as a human language too seriously. Why not? Human beings speak it (some more fluently than others), and write texts in it. Just follow the links from www.kli.org. It is not anybody's

Re: [even more OT] Re: the Ethnologue

2000-09-17 Thread Doug Ewell
Michael Kaplan [EMAIL PROTECTED] wrote: Don't forget to use 1554 (0x0612) if you need a Windows LCID for Klingon - Latin and 2578 (0x0A12) for Klingon - pIqaD. There's nothing more powerful than a user defined area. :-) This is, at once, the best argument for and the best argument against
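The two numbers quoted here fit the Win32 LANGID layout, in which the low 10 bits hold the primary language and the high 6 bits hold the sublanguage (as packed by the MAKELANGID macro), and in which primary-language values 0x200 through 0x3FF are reserved for user-defined languages. With the default sort ID of 0, an LCID has the same numeric value as its LANGID. The sketch below re-implements that bit packing in Python (the function names are illustrative, not Windows APIs) and shows that both Klingon values share the user-defined primary language 0x212 and differ only in sublanguage.

```python
def make_langid(primary: int, sublang: int) -> int:
    """Pack a LANGID the way the Win32 MAKELANGID macro does."""
    return (sublang << 10) | primary

def split_langid(langid: int) -> tuple[int, int]:
    """Return (primary language, sublanguage) from a 16-bit LANGID."""
    return langid & 0x3FF, langid >> 10

USER_DEFINED_PRIMARY = range(0x200, 0x400)  # primary IDs 0x200..0x3FF

for lcid in (1554, 2578):
    primary, sub = split_langid(lcid)
    print(hex(lcid), hex(primary), sub, primary in USER_DEFINED_PRIMARY)
# 0x612 0x212 1 True
# 0xa12 0x212 2 True
```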

Re: the Ethnologue

2000-09-16 Thread Michael Everson
At 12:04 -0800 2000-09-13, [EMAIL PROTECTED] wrote: In the meantime there are people who need language identifiers for their data. It's in the cases of the more familiar languages (many of them European), that we may need special cases to deal with distinct notions such as written vs. spoken

[OT] Re: the Ethnologue

2000-09-16 Thread Doug Ewell
Here's another thing about the Ethnologue list that has been almost, but not quite, addressed. Just so everyone knows, the point here is *NOT* that the six or seven thousand additional languages in Ethnologue are somehow not worthy of encoding, but that the list is incompletely edited and not

[OT] Re: the Ethnologue

2000-09-16 Thread Michael Everson
At 08:46 -0800 2000-09-16, Doug Ewell wrote: Here's another thing about the Ethnologue list that has been almost, but not quite, addressed. Just so everyone knows, the point here is *NOT* that the six or seven thousand additional languages in Ethnologue are somehow not worthy of encoding, but

Re: [OT] Re: the Ethnologue

2000-09-16 Thread John Cowan
On Sat, 16 Sep 2000, Doug Ewell wrote: But it gets worse. When I stripped out the alternate-names field and again checked for duplicated codes, I found 14 (AVL AYL CAG CTO FUV GAX GSC GSW JUP MHI MHM MKJ SHU SRC). Some of these duplicates differ only in spelling (CAG 'Chulupi' vs.
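One way to repeat the check described here: treat each record as a language-country row (as noted elsewhere in this thread), drop the alternate-names field, and report every code that appears with more than one distinct primary name. The record layout and the sample rows below are hypothetical placeholders, not data from the Ethnologue file, and this is not necessarily the procedure Doug used.

```python
from collections import defaultdict

def codes_with_multiple_names(records):
    """Map each code to its sorted distinct primary names, keeping only codes with
    more than one. Each record is a (code, primary_name, country) tuple whose
    alternate names are assumed to have been stripped already."""
    names = defaultdict(set)
    for code, name, _country in records:
        names[code].add(name)
    return {code: sorted(found) for code, found in names.items() if len(found) > 1}

# Placeholder rows: one code listed under two countries with different names,
# another listed under two countries with the same name.
rows = [
    ("XXA", "Name One", "Country A"),
    ("XXA", "Name Two", "Country B"),
    ("XXB", "Name Three", "Country A"),
    ("XXB", "Name Three", "Country C"),
]
print(codes_with_multiple_names(rows))  # {'XXA': ['Name One', 'Name Two']}
```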

Re: [OT] Re: the Ethnologue

2000-09-16 Thread John Cowan
From: "John Cowan" [EMAIL PROTECTED] It seems clear from the detailed information that in all 14 cases, there is only one language, known by different names in different countries. Expecting the Ethnologue to solve this problem by fiat, or even to openly prefer one name over another

Re: the Ethnologue

2000-09-14 Thread Peter_Constable
On 09/14/2000 04:59:55 AM Jörg Knappen wrote: What really makes me wonder is that the Ethnologue seems to ignore the vast amount of published information on the German language and its dialects. There is more than a century of dialectological research on German, and there are easily accessible

Re: the Ethnologue

2000-09-14 Thread Roozbeh Pournader
On Wed, 13 Sep 2000, Michael Everson wrote: It names Hancock 1990 as the source of this (impossibly incorrect) information. In the bibliography there is no Hancock 1990. Just like The Unicode Standard Version 3.0, page 317, which names ISIRI 3342 as a source for ZWJ and ZWNJ, but there's no

Re: the Ethnologue

2000-09-14 Thread John Cowan
[EMAIL PROTECTED] wrote: I am sorry if I missed your point on Valencian. I must admit I didn't read it through carefully because (a) I'm not that familiar with the speech varieties in question, and (b) I had a very full in-box on this topic to respond to yesterday. In a nutshell: The

Re: the Ethnologue

2000-09-14 Thread Peter_Constable
On 09/14/2000 10:29:52 AM John Cowan wrote: In a nutshell: The Ethnologue treats Valencian as a dialect of Catalan, which is correct based on the mutual intelligibility criterion, but they have distinct orthographies. Unfortunately, the two are in the same country, so the 3166 trick (en-us
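The limitation can be seen by composing tags in the language-country form used for en-US and en-GB. In the sketch below, the helper names and the small variety table are illustrative. Valencian, being classed as a dialect of Catalan, gets the Catalan code 'ca', and because both are used in Spain each also gets the country code 'ES'. The country subtag separates US from British English but cannot separate the two orthographies.

```python
from collections import defaultdict

def lang_country_tag(language: str, country: str) -> str:
    """Join an ISO 639 language code and an ISO 3166 country code, e.g. 'en-US'."""
    return f"{language.lower()}-{country.upper()}"

# Illustrative varieties: Valencian is classed as a dialect of Catalan, and both
# are used in Spain, so they receive the same language and country codes.
VARIETIES = {
    "US English": ("en", "US"),
    "British English": ("en", "GB"),
    "Catalan": ("ca", "ES"),
    "Valencian": ("ca", "ES"),
}

def colliding_tags(varieties):
    """Return tags shared by more than one named variety."""
    by_tag = defaultdict(list)
    for name, (language, country) in varieties.items():
        by_tag[lang_country_tag(language, country)].append(name)
    return {tag: sorted(names) for tag, names in by_tag.items() if len(names) > 1}

print(colliding_tags(VARIETIES))  # {'ca-ES': ['Catalan', 'Valencian']}
```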

ISIRI 3342 (was Re: the Ethnologue)

2000-09-14 Thread Kenneth Whistler
Roozbeh wrote: On Wed, 13 Sep 2000, Michael Everson wrote: It names Hancock 1990 as the source of this (impossibly incorrect) information. In the bibliography there is no Hancock 1990. Just like The Unicode Standard Version 3.0, page 317, which names ISIRI 3342 as a source for ZWJ and

RE: the Ethnologue

2000-09-14 Thread Timothy Partridge
Peter Constable said: On 09/13/2000 12:04:24 PM "Ayers, Mike" wrote: What I'd really like to know is why there seems to be this insistence on only one official list of languages when there appears to be a clear need for two. There appears to be interest for a comprehensive, if

Re: the Ethnologue

2000-09-13 Thread Jörg Knappen
Rick McGowan asked: Can anyone point me to an existing list of languages that is more comprehensive and better researched than the Ethnologue? If there is no such list, then we don't need to consider any alternatives, right? Ask the closest university department of comparative

Re: the Ethnologue

2000-09-13 Thread John Hudson
Rick McGowan wrote: One of the major PROBLEMS with ISO 639, and other such lists developed by ISO over the years, is that they are not brought into being, or maintained, with the intent of being comprehensive. They are either intended to, or do serve, some short-term narrow interests.

Re: the Ethnologue

2000-09-13 Thread Michael Everson
At 23:56 +0100 2000-09-12, Christopher J. Fynn wrote: A lot of what are listed as "languages" in the Ethnologue are what most people would call dialects. For instance almost every known dialect of spoken Tibetan is listed as a separate language in the Ethnologue although they all share only

Re: the Ethnologue

2000-09-13 Thread Misha Wolf
The Library of Congress is very closely involved with ISO 639-2. In fact, it is mostly their list of codes. Misha Oh Michael... I think there are codes given to entities in the Ethnologue list that aren't languages in the sense that we need to identify languages in IT and in

Re: the Ethnologue

2000-09-13 Thread John Hudson
At 02:10 AM 9/14/2000 -0700, [EMAIL PROTECTED] wrote: The problem here is that ISO639 has, for better or worse, been adopted by a wide array of DIFFERING applications. It's a convenience standard that we vaguely have to live with. No, it's an inconvenience standard that we vaguely have to live

Re: the Ethnologue

2000-09-13 Thread Rick McGowan
Re the Linguasphere, Peter C wrote: - As Chris mentioned, the info isn't available online. Actually, the Linguasphere is available on-line, if you pay for it... One hundred sixty pounds sterling (two hundred seventy-five US dollars) for a license to use the electronic version. Rick

RE: the Ethnologue

2000-09-13 Thread Ayers, Mike
With English, the problem with spell checking is quite different, and different lists of words would not be as easy for a solution: the en-US vs. en-GB tagging does not seem to adequately cover the various differences such as -ise vs. -ize, -our vs. -or, -re vs. -er, use of shall vs.

RE: the Ethnologue

2000-09-13 Thread Ayers, Mike
From: Arnt Gulbrandsen [mailto:[EMAIL PROTECTED]] Are there valid reasons why the imperfect but comprehensive needs to be a standard? I can see one reason for it _not_ to be a standard: A list can be added to faster, so it's easier for a list to be truly comprehensive.

Re: the Ethnologue

2000-09-13 Thread John Cowan
Michael Everson wrote (amplified by me):
  tire, civilize, color, center (US)
  tyre, civilize, colour, centre (GB-Oxonia)
  tyre, civilise, colour, centre (GB-Demotica)
  tire, civilise, colour, centre (CA)
I have seen a photograph of an actual Canadian sign saying "Tire Centre", which in GB
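The four spelling profiles listed in this message can be treated as a small lookup table. The sketch below (table and function names are illustrative) returns every profile whose spellings include all of the given words. Within these four profiles, the words on the Canadian sign match only the CA profile, since US uses "center" and both GB profiles use "tyre".

```python
# Spelling profiles copied from the message; the labels are the message's own.
PROFILES = {
    "US": {"tire", "civilize", "color", "center"},
    "GB-Oxonia": {"tyre", "civilize", "colour", "centre"},
    "GB-Demotica": {"tyre", "civilise", "colour", "centre"},
    "CA": {"tire", "civilise", "colour", "centre"},
}

def matching_profiles(words):
    """Return, sorted, the profiles whose spellings include every word given."""
    wanted = {w.lower() for w in words}
    return sorted(label for label, spellings in PROFILES.items() if wanted <= spellings)

print(matching_profiles(["Tire", "Centre"]))    # ['CA']
print(matching_profiles(["colour", "centre"]))  # ['CA', 'GB-Demotica', 'GB-Oxonia']
```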

Re: the Ethnologue

2000-09-13 Thread Misha Wolf
It takes a long time for data to work its way into an ISO standard. This generalisation is unhelpful. Consider ISO 4217, the currency code standard. As soon as the Maintenance Agency (MA) has been notified by a competent authority (in this case, a central bank) of a legitimate currency

Re: the Ethnologue

2000-09-13 Thread Peter_Constable
On 09/13/2000 01:39:37 AM Jörg Knappen wrote: I once looked at the Ethnologue, and its subdivision of the German language is just ridiculous. Not small errors, a gross misconception. I don't trust the Ethnologue in areas where I don't know the facts well, since it fails in one area where I know

Re: the Ethnologue

2000-09-13 Thread Peter_Constable
On 09/13/2000 02:17:52 AM John Hudson wrote: The first tasks should be to a) identify the different kinds of information that need to be represented by tags (spoken languages, written languages, literary languages (not the same thing as written languages), particular orthographies,

Re: the Ethnologue

2000-09-13 Thread Peter_Constable
(Apologies for the cross-listing, but this has spanned several lists, and there are parties on each that are not all on one and that are interested in the discussion.) On 09/13/2000 06:37:02 AM Michael Everson wrote: At 23:56 +0100 2000-09-12, Christopher J. Fynn wrote: A lot of what are

Re: the Ethnologue

2000-09-13 Thread Peter_Constable
On 09/13/2000 10:25:21 AM Antoine Leca wrote: While I agree with you, there are anyway problems with the way languages are distinguished... Some comments in response: - This is not primarily about major languages. They generally already have the identifiers they need. In addition, because of

Re: the Ethnologue

2000-09-13 Thread Peter_Constable
On 09/13/2000 11:59:01 AM Rick McGowan wrote: Re the Linguasphere, Peter C wrote: - As Chris mentioned, the info isn't available online. Actually, the Linguasphere is available on-line, if you pay for it... One hundred sixty pounds sterling (two hundred seventy-five US dollars) for a license

RE: the Ethnologue

2000-09-13 Thread Peter_Constable
On 09/13/2000 12:04:24 PM "Ayers, Mike" wrote: What I'd really like to know is why there seems to be this insistence on only one official list of languages when there appears to be a clear need for two. There appears to be interest for a comprehensive, if imperfect, list on one hand,

Re: the Ethnologue

2000-09-12 Thread Michael Everson
I think there are codes given to entities in the Ethnologue list that aren't languages in the sense that we need to identify languages in IT and in Bibliography (which is what the codes are for). I think that it is not mature for International Standardization. It is a work in progress, subject to

Re: the Ethnologue

2000-09-12 Thread Rick McGowan
Oh Michael... I think there are codes given to entities in the Ethnologue list that aren't languages in the sense that we need to identify languages in IT and in Bibliography ISO 639, and every other "standard" for language/locale codes also has this problem, and from what I remember of the

Re: the Ethnologue

2000-09-12 Thread Christopher J. Fynn
Can anyone point me to an existing list of languages that is more comprehensive and better researched than the Ethnologue? If there is no such list, then we don't need to consider any alternatives, right? I'm not qualified to judge the merits of one list over another but there certainly