At 7:18 AM -0800 11/23/00, Christopher John Fynn wrote:
Spoken language is not necessarily at all the same
thing as written language.
There are e.g. plenty of mutually incomprehensible
forms of spoken English which might each deserve a
code in a standard for spoken languages but
Elliotte Rusty Harold [EMAIL PROTECTED] wrote:
At 7:18 AM -0800 11/23/00, Christopher John Fynn wrote:
Spoken language is not necessarily at all the same thing as
written language. There are e.g. plenty of mutually
incomprehensible forms of spoken English which might each deserve
Elliotte Rusty Harold wrote:
I've yet to encounter a spoken
version of English that I couldn't understand, after at most a couple
of minutes of accustoming myself to the accent.
You live in a country where dialect differentiation is a feeble thing,
consisting mainly in pronunciation, and
John Cowan noted:
In general, Geordie (the traditional dialect spoken around the Tyne
River in England) is considered to be the English dialect most difficult
for North Americans.
To that I would add Glaswegian. When watching the
Scots-produced mystery shows that show up on PBS in the
Kenneth Whistler wrote:
To that I would add Glaswegian. When watching the
Scots-produced mystery shows that show up on PBS in the United
States on occasion, my wife and I often turn to each other
in bafflement and say, "Subtitles, please."
Scots is a separate language! If you understand
John Cowan replied:
Kenneth Whistler wrote:
To that I would add Glaswegian. When watching the
Scots-produced mystery shows that show up on PBS in the United
States on occasion, my wife and I often turn to each other
in bafflement and say, "Subtitles, please."
Scots is a separate
On Thu, 30 Nov 2000, Kenneth Whistler wrote:
Scots is a separate language! If you understand anything at all
it's by a happy accident. (There is of course Scots-flavored
English as well, which is another matter.)
I was, of course, referring to Scots (alleged) English, and not
to
Peter Constable wrote:
This is a good example of why an enumeration of "languages"
based only on written forms (as found in ISO 639) is
insufficient for all user needs.
Of course ISO 639 is insufficient for *all* user needs
- no standard is. And is there actually a remit for
ISO 639 to
At 6:24 AM -0800 9/21/00, Marion Gunn wrote:
Arsa Antoine Leca:
CITE
Hindi, Hindustani, Urdu could be considered co-dialects, but have important
sociolinguistic differences. Hindi uses the Devanagari writing system, and
formal vocabulary is borrowed from Sanskrit, de-Persianized,
Peter Constable wrote:
SRC is the code for 'Bosnian', 'Croatian', and 'Serbo-Croatian', which
means that there is a many-to-one mapping from ISO 639-1 'bs', 'hr',
'sr' to Ethnologue 'SRC'.
By Ethnologue standards of mutual intelligibility, there is only one
language here.
Well,
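The many-to-one relationship described in the quoted message can be sketched as a small lookup table. This is only an illustration: the table name and both helper functions are hypothetical, and the table holds nothing beyond the three ISO 639-1 codes and the one Ethnologue code named in the message.

```python
# Hypothetical table: only the pairs named in the quoted message are listed.
ISO_639_1_TO_ETHNOLOGUE = {"bs": "SRC", "hr": "SRC", "sr": "SRC"}


def to_ethnologue(iso_code):
    """Return the Ethnologue code for an ISO 639-1 code, or None if unknown."""
    return ISO_639_1_TO_ETHNOLOGUE.get(iso_code.lower())


def from_ethnologue(eth_code):
    """Return all ISO 639-1 codes that map to one Ethnologue code, sorted."""
    wanted = eth_code.upper()
    return sorted(iso for iso, eth in ISO_639_1_TO_ETHNOLOGUE.items() if eth == wanted)


print(to_ethnologue("hr"))
print(from_ethnologue("SRC"))
```

The forward lookup is a plain function, but the reverse lookup has to return a list, which is the practical cost of a many-to-one mapping: a tag converted to 'SRC' cannot be converted back without extra information.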
Arsa Antoine Leca:
CITE
Hindi, Hindustani, Urdu could be considered co-dialects, but have important
sociolinguistic differences. Hindi uses the Devanagari writing system, and
formal vocabulary is borrowed from Sanskrit, de-Persianized, de-Arabicized.
Literary Hindi, or Hindi-Urdu,
Marion Gunn [EMAIL PROTECTED] wrote:
Hindi, Hindustani, Urdu could be considered co-dialects...
Mm. Maybe a more polite (more PC) turn of phrase might be found than
"could be considered co-dialects", which more than implies, it
postulates the existence of a standard language referent of
Hi Peter,
The records in the text file you looked at are language-countries. It
is important to understand that the categorization is not reflected
by the records in that file, but by the three-letter codes. The
reason for codes being duplicated is because the languages in
question are
In message [EMAIL PROTECTED]
Doug Ewell [EMAIL PROTECTED] wrote:
Marion Gunn [EMAIL PROTECTED] wrote:
Mm. Maybe a more polite (more PC) turn of phrase might be found than
"could be considered co-dialects", which more than implies, it
postulates the existence of a standard
Arsa Kevin Bracey:
As far as I'm aware the co- prefix does mean an equal grouping. Examples that
spring to mind are co-worker, co-conspirator, co-exist, coincidence and
co-operative. I thought co-dialects was a cunningly concise way of saying
that they could all be considered dialects of
[Apologies if you already got this. It seems to be bouncing, and so am
sending it again.]
On 09/21/2000 10:52:22 AM Doug Ewell wrote:
[snip]
Agreed. This is a refreshing departure from the position I perceived
earlier, that ISO 639 was severely broken and the Ethnologue approach
was
On 09/16/2000 04:27:45 PM Doug Ewell wrote:
All I am asking in this particular case is for the Ethnologue editor to
assign *one* primary name (and spelling) to each three-letter language
code, and to relegate the other names to alternate status in a
consistent way. That is the first necessary
On 09/17/2000 03:19:32 PM Doug Ewell wrote:
Well, perhaps this is another, unintended example of a problem with
incorporating the Ethnologue linguistic distinctions into other
standards without serious review. If Spaniards consider their language
sufficiently different from the Spanish spoken
On 09/17/2000 11:39:14 AM Doug Ewell wrote:
What names am I supposed to associate with codes like SHU, MKJ, and
SRC in my (possibly hypothetical) application that deals with language
tags? Such associations are normally expected to be one-to-one.
If Ethnologue codes are going to be regarded
On 09/17/2000 07:22:05 PM "Carl W. Brown" wrote:
You are right the Ethnologue is not appropriate as a standard.
If we're assuming a single standard, in the sense of a single "tiling of
the plane" of languages, we're not proposing that the Ethnologue be the
standard. We are suggesting, though,
On 09/17/2000 08:02:20 PM John Cowan wrote:
Where I see using the SIL is as an extension of the ISO standard.
RFC 1766 exists to allow flexible extension to the ISO standard.
If there is no ISO code then use the SIL code.
There are already collisions, so simply using one or the other gets
On 09/17/2000 10:37:42 PM Doug Ewell wrote:
Since I have spent this whole, *very* OT discussion as the contrarian
It hasn't been all that off-topic. This has come up on numerous occasions
on this list, and I think is of interest to many of the participants, even
though it isn't strictly about
On 09/17/2000 11:13:36 PM John Cowan wrote:
Exactly so. And BTW "my proposal" is also Harald Alvestrand's proposal.
I wasn't aware of that until Harald mentioned something not too many days
ago.
- Peter
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 20, 2000 11:06 AM
What is important here is that, where ISO doesn't provide a code, that
users do have some other source of codes for internal and, more
importantly, interchange purposes. Many independent agencies and
From: "Carl W. Brown" [EMAIL PROTECTED]
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 20, 2000 11:06 AM
I agree. For example, when it was brought up that other Turkic languages
might be using the dotless i, I noticed that the SIL confirmed that
Azerbaijan uses
From: Nick Nicholas [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 20, 2000 4:48 PM
Apart from cohabiting in Anatolia for a millennium. :-) In any case, the
Ethnologue is correct about Urum; Urum and Mariupolitan Greek are the two
languages spoken by an ethnically Greek population, which
John Cowan [EMAIL PROTECTED] wrote:
Doug wants the Ethnologue to give each of its languages (uniquely
tagged) a single unique worldwide authoritative name. That's not
reasonable in all cases, though it is in 99.5%.
What names am I supposed to associate with codes like SHU, MKJ, and
SRC in
Michael Kaplan [EMAIL PROTECTED] wrote:
Spaniards generally refer to their national language as "castellano,"
not "español,"
FWIW, I do not know of any Spaniards who object to "español" for the
generic language spoken by everyone around the world Castilian
they reserve for their own
://www.i18nWithVB.com/
- Original Message -
From: "Doug Ewell" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Sunday, September 17, 2000 1:19 PM
Subject: Re: [OT] Re: the Ethnologue
Michael Kaplan [EMAIL PROTECTED] wrote:
Spaniards generally refer to the
Michka wrote :
Most seem to be okay with the addition of the country/region tag from
ISO-3166 for determining the difference between languages spoken in several
places -- this is usually what is done for English, Arabic, Portuguese,
French, and Chinese, as well.
I don't see how one can use
From: "Carl W. Brown" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Sunday, September 17, 2000 3:41 PM
Subject: RE: [OT] Re: the Ethnologue
Michka wrote :
Most seem to be okay with the addition of the country/region tag from
ISO-3166 for determining the difference betw
John Cowan wrote:
I see the problem: the same language (with the same code) may be
preferentially known by one name in one country and another name in
another. Because
the Ethnologue names languages by country, conflicts like this can appear.
The entry on "Chadian Spoken Arabic" (in Chad) lists
On Sun, 17 Sep 2000, Carl W. Brown wrote:
I can understand your point of view as a standards person.
You are right the Ethnologue is not appropriate as a standard. But that
does not make it useless.
I am not a "standards person", and I think you have my stand mixed up.
I am in favor of
From: "John Cowan" [EMAIL PROTECTED]
Besides I can not
take any standard that implements i-klingon as a human language too
seriously.
Why not? Human beings speak it (some more fluently than others), and
write texts in it. Just follow the links from www.kli.org. It is not
anybody's
Michael Kaplan [EMAIL PROTECTED] wrote:
Don't forget to use 1554 (0x0612) if you need a Windows LCID for
Klingon - Latin and 2578 (0x0A12) for Klingon - pIqaD.
There's nothing more powerful than a user defined area. :-)
This is, at once, the best argument for and the best argument against
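The two values quoted above can be checked by splitting each identifier into its primary-language and sublanguage parts. This is a minimal sketch, assuming the usual Windows layout in which the low 10 bits of the 16-bit language ID hold the primary language and the next 6 bits hold the sublanguage, and assuming that primary values 0x200 through 0x3FF form the user-defined range. The function names are made up for illustration.

```python
def split_langid(lcid):
    """Split an LCID into (primary language ID, sublanguage ID).

    Assumes the low 16 bits are the language ID, with the primary
    language in bits 0-9 and the sublanguage in bits 10-15.
    """
    langid = lcid & 0xFFFF
    return langid & 0x3FF, langid >> 10


def is_user_defined_primary(primary):
    """True if the primary language ID is in the assumed user-defined range."""
    return 0x200 <= primary <= 0x3FF


for lcid in (1554, 2578):
    primary, sub = split_langid(lcid)
    print(hex(lcid), hex(primary), sub, is_user_defined_primary(primary))
```

Under these assumptions both values share the primary language ID 0x212 and differ only in the sublanguage (1 and 2), which matches the Latin and pIqaD distinction made in the message.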
Ar 12:04 -0800 2000-09-13, scríobh [EMAIL PROTECTED]:
In
the mean time there are people who need language identifiers for their
data. It's in the cases of the more familiar languages (many of them
European), that we may need special cases to deal with distinct notions
such as written vs. spoken
Here's another thing about the Ethnologue list that has been almost,
but not quite, addressed. Just so everyone knows, the point here is
*NOT* that the six or seven thousand additional languages in Ethnologue
are somehow not worthy of encoding, but that the list is incompletely
edited and not
Ar 08:46 -0800 2000-09-16, scríobh Doug Ewell:
Here's another thing about the Ethnologue list that has been almost,
but not quite, addressed. Just so everyone knows, the point here is
*NOT* that the six or seven thousand additional languages in Ethnologue
are somehow not worthy of encoding, but
On Sat, 16 Sep 2000, Doug Ewell wrote:
But it gets worse. When I stripped out the alternate-names field and
again checked for duplicated codes, I found 14 (AVL AYL CAG CTO FUV GAX
GSC GSW JUP MHI MHM MKJ SHU SRC). Some of these duplicates differ only
in spelling (CAG 'Chulupi' vs.
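The kind of check described in this message, looking for codes that carry more than one primary name once alternate names are stripped, can be sketched as follows. The record layout (code, name) and the sample rows are hypothetical placeholders, not data from the Ethnologue file.

```python
from collections import defaultdict


def codes_with_multiple_names(records):
    """Return {code: sorted names} for every code that has more than one distinct name."""
    names_by_code = defaultdict(set)
    for code, name in records:
        names_by_code[code].add(name)
    return {
        code: sorted(names)
        for code, names in names_by_code.items()
        if len(names) > 1
    }


# Placeholder rows for illustration only.
sample = [
    ("AAA", "Name One"),
    ("AAA", "Name 1"),
    ("BBB", "Other"),
    ("BBB", "Other"),
]
print(codes_with_multiple_names(sample))
```

A code that repeats with the same name in several countries is not reported; only codes whose names actually differ are, which is the case the message is concerned with.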
From: "John Cowan" [EMAIL PROTECTED]
It seems clear from the detailed information that in all 14 cases,
there is only one language, known by different names in different
countries. Expecting the Ethnologue to solve this problem by fiat,
or even to openly prefer one name over another
On 09/14/2000 04:59:55 AM JÖRG KNAPPEN wrote:
What really makes me wonder is that the Ethnologue seems to ignore the
vast amount of published information on the German language and its
dialects.
There is more than a century of dialectological research on German, and
there are easily accessible
On Wed, 13 Sep 2000, Michael Everson wrote:
It names Hancock 1990 as the source of this (impossibly incorrect)
information. In the bibliography there is no Hancock 1990.
Just like The Unicode Standard Version 3.0, page 317, which names
ISIRI 3342 as a source for ZWJ and ZWNJ, but there's no
[EMAIL PROTECTED] wrote:
I am sorry if I missed your point on Valencian. I must admit I didn't read
it through carefully because (a) I'm not that familiar with the speech
varieties in question, and (b) I had a very full in-box on this topic to
respond to yesterday.
In a nutshell: The
On 09/14/2000 10:29:52 AM John Cowan wrote:
In a nutshell: The Ethnologue treats Valencian as a dialect of Catalan,
which is correct based on the mutual intelligibility criterion, but they
have distinct orthographies. Unfortunately, the two are in the same
country, so the 3166 trick (en-us
Roozbeh wrote:
On Wed, 13 Sep 2000, Michael Everson wrote:
It names Hancock 1990 as the source of this (impossibly incorrect)
information. In the bibliography there is no Hancock 1990.
Just like The Unicode Standard Version 3.0, page 317, which names
ISIRI 3342 as a source for ZWJ and
Peter Constable said:
On 09/13/2000 12:04:24 PM "Ayers, Mike" wrote:
What I'd really like to know is why there seems to be this
insistence on only one official list of languages when there appears to be a
clear need for two. There appears to be interest for a comprehensive, if
Rick McGowan asked:
Can anyone point me to an existing list of languages that is more
comprehensive and better researched than the Ethnologue? If there is no
such list, then we don't need to consider any alternatives, right?
Ask the closest university department of comparative
Rick McGowan wrote:
One of the major PROBLEMS with ISO 639, and other such lists developed by
ISO over the years, is that they are not brought into being, or maintained,
with the intent of being comprehensive. They are either intended to, or do
serve, some short-term narrow interests.
Ar 23:56 +0100 2000-09-12, scríobh Christopher J. Fynn:
A lot of what are listed as "languages" in the Ethnologue are what most people
would call dialects. For instance almost every known dialect of spoken Tibetan
is listed as a separate language in the Ethnologue although they all share
only
The Library of Congress is very closely involved with ISO 639-2.
In fact, it is mostly their list of codes.
Misha
Oh Michael...
I think there are codes given to entities in the Ethnologue list that
aren't languages in the sense that we need to identify languages in IT
and in
At 02:10 AM 9/14/2000 -0700, [EMAIL PROTECTED] wrote:
The problem here is that ISO639 has, for better or worse, been adopted by
a wide array of DIFFERING applications. It's a convenience standard that
we vaguely have to live with.
No, it's an inconvenience standard that we vaguely have to live
Re the Linguasphere, Peter C wrote:
- As Chris mentioned, the info isn't available online.
Actually, the Linguasphere is available on-line, if you pay for it... One hundred
sixty pounds sterling (two hundred seventy-five US dollars) for a license to use the
electronic version.
Rick
With English, the problem with spell checking is quite different, and
different lists of words would not be as easy for a solution: the en-US
vs. en-GB tagging does not seem to adequately cover the various
differences such as -ise vs. -ize, -our vs. -or, -re vs. -er, use of
shall vs.
From: Arnt Gulbrandsen [mailto:[EMAIL PROTECTED]]
Are there valid reasons why the imperfect but comprehensive needs to be a
standard? I can see one reason for it _not_ to be a standard: A list can
be added to faster, so it's easier for a list to be truly comprehensive.
Michael Everson wrote (amplified by me):
tire, civilize, color, center (US)
tyre, civilize, colour, centre (GB-Oxonia)
tyre, civilise, colour, centre (GB-Demotica)
tire, civilise, colour, centre (CA)
I have seen a photograph of an actual Canadian sign saying "Tire Centre",
which in GB
It takes a long time for data to work its way into an ISO standard.
This generalisation is unhelpful. Consider ISO 4217, the currency code
standard. As soon as the Maintenance Agency (MA) has been notified by a
competent authority (in this case, a central bank) of a legitimate
currency
On 09/13/2000 01:39:37 AM JÖRG KNAPPEN wrote:
I once looked at the Ethnologue and its subdivision of the German language
is just ridiculous. Not small errors, a gross misconception. I don't trust
the Ethnologue in areas where I don't know the facts well, since it fails
in one area where I know
On 09/13/2000 02:17:52 AM John Hudson wrote:
The first tasks should be to a) identify the different kinds of
information that need to be represented by tags (spoken languages,
written languages, literary languages (not the same thing as a written
language), particular orthographies,
(Apologies for the cross-listing, but this has spanned several lists, and
there are parties on each that are not all on one and that are interested
in the discussion.)
On 09/13/2000 06:37:02 AM Michael Everson wrote:
Ar 23:56 +0100 2000-09-12, scríobh Christopher J. Fynn:
A lot of what are
On 09/13/2000 10:25:21 AM Antoine Leca wrote:
While I agree with you, there are anyway problems with the way languages
are distinguished...
Some comments in response:
- This is not primarily about major languages. They generally already have
the identifiers they need. In addition, because of
On 09/13/2000 11:59:01 AM Rick McGowan wrote:
Re the Linguasphere, Peter C wrote:
- As Chris mentioned, the info isn't available online.
Actually, the Linguasphere is available on-line, if you pay for it... One
hundred sixty pounds sterling (two hundred seventy-five US dollars) for a
license
On 09/13/2000 12:04:24 PM "Ayers, Mike" wrote:
What I'd really like to know is why there seems to be this
insistence on only one official list of languages when there appears to be
a clear need for two. There appears to be interest for a comprehensive, if
imperfect, list on one hand,
I think there are codes given to entities in the Ethnologue list that aren't
languages in the sense that we need to identify languages in IT and in
Bibliography (which is what the codes are for). I think that it is not
mature for International Standardization. It is a work in progress, subject
to
Oh Michael...
I think there are codes given to entities in the Ethnologue list that
aren't languages in the sense that we need to identify languages in IT
and in Bibliography
ISO 639, and every other "standard" for language/locale codes also has this problem,
and from what I remember of the
Can anyone point me to an existing list of languages that is more
comprehensive and better researched than the Ethnologue?
If there is no such list, then we don't need to consider any
alternatives, right?
I'm not qualified to judge the merits of one list over another
but there certainly