mouss wrote:

>> However, it is true that the vast majority of the corpus currently comes
>> from folks who speak English (King's or Yankee) as a primary language, and
>> that's a bit of a problem as it creates considerable bias in the rules.
>>
>> And even us US folks do have encoding issues. After all, English is not our
>> official language here in the US,
> 
> what do you mean here? what would be your official language?

The United States of America does not have any official language.

Americanized English is our common language, but it's not official. This means
that our government has to supply forms and materials in many languages for its
citizens, because it cannot require that citizens speak English.

For example, we have tax forms in French:

http://www.irs.gov/pub/irs-access/f2290fr_accessible.pdf

Admittedly, non-English forms and services are somewhat secondary here, but they
are present.

> 
>> and I've got plenty of users that speak
>> multiple languages, not all of which use plain-ascii.
>>
> 
> I guess so. now I'm not sure our situation isn't worst because people
> tried to find non standard solutions that are still used. I still
> remember the days when some customers were asking us to "fix" our
> software because "it broke their accents"... hopefully these times are
> gone, but I still see "broken" mail (much more than I should). actually,
> I also see mail that doesn't get rendered correctly on thunderbird. so
> I'll admit that the issue isn't really about accented chars...
> 

Well, yours is certainly worse, or at least more prevalent, than the problem
here in the US, but I would not say it's the worst.

Generally speaking, the worst case seems to be in smaller Asian nations, which
make really extensive use of non-US-ASCII characters. At least the French can
restrict their text to the same character set as English and still be readable,
although awkward due to the missing accents.

Also, smaller Asian nations still to this day have a high prevalence of
locally grown mail clients, many of which are not even remotely RFC-compliant,
but work well with others in the same locale.

Users there are also much more likely to write mixed-language text containing
many character sets. Speaking 2 or 3 different languages is fairly common in the
smaller countries of the Asian region, simply out of necessity for trade with
neighboring countries.

Another area with this same basic issue would be the Middle East, but the number
of completely different character sets is smaller.



