The more I think about this, the more I believe that the correct choice would be to describe the expected content more accurately. The UA could then turn spellchecking on or off appropriately. The problem is that the lang attribute allows only values defined in RFC 3066, which seems to support only ISO 639-defined language tags. That is, the expressible languages are limited to *spoken* languages.

Ian Hickson wrote:
On Sun, 11 Jun 2006, Alexey Feldgendler wrote:
Information like "this input field should have autoindent" is presentational.

Yeah, but you'd have to say "auto-indent this like C++", which isn't. IMHO.

Perhaps, instead of using the |spellcheck| attribute as a toggle, we could allow a whitespace-separated list of expected input languages. If the user is expected to enter C++ code with English comments, then the author should use markup such as

<textarea lang="zxx" spellcheck="c++ en">

for "no linguistic content", with spellchecking for C++ and English.
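A minimal sketch of how a UA might read such a list, assuming the value is split on HTML's whitespace characters and that tokens are case-insensitive. `parse_spellcheck_list` is a hypothetical helper, and the list semantics are only this proposal, not any specification:

```python
import re

# HTML's space characters: space, tab, line feed, form feed, carriage return.
_HTML_SPACE = re.compile(r"[ \t\n\f\r]+")


def parse_spellcheck_list(value: str) -> list[str]:
    """Split a hypothetical spellcheck="..." value into ordered, unique,
    lowercased tokens (e.g. "c++ en" -> ["c++", "en"])."""
    tokens: list[str] = []
    for token in _HTML_SPACE.split(value):
        token = token.lower()  # assumption: tokens are case-insensitive
        if token and token not in tokens:  # skip empty pieces and repeats
            tokens.append(token)
    return tokens
```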

Another option would be to extend the lang attribute to allow languages other than human languages. This has the added bonus that the lang attribute could also describe other content more accurately. RFC 3066 reserves language tags starting with "x-" for private use, and those could be used to aid spellchecking, too. Unfortunately, subtags may contain only the letters A-Z and the digits 0-9 (so "c++" cannot appear in a tag), so perhaps something like

<textarea lang="x-cpp-en">

for a private language "cpp-en", i.e. "C++ with English comments". Or, if the lang attribute were extended to allow a list of multiple languages, one could write

<textarea lang="en x-cpp">

for English text mixed with C++ code (which is less accurate than the x-cpp-en above).
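The character restriction on RFC 3066 tags can be checked mechanically. The sketch below encodes the RFC 3066 grammar (a primary subtag of 1 to 8 letters, followed by subtags of 1 to 8 letters or digits); the helper names are hypothetical. It illustrates why "x-cpp-en" is a valid private-use tag while "x-c++-en" is not, and why a space-separated value such as "en x-cpp" is not a single RFC 3066 tag today:

```python
import re

# RFC 3066 grammar:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
# ALPHA and DIGIT are ASCII-only, as in the ABNF core rules.
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_tag(tag: str) -> bool:
    """True if `tag` matches the RFC 3066 language tag syntax."""
    return _RFC3066_TAG.fullmatch(tag) is not None


def is_private_use(tag: str) -> bool:
    """True for syntactically valid tags whose primary subtag is "x"."""
    return is_rfc3066_tag(tag) and tag.split("-")[0].lower() == "x"
```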

The GMail "To:" input field could be expressed as

<textarea lang="x-mail-to">

and UAs that don't recognize the language "x-mail-to" should turn spellchecking off.

A typical blog input field could be encoded as

<textarea lang="x-html-fragment-en">

Here the need for multiple language tags inside the "lang" attribute becomes clearer. It would make more sense to use lang="x-html-fragment en"; otherwise there would be a need for *very* many private languages starting with "x-html-fragment-", including "x-html-fragment-sv-fi".
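One way a UA could apply these rules, sketched under loud assumptions: the lang value is treated as a whitespace-separated list (which RFC 3066 does not allow today), every private-use "x-" tag is assumed to be unrecognized, and `select_dictionaries` and the `installed` set are hypothetical. Falling back from a tag like "sv-fi" to its primary subtag "sv" is also an assumption of this sketch. An empty result means spellchecking is turned off:

```python
def select_dictionaries(lang_value: str, installed: set[str]) -> list[str]:
    """Choose spellchecking dictionaries for a hypothetical whitespace-
    separated lang value. An empty list means: turn spellchecking off.

    `installed` holds the lowercase tags of the dictionaries this UA has."""
    chosen: list[str] = []
    for tag in lang_value.lower().split():
        subtags = tag.split("-")
        if subtags[0] == "x":
            # Private-use tag: assumed unrecognized, so it selects nothing.
            continue
        # Try the full tag first, then its primary subtag ("sv-fi" -> "sv").
        for candidate in (tag, subtags[0]):
            if candidate in installed:
                if candidate not in chosen:
                    chosen.append(candidate)
                break
    return chosen
```

With `installed = {"en"}`, the combined tag "x-html-fragment-en" selects no dictionary at all, while the list "x-html-fragment en" still selects English, which illustrates the point about composing tags instead of minting one private tag per combination.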

On Fri, 23 Jun 2006, Sander Tekelenburg wrote:
        [AUTHOR REQUIREMENTS]

Authors should set the document's language information, to enable user agents to accurately determine which dictionary to use when checking the spelling or grammar of user input.
IMO this "should" should be a "must".

What if the author doesn't know the language?

ISO 639 Part 2 includes "und" for "undetermined language". A sane default for the UA is to disable spellchecking, or to use some (unspecified) heuristic to determine the language itself.

On Sat, 24 Jun 2006, Alexey Feldgendler wrote:
Even worse: when entering text in a textarea, the user actually has a choice of which language to write in. I think the user agent should provide, besides just the control to turn spellchecking on and off, a choice of languages.

Agreed.

If a form expects English text to be entered, it would be wise to mark text written in any other language as incorrectly spelled. If any language is acceptable, the author should specify lang="mul" for "multiple languages" (again, defined by ISO 639 Part 2).

Again, a list of acceptable languages would be nice here.
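A minimal sketch of word-level checking under the conventions discussed in this thread, assuming toy in-memory dictionaries (lowercase word sets keyed by language tag) and a hypothetical whitespace-separated list of acceptable languages. "und", "zxx", or an empty value disables checking (the sketch takes the "disable" option rather than a language-guessing heuristic); "mul" accepts a word found in any available dictionary; otherwise a word is flagged unless a dictionary for one of the listed languages contains it. `flag_words` is a hypothetical name:

```python
def flag_words(words: list[str], lang_value: str,
               dictionaries: dict[str, set[str]]) -> list[str]:
    """Return the words to mark as misspelled, given a hypothetical
    whitespace-separated list of acceptable languages in `lang_value`.

    `dictionaries` maps lowercase language tags to sets of lowercase words."""
    tags = lang_value.lower().split()
    if not tags or "und" in tags or "zxx" in tags:
        # Undetermined language or no linguistic content: checking disabled.
        return []
    if "mul" in tags:
        # Any language accepted: a word is fine if any dictionary knows it.
        allowed = list(dictionaries.values())
    else:
        allowed = [dictionaries[t] for t in tags if t in dictionaries]
    if not allowed:
        # No dictionary for any listed language: disable rather than guess.
        return []
    return [w for w in words if not any(w.lower() in d for d in allowed)]
```

With English as the only listed language, a Swedish word such as "hund" is flagged; listing both "en sv" accepts it.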

--
Mikko
