Re: [whatwg] Default encoding to UTF-8?

2011-12-05 Thread Sergiusz Wolicki
> (And HTML5 defines it the same.)

No. As far as I understand, HTML5 defines US-ASCII to be the default and
requires that any other encoding is explicitly declared. I do like this
approach.

We should also lobby for authoring tools (as recommended by HTML5) to
default their output to UTF-8 and make sure the encoding is declared.  As
so many pages, supposedly (I have not researched this), use the incorrect
encoding, it makes no sense to try to clean this mess by messing with
existing defaults. It may fix some pages and break others. Browsers have
the ability to override an incorrect encoding and this is a reasonable
workaround.
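
To illustrate why changing a default may fix some pages and break others,
here is a minimal sketch in Python. The sample string and the choice of
ISO-8859-2 as the legacy encoding are assumptions made for the
demonstration; they are not taken from any real page.

```python
# Illustration only: SAMPLE and both encodings are assumed for the demo.
SAMPLE = "zażółć gęślą jaźń"  # Polish text with non-ASCII letters


def decode_with_default(raw: bytes, default: str) -> str:
    """Decode undeclared bytes using a fallback encoding, as a browser would."""
    return raw.decode(default, errors="replace")


utf8_bytes = SAMPLE.encode("utf-8")
latin2_bytes = SAMPLE.encode("iso-8859-2")

# UTF-8 content read with a Latin-1 default: each non-ASCII letter here
# is two bytes in UTF-8, so it shows up as two wrong characters.
as_latin1 = decode_with_default(utf8_bytes, "iso-8859-1")

# Legacy ISO-8859-2 content read with a UTF-8 default: the invalid byte
# sequences are replaced with U+FFFD.
as_utf8 = decode_with_default(latin2_bytes, "utf-8")

print(as_latin1)
print(as_utf8)
```

Whichever default is chosen, one of the two undeclared pages is displayed
incorrectly; only an explicit declaration helps both.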


-- Sergiusz


On Mon, Dec 5, 2011 at 6:42 PM, Leif Halvard Silli <
xn--mlform-...@xn--mlform-iua.no> wrote:

> L. David Baron on Wed Nov 30 18:29:31 PST 2011:
> > On Wednesday 2011-11-30 15:28 -0800, Faruk Ates wrote:
> >> My understanding is that all browsers* default to Western Latin
> >> (ISO-8859-1) encoding by default (for Western-world
> >> downloads/OSes) due to legacy content on the web. But how relevant
> >> is that still today? Has any browser done any recent research into
> >> the need for this?
> >
> > The default varies by localization (and within that potentially by
> > platform), and unfortunately that variation does matter.  You can
> > see Firefox's defaults here:
> >
> http://mxr.mozilla.org/l10n-mozilla-beta/search?string=intl.charset.default
> > (The localization and platform are part of the filename.)
>
> Last I checked, some of those locales defaulted to UTF-8. (And HTML5
> defines it the same.) So how is that possible? Don't users of those
> locales travel as much as you do? Or do we consider the English locale
> users as more important? Something is broken in the logic here!
>
> > I changed my Firefox from the ISO-8859-1 default to UTF-8 years ago
> > (by changing the "intl.charset.default" preference), and I do see a
> > decent amount of broken content as a result (maybe I encounter a new
> > broken page once a week? -- though substantially more often if I'm
> > looking at non-English pages because of travel).
>
> What kind of trouble are you actually describing here? You are
> describing a problem with using UTF-8 for *your locale*. What is your
> locale? It is probably English. Or do you consider your locale to be
> 'the Western world locale'? It sounds like *that* is what Anne has in
> mind when he brings in Dutch:
> http://blog.whatwg.org/weekly-encoding-woes (Quite often it sounds as
> if some see Latin-1 - or Windows-1252 as we now should say - as a
> 'super default' rather than a locale default. If that is the case, that
> it is a super default, then we should also spec it like that! Until
> further, I'll treat Latin-1 as it is specced: As a default for certain
> locales.)
>
> Since it is a locale problem, we need to understand which locale you
> have - and/or which locale you - and other debaters - think they have.
> Faruk probably uses a Spanish locale - right?, so the two of you are
> not speaking out of the same context.
>
> However, you also say that your problem is not so much related to pages
> written for *your* locale as it is related for pages written for users
> of *other* locales. So how many times per year do Dutch, Spanish or
> Norwegian pages - and other non-English pages - create trouble for
> you, as an English locale user? I am making an assumption: Almost never.
> You don't read those languages, do you?
>
> This is also an expectation thing: If you visit a Russian page in a
> legacy Cyrillic encoding, and get mojibake because your browser
> defaults to Latin-1, then what does it matter to you whether your
> browser defaults to Latin-1 or UTF-8? Answer: Nothing.
>
> >> I'm wondering if it might not be good to start encouraging
> >> defaulting to UTF-8, and only fallback to Western Latin if it is
> >> detected that the content is very old / served by old
> >> infrastructure or servers, etc. And of course if the content is
> >> served with an explicit encoding of Western Latin.
> >
> > The more complex the rules, the harder they are for authors to
> > understand / debug.  I wouldn't want to create rules like those.
>
> Agree that that particular idea is probably not the best.
>
> > I would, however, like to see movement towards defaulting to UTF-8:
> > the current situation makes the Web less world-wide because pages
> > that work for one user don't work for another.
> >
> > I'm just not quite sure how to get from here to there, though, since
> > such changes are likely to make users experience broken content.
>
> I think we should 'attack' the dominating locale first: The English
> locale, in its different incarnations (Australian, American, UK). Thus,
> we should turn things on the head: English users should start to expect
> UTF-8 to be used. Because, as English users, you are more used to
> 'mojibake' than the rest of us are: Whenever you see it, you 'know'
> that it is because it is a foreign language you are reading. It is we,

Re: [whatwg] Proposal in supporting the writing of "Arabizi"

2011-12-01 Thread Sergiusz Wolicki
What you are proposing is not an HTML feature but an O/S- or
browser-specific functionality equivalent to East Asian IMEs.  East Asian
IMEs (input method editors = complex keyboard processors), which often use
a similar phonetic method to enter ideographic or syllabic characters, can
be activated by a keyboard sequence in any text field, not only INPUT
fields.

It would be nice to have the mentioned functionality in every browser, but
each Chinese user would like something like this for Chinese as well. As a
Pole, I would like a way to insert Polish characters easily in any
browser as well. The proposed feature is therefore not a matter of HTML or
even browsers but rather of the operating systems.

Also, the lang= attribute is already well defined for any HTML element.
You cannot specify lang="arabizi" because there is no such language in the
relevant ISO standard (ISO 639).  "Arabizi" is a pseudo-script but it is
also not in the relevant ISO standard (ISO 15924). Therefore, "ar-Arabizi"
is also illegal. You could theoretically say lang="x-arabizi" (private tag)
but then you would not be able to properly specify the actual language, for
example for a spellchecker.
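
The shape of the subtags already shows the problem. The following is a
minimal sketch in Python, not a validator: the helper names are
hypothetical, and it only checks subtag length and alphabet. A real check
would also have to look each subtag up in the relevant registry.

```python
def looks_like_language(subtag: str) -> bool:
    """ISO 639 language codes are two or three letters."""
    return subtag.isascii() and subtag.isalpha() and len(subtag) in (2, 3)


def looks_like_script(subtag: str) -> bool:
    """ISO 15924 script codes are exactly four letters."""
    return subtag.isascii() and subtag.isalpha() and len(subtag) == 4


def is_private_use(tag: str) -> bool:
    """A tag starting with 'x-' consists of private-use subtags only."""
    return tag.lower().startswith("x-") and len(tag) > 2


# Shape checks only; passing them does not make a code registered.
print(looks_like_language("arabizi"))  # seven letters, not a language code
print(looks_like_script("Arabizi"))    # seven letters, not a script code
print(looks_like_script("Arab"))       # four letters, script-shaped
print(is_private_use("x-arabizi"))     # private tag
```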


Thanks
Sergiusz



On Thu, Dec 1, 2011 at 10:07 AM, Sami Eljabali  wrote:

> Hello, I apologize if this is an incorrect forum to propose new html features
> in which case you may disregard this email, however should you know a more
> appropriate forum then please let me know, else I ask you to please
> entertain
> this email. :)
>
>
> There's a need for phonetic based keyboard support for Arabic speaking
> users on today's internet. There are two primary reasons for this:
>
> 1) Many Arabic speaking users don't surf in Arabic. A good portion of them
> are in non-Arabic speaking countries, hence more often than not have
> non-Arabic keyboards and therefore find it difficult to write Arabic on the
> internet. There are, admittedly, virtual Arabic keyboards on the OS
> level, as well as on sites like Google, addressing
> this; however, phonetically spelling out a word and seeing a list of words
> containing the one you were trying to spell out is dramatically more
> effective than the counterpart.
>
> 2) It vastly aids those lacking a thorough Arabic education to properly
> spell out what they phonetically know, hence allows a greater
> audience including non-natives to write in Arabic.
>
>
> *Proposal:*
>
> Have the interpreter described above be embedded within browsers and
> enabled when users click and focus on text fields defined as: <input
> type="text" lang="arabizi"> to interpret Arabizi as Arabic.
> Should a browser not support it, then the type="text" would be the
> fallback attribute leaving users writing in a plain text field.
>
>
> *Advantages of a Browser Implementation*
>
> 1) Guaranteed availability and ease of use for users continually relying on
> this feature, as opposed to using a third-party service
> or installed software.
>
> 2) Exposure to the majority of users in need of this capability.
>
>
> Furthermore, we believe the "lang" attribute opens doors in supporting
> other languages. Even showing a virtual keyboard for most spoken languages,
> and its variations, would ultimately ensure the ability of everyone to express
> themselves in their language(s) of choosing on the internet.
>
>
> Your feedback is more than appreciated.
>
> Thank you for your time,
>
> Sami Eljabali
>
> Daniel Bates
>


Re: [whatwg] Default encoding to UTF-8?

2011-12-01 Thread Sergiusz Wolicki
I have read section 4.2.5.5 of the WHATWG HTML spec and I think it is
sufficient.  It requires that any non-US-ASCII document has an explicit
character encoding declaration. It also recommends UTF-8 for all new
documents and for authoring tools' default encoding.  Therefore, any
document conforming to HTML5 should not pose any problem in this area.

The default encoding issue is therefore for old stuff.  But I have seen a
lot of pages, in browsers and in mail, that were tagged with one encoding
and encoded in another.  Hence, documents without a charset declaration are
only one of the reasons for the garbage we see. Therefore, I see no point in
trying to fix anything in browsers by changing the ancient defaults
(risking compatibility issues). Energy should go into filing bugs against
misbehaving authoring tools and into adding proper recommendations and
education in HTML guidelines and tutorials.
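
As an illustration of the two failure modes mentioned above (no
declaration at all, and a declaration that does not match the bytes), here
is a minimal sketch in Python. It is not the spec's sniffing algorithm: the
function names, the result strings and the 1024-byte window are
assumptions made for the example.

```python
import re

# Assumption: a simplified search for charset=... near the top of the
# document, standing in for a real encoding declaration parser.
META_CHARSET = re.compile(
    rb"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def declared_charset(raw: bytes):
    """Return the declared encoding label in lowercase, or None."""
    match = META_CHARSET.search(raw[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def diagnose(raw: bytes) -> str:
    """Classify a document by how its bytes relate to its declaration."""
    if raw.isascii():
        return "ascii-only"
    charset = declared_charset(raw)
    if charset is None:
        return "undeclared"
    try:
        raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # The bytes are invalid in the declared encoding, or the label
        # is unknown to Python.
        return "mislabeled"
    return "ok"


good = '<meta charset="utf-8"><p>zażółć</p>'.encode("utf-8")
bad = b'<meta charset="utf-8"><p>' + "zażółć".encode("iso-8859-2") + b"</p>"
print(diagnose(good))
print(diagnose(bad))
```

Note the limit of this kind of check: a single-byte encoding such as
ISO-8859-1 accepts any byte sequence, so a page wrongly tagged with it
decodes without error and would need a human or a heuristic to spot.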


Thanks,
Sergiusz


On Thu, Dec 1, 2011 at 7:00 AM, L. David Baron  wrote:

> On Thursday 2011-12-01 14:37 +0900, Mark Callow wrote:
> > On 01/12/2011 11:29, L. David Baron wrote:
> > > The default varies by localization (and within that potentially by
> > > platform), and unfortunately that variation does matter.
> > In my experience this is what causes most of the breakage. It leads
> > people to create pages that do not specify the charset encoding. The
> > page works fine in the creator's locale but shows mojibake (garbage
> > characters) for anyone in a different locale.
> >
> > If the default was ASCII everywhere then all authors would see mojibake,
> > unless it really was an ASCII-only page, which would force them to set
> > the charset encoding correctly.
>
> Sure, if the default were consistent everywhere we'd be fine.  If we
> have a choice in what that default is, UTF-8 is probably a good
> choice unless there's some advantage to another one.  But nobody's
> figured out how to get from here to there.
>
> (I think this is legacy from the pre-Unicode days, when the browser
> simply displayed Web pages using the system character set, which
> led to a legacy of incompatible Web pages in different parts of the
> world.)
>
> -David
>
> --
> 𝄞   L. David Baron http://dbaron.org/   𝄂
> 𝄢   Mozilla   http://www.mozilla.org/   𝄂
>


Re: [whatwg] Editorial comment r/e summary element

2011-09-20 Thread Sergiusz Wolicki
I am reading:

"Contexts in which this element can be used:
As the first child of a details element."
My feeling is that unconnected DOM elements in a script are not really an
HTML document but only its building blocks (bricks). Therefore, any
parent-child relationship required by the spec does not apply until the
fragments are connected together to form an HTML document to be interpreted
(rendered) by a user agent. Therefore, if "if any" applies to fragments only
and not to complete documents, then I feel, it should not be present in the
spec.

The problem is that if we add "if any", allowing no parent, then we should
also define what <summary> means if there is no parent.

In short: "if any" should not be added if it is only meant to allow an
element to be represented separately as DOM in a script, because, if I
understand correctly, such representation is allowed for any HTML element.
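
For complete documents the parent-child requirement can be checked
mechanically. Below is a minimal sketch in Python using the standard
html.parser module. The class name, the messages and the short list of
void elements are assumptions, and it does not implement the tree
construction rules of a real HTML parser.

```python
from html.parser import HTMLParser


class SummaryChecker(HTMLParser):
    """Flag <summary> elements that are not the first child of <details>."""

    # Assumption: a partial list of void elements, enough for the demo.
    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.stack = []     # open elements as [tag, element_child_count]
        self.problems = []

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        if tag == "summary":
            if parent is None or parent[0] != "details":
                self.problems.append("summary outside details")
            elif parent[1] != 0:
                self.problems.append("summary is not the first child")
        if parent is not None:
            parent[1] += 1
        if tag not in self.VOID:
            self.stack.append([tag, 0])

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def check(markup: str):
    checker = SummaryChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(check("<details><summary>More</summary><p>Body</p></details>"))
print(check("<article><summary>Intro</summary></article>"))
```

The second call corresponds to the use at the top of an article described
in the quoted question, which the sketch reports as a problem.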


-- S5sz




On Tue, Sep 20, 2011 at 10:10 PM, Ian Hickson  wrote:

> On Tue, 20 Sep 2011, Bruce Lawson wrote:
> >
> > Fair dames and damsels of the list
> >
> > Consider
> >
> http://www.whatwg.org/specs/web-apps/current-work/multipage/interactive-elements.html#the-summary-element
> :
> > "The summary element represents a summary, caption, or legend for the
> rest of
> > the contents of the summary element's parent details element, if any."
> >
> > I read "if any" to mean there may or may not be "a summary, caption or
> > legend".
> >
> > However, a questioner to HTML5 Doctor believes that <summary> can be used
> > outside <details>, reading "if any" to suggest that there may not be a
> > "summary element's parent details element".
> >
> > (She wants to use <summary> at the top of an article to summarise its
> > contents, because the ambiguous prose I quote suggests that a parent
> details
> > element is optional).
>
> It means that there might not be a <details> parent. The only way this
> could happen in a conformance situation is if the <summary> didn't have a
> parent at all, which is only possible in unconnected DOM fragments in
> script.
>
>
> > Can we remove this ambiguity? "The summary element represents an
> > optional summary, caption, or legend for the rest of the contents of the
> > summary element's parent details element" would work.
>
> The summary isn't optional (<summary> is a required child of <details>).
>
> The "if any" style is used all over the spec; I'm not sure how to make it
> clearer without dramatically increasing the verbosity, which I would like
> to avoid so as not to draw attention to aspects of the spec that are of
> relatively little practical importance. For example, replacing it with "if
> the element has such a parent" changes this minor point from a two-word
> side note to a whole sentence fragment taking a quarter of the sentence.
>
> Anyone have any suggestions?
>
> --
> Ian Hickson   U+1047E)\._.,--,'``.fL
> http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>


Re: [whatwg] and elements

2011-09-05 Thread Sergiusz Wolicki
Hi Everybody,

If  has a compatibility issue, then we could use .
It would be slightly shorter than  and it would be more
appropriate to name a section with "comments".  would mean a
single comment and I understand this is not the purpose of the
proposed tag.

If the ability to hide a section with certain semantics is an argument
for adding a tag, then  may make sense. I often read news
portals, where many readers' comments are really stupid and annoying
but I still start reading them. An ability to hide the comments
sections from myself would be a great time saver ;)

Thanks,

Sergiusz