Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-09-18 Thread L. David Baron
On Wednesday 2013-05-01 01:01 -0700, Elliott Sprehn wrote:
 On Wed, Apr 24, 2013 at 9:22 AM, Peter Occil pocci...@gmail.com wrote:
  I have no objection to the name baseLang rather than language as the
  name of the DOM attribute.
 
  But if there isn't more interest or you decide not to add this DOM
  attribute, I encourage you to at least:
 
 
 fwiw WebKit (and Blink) implement this through CSS inheritance since you
 need to know the lang for all kinds of things and walking up the DOM
 repeatedly would be expensive.
 
 -webkit-locale is inherited by default and contains the enclosing @lang
 value. You can query it through getComputedStyle(node).webkitLocale. That
 doesn't help your custom parser though.

In Gecko it's also implemented through CSS inheritance, but it's not
exposed to Web content as a CSS property.  (Internally it's
'-x-lang', but that name isn't exposed.)

We use the language for:
 * font selection
 * language-specific text-transform behavior
 * hyphenation (which doesn't work unless it's explicitly specified,
   as required by http://dev.w3.org/csswg/css-text/#hyphens-property )
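
The effect of language on text-transform can be illustrated from script. The following is a minimal sketch, not Gecko's internal code path: it uses the standard String.prototype.toLocaleUpperCase, and the helper name upperCaseFor is hypothetical. It assumes an engine with locale-sensitive case mapping.

```javascript
// Uppercase text the way a language-aware text-transform would,
// given a language tag such as the effective language of a node.
// upperCaseFor is a hypothetical helper name used for illustration.
function upperCaseFor(text, lang) {
  try {
    return text.toLocaleUpperCase(lang || undefined);
  } catch (e) {
    // Malformed language tags throw RangeError; use the default mapping.
    return text.toUpperCase();
  }
}
```

With a Turkish language tag ("tr"), a lowercase "i" should map to the dotted capital U+0130 rather than "I", which is the kind of difference a language-aware text-transform has to account for.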

-David

-- 
𝄞   L. David Baron http://dbaron.org/   𝄂
𝄢   Mozilla  https://www.mozilla.org/   𝄂
 Before I built a wall I'd ask to know
 What I was walling in or walling out,
 And to whom I was like to give offense.
   - Robert Frost, Mending Wall (1914)


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-09-18 Thread Anne van Kesteren
On Wed, Sep 18, 2013 at 11:10 AM, L. David Baron dba...@dbaron.org wrote:
 In Gecko it's also implemented through CSS inheritance, but it's not
 exposed to Web content as a CSS property.  (Internally it's
 '-x-lang', but that name isn't exposed.)

 We use the language for:
  * font selection
  * language-specific text-transform behavior
  * hyphenation (which doesn't work unless it's explicitly specified,
    as required by http://dev.w3.org/csswg/css-text/#hyphens-property )

It seems my earlier point about inheritance of text direction remains.
Base URLs however are obsolete as only Gecko implements xml:base.

If this is implemented through CSS, does it make sense to expose it
through the DOM?


-- 
http://annevankesteren.nl/


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-09-16 Thread Ian Hickson
On Fri, 2 Aug 2013, Jukka K. Korpela wrote:
 2013-08-02 2:43, Ryosuke Niwa wrote:
   
   Are you saying that for HTML contenteditable-based editors that want 
   to support drag-and-drop editing, they need to be able to annotate 
   the outgoing HTML fragment with the effective language so that when 
   it's embedded somewhere, the right fonts get used?
  
  Yes, but not just for drag and drop.
 
 This would mean that the editor would have to guess the language from 
 the text or ask the user to specify it.

Well presumably just getting the language out of the document would be a 
good first step.


 This is not as unrealistic as it may first seem. Microsoft Word does 
 such things [...]

Sure, there's plenty of examples of language identification.


 But regarding the effect of language markup on fonts, the effect is 
 limited to situations where the font is not specified in a style sheet.

Yes. That's a case we should probably make sure we handle.


 So it could be added, well, just because there is no good reason not to.

There's always reasons not to add something:

   http://wiki.whatwg.org/wiki/FAQ#Where.27s_the_harm_in_adding.E2.80.94

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-09-16 Thread Ian Hickson
On Thu, 1 Aug 2013, Ryosuke Niwa wrote:
  
  Are you saying that for HTML contenteditable-based editors that want 
  to support drag-and-drop editing, they need to be able to annotate the 
  outgoing HTML fragment with the effective language so that when it's 
  embedded somewhere, the right fonts get used?
 
 Yes, but not just for drag and drop.

Sure, also for copy-and-paste, etc. The point is that browsers need to 
provide this for at least some of the cases.


  This seems like something that browsers should just do automatically 
  for copy-and-paste and drag-and-drop, I wouldn't want to require that 
  every contenteditable-based editor have to reimplement this. That 
  seems like a lot of redundant work, and in particular, seems to be 
  work that most editor implementors would forget. If the browsers just 
  did the annotation automatically, then this would work even in editors 
  whose implementors didn't worry about i18n.
 
 How are browsers supposed to do this if the author was simply using 
 innerHTML?

How would an author use innerHTML here?


I agree that there's a use case for providing language information, my 
point is just that this use case seems to need more than just that: it 
also needs that browsers add language annotations during drag-and-drop and 
copy-and-paste, at least. Is there anything else that it needs?

Will putting language information on drag-and-drop or copy/paste content 
have any Web compat impact?

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-09-16 Thread Peter Occil

Apparently, the use cases I mentioned before have not been discussed yet:


- Localization of form controls in languages where browser support
is lacking, such as some minor languages.
- Localization of HTML elements, especially date formatting of span
 and div elements in the page's default language, see especially [1].
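
The date-formatting use case could be sketched as follows, assuming the page has already determined the language of the element somehow (which is exactly what this thread is about). The helper name formatDateFor is hypothetical; Intl.DateTimeFormat is the standard ECMAScript internationalization API.

```javascript
// Format a date for the language of the element that will display it.
// `lang` is whatever language the page determined for that element;
// formatDateFor is a hypothetical helper name.
function formatDateFor(lang, date) {
  const options = { year: "numeric", month: "long", day: "numeric" };
  try {
    return new Intl.DateTimeFormat(lang || undefined, options).format(date);
  } catch (e) {
    // Invalid tag (RangeError): fall back to the runtime's default locale.
    return new Intl.DateTimeFormat(undefined, options).format(date);
  }
}
```

The exact output depends on the locale data shipped with the runtime, so a page would still need its own formatting for languages the browser does not support.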

You said these use cases were valid; how do you think so?

--Peter

[1]: 
http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#input-impl-notes





Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-09-16 Thread Ian Hickson
On Mon, 16 Sep 2013, Peter Occil wrote:
 
 - Localization of form controls in languages where browser support is 
 lacking, such as some minor languages.

 - Localization of HTML elements, especially date formatting of span and 
 div elements in the page's default language [...]

 You said these use cases were valid; how do you think so?

I mean these are things that users want and that authors have to do.

Your e-mail is still on my list of e-mails to deal with. (Specifically, 
it's in the pile of e-mail relating to new features.) It hasn't been 
forgotten, don't worry. :-)

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-08-12 Thread Ryosuke Niwa

On Aug 8, 2013, at 7:29 AM, Jukka K. Korpela jkorp...@cs.tut.fi wrote:

 2013-08-08 2:57, Ryosuke Niwa wrote:
 
 On Aug 2, 2013, at 6:10 AM, Jukka K. Korpela jkorp...@cs.tut.fi
 wrote:
 [...]
 But regarding the effect of language markup on fonts, the effect is
 limited to situations where the font is not specified in a style
 sheet. This is a rather uncommon scenario these days; authors are
 more than eager to set fonts.
 
 Do you have actual statistics to support this point?
 
 No, it’s just an impression from looking at numerous pages and their coding 
 as well as views presented in authors’ forums.
 
 As far as I
 checked, neither baidu.com nor yahoo.com.tw seems to explicitly
 specify a Chinese font.
 
 They both have font-family settings, slightly different, but basically the 
 most common (sorry, no statistic on this either) setup that uses Arial 
 (possibly with Helvetica as second option, which does not change much). So, 
 granted, they don’t specify a Chinese font in the sense of including any 
 specific fonts containing CJK characters in the font-family list.
 
 Baidu doesn’t set lang either, so they seem to be accepting, for any 
 characters not covered by Arial, whatever happens to be in each browser’s 
 list of fallback fonts, when no information about content language is 
 available. Yahoo.com.tw sets lang=zh-tw, so they do care, but only to the 
 extent that the fallback font should be one intended for Traditional Chinese.
 
 So the lang markup may affect fonts, but only under some conditions. And if 
 you care about fonts, as an author, then an explicit list of font 
 alternatives has better chances of creating the desired result.

That's not a practical solution because we can't possibly know the list of 
Chinese and Japanese fonts available by default in all operating systems.

 It is true that they might specify a font list where none of the
 fonts supports some characters that might be entered, and then a
 fallback font would be used. However, using “annotations”
 (presumably, lang attributes, along with extra span elements when
 needed) does not sound like a feasible approach to this.
 
 Whether it’s feasible or not, that’s what we have been doing due to
 the Han unification.  If we could, we’ll undo the Han unification and
 use different glyphs for each character but we can’t do that at this
 point in time.
 
 If a page contains texts to be rendered using different forms (Traditional 
 Chinese, Simplified Chinese, Japanese, Korean) for Han characters, you will 
 need to control the rendering somehow. Using lang markup might be 
 theoretically most adequate, but it’s indirect and less effective than just 
 setting different fonts (via font-family lists that contain reasonably many 
 alternatives).

Controlling the rendering isn't the goal here.  The point is to use the correct 
glyph in each language so that each character is recognizable by users.  Again, 
specifying a font name is not a practical solution as authors have no way of 
knowing the list of Chinese and Japanese fonts provided by all current and future 
operating systems.

 But even if lang attributes are used, I don’t think the issue has much 
 relevance to the original question here. A DOM attribute that returns the 
 language of a node would be useful for the purpose only if you intend to 
 affect rendering via JavaScript.

No.  The point is that any code that attempts to move or copy contents must 
preserve the effective value of the lang attribute.
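
For the copy-and-paste case, an editor could annotate the outgoing fragment itself. This is a minimal sketch under stated assumptions: selectedHtml and effectiveLang are hypothetical inputs that the editor would have to compute, and the function names are illustrative. Only clipboardData.setData and preventDefault are standard API calls.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttribute(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Handler body for a "copy" event in a contenteditable-based editor.
// `selectedHtml` and `effectiveLang` are hypothetical inputs that the
// editor itself would have to supply.
function copyWithLanguage(event, selectedHtml, effectiveLang) {
  const html = effectiveLang
    ? `<span lang="${escapeAttribute(effectiveLang)}">${selectedHtml}</span>`
    : selectedHtml;
  event.clipboardData.setData("text/html", html);
  // Without this, the browser writes its own clipboard data instead.
  event.preventDefault();
  return html;
}
```

The hard part remains computing effectiveLang, since script has no direct access to the language the browser uses internally.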

- R. Niwa



Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-08-08 Thread Jukka K. Korpela

2013-08-08 2:57, Ryosuke Niwa wrote:

 On Aug 2, 2013, at 6:10 AM, Jukka K. Korpela jkorp...@cs.tut.fi
 wrote:

 [...]

  But regarding the effect of language markup on fonts, the effect is
  limited to situations where the font is not specified in a style
  sheet. This is a rather uncommon scenario these days; authors are
  more than eager to set fonts.

 Do you have actual statistics to support this point?

No, it’s just an impression from looking at numerous pages and their 
coding as well as views presented in authors’ forums.

 As far as I
 checked, neither baidu.com nor yahoo.com.tw seems to explicitly
 specify a Chinese font.

They both have font-family settings, slightly different, but basically 
the most common (sorry, no statistic on this either) setup that uses 
Arial (possibly with Helvetica as second option, which does not change 
much). So, granted, they don’t specify a Chinese font in the sense of 
including any specific fonts containing CJK characters in the 
font-family list.

Baidu doesn’t set lang either, so they seem to be accepting, for any 
characters not covered by Arial, whatever happens to be in each 
browser’s list of fallback fonts, when no information about content 
language is available. Yahoo.com.tw sets lang=zh-tw, so they do care, 
but only to the extent that the fallback font should be one intended for 
Traditional Chinese.

So the lang markup may affect fonts, but only under some conditions. And 
if you care about fonts, as an author, then an explicit list of font 
alternatives has better chances of creating the desired result.

  It is true that they might specify a font list where none of the
  fonts supports some characters that might be entered, and then a
  fallback font would be used. However, using “annotations”
  (presumably, lang attributes, along with extra span elements when
  needed) does not sound like a feasible approach to this.

 Whether it’s feasible or not, that’s what we have been doing due to
 the Han unification.  If we could, we’ll undo the Han unification and
 use different glyphs for each character but we can’t do that at this
 point in time.

If a page contains texts to be rendered using different forms 
(Traditional Chinese, Simplified Chinese, Japanese, Korean) for Han 
characters, you will need to control the rendering somehow. Using lang 
markup might be theoretically most adequate, but it’s indirect and less 
effective than just setting different fonts (via font-family lists that 
contain reasonably many alternatives).

But even if lang attributes are used, I don’t think the issue has much 
relevance to the original question here. A DOM attribute that returns 
the language of a node would be useful for the purpose only if you 
intend to affect rendering via JavaScript.


Yucca





Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-08-07 Thread Ryosuke Niwa

On Aug 2, 2013, at 6:10 AM, Jukka K. Korpela jkorp...@cs.tut.fi wrote:

 2013-08-02 2:43, Ryosuke Niwa wrote:
 
 Are you saying that for HTML contenteditable-based editors that want to
 support drag-and-drop editing, they need to be able to annotate the
 outgoing HTML fragment with the effective language so that when it's
 embedded somewhere, the right fonts get used?
 
 Yes, but not just for drag and drop.
 
 This would mean that the editor would have to guess the language from the 
 text or ask the user to specify it. This is not as unrealistic as it may 
 first seem. Microsoft Word does such things, sometimes getting things right, 
 often messing things up. It typically detects change of language too late, 
 and often infers language from keyboard settings, making it rather impossible 
 to use a multilingual keyboard easily.
 
 But regarding the effect of language markup on fonts, the effect is limited 
 to situations where the font is not specified in a style sheet. This is a 
 rather uncommon scenario these days; authors are more than eager to set fonts.

Do you have actual statistics to support this point?  As far as I checked, 
neither baidu.com nor yahoo.com.tw seems to explicitly specify a Chinese font.

Also, I have just recently experienced the font type change on Gmail when I was 
conversing with a native Chinese speaker.  Her mail client used Chinese fonts 
before Japanese fonts whereas mine had Japanese fonts before Chinese fonts.

 It is true that they might specify a font list where none of the fonts 
 supports some characters that might be entered, and then a fallback font 
 would be used. However, using “annotations” (presumably, lang attributes, 
 along with extra span elements when needed) does not sound like a feasible 
 approach to this.

Whether it’s feasible or not, that’s what we have been doing due to the Han 
unification.  If we could, we’ll undo the Han unification and use different 
glyphs for each character but we can’t do that at this point in time.

- R. Niwa



Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-08-02 Thread Jukka K. Korpela

2013-08-02 2:43, Ryosuke Niwa wrote:

  Are you saying that for HTML contenteditable-based editors that want to
  support drag-and-drop editing, they need to be able to annotate the
  outgoing HTML fragment with the effective language so that when it's
  embedded somewhere, the right fonts get used?

 Yes, but not just for drag and drop.


This would mean that the editor would have to guess the language from 
the text or ask the user to specify it. This is not as unrealistic as it 
may first seem. Microsoft Word does such things, sometimes getting 
things right, often messing things up. It typically detects change of 
language too late, and often infers language from keyboard settings, 
making it rather impossible to use a multilingual keyboard easily.


But regarding the effect of language markup on fonts, the effect is 
limited to situations where the font is not specified in a style sheet. 
This is a rather uncommon scenario these days; authors are more than 
eager to set fonts. It is true that they might specify a font list where 
none of the fonts supports some characters that might be entered, and 
then a fallback font would be used. However, using “annotations” 
(presumably, lang attributes, along with extra span elements when 
needed) does not sound like a feasible approach to this.


But I guess the issue is still adding a DOM property for element nodes, 
specifying the language of the node, to the extent that it can be 
inferred from lang or xml:lang attribute or from HTTP headers (real or 
faked via meta). Although the use cases are somewhat rare and not 
particularly important, the property would be conceptually easy and 
presumably easy to implement in browsers. So it could be added, well, 
just because there is no good reason not to. It may understandably 
irritate authors who need language information that they know that the 
browser has it (it needs it to implement :lang() in CSS) but does not 
give authors access to it.
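
Since the browser already answers :lang() queries, one workaround is to probe a node against a list of candidate tags. This is a rough sketch, assuming the caller can supply a finite candidate list; probeLanguage is a hypothetical name, and Element.matches is the current standard name for what browsers then shipped as matchesSelector (often prefixed).

```javascript
// Return the most specific candidate tag that the element matches via
// :lang(), or null if none match. `candidates` must be supplied by the
// caller; probeLanguage is a hypothetical helper name.
function probeLanguage(element, candidates) {
  let best = null;
  for (const tag of candidates) {
    // Skip anything that would not form a well-formed selector argument.
    if (!/^[A-Za-z][A-Za-z0-9-]*$/.test(tag)) continue;
    const matches = element.matches(`:lang(${tag})`);
    if (matches && (best === null || tag.length > best.length)) {
      best = tag;
    }
  }
  return best;
}
```

Because :lang(zh) also matches zh-TW content, the sketch keeps the longest matching tag. It cannot discover a language that is missing from the candidate list, which is the main limitation of this approach.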


Yucca




Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-07-24 Thread Ryosuke Niwa

On Jul 16, 2013, at 11:25 AM, Ian Hickson i...@hixie.ch wrote:

 On Tue, 16 Jul 2013, Takayoshi Kochi (河内 隆仁) wrote:
 
 IIUC WebKit uses internally node's language to determine which font to use
 to render text,
 e.g for Han unification (https://en.wikipedia.org/wiki/Han_unification)
 WebKit has to choose
 a proper glyph depending on its lang attribute for the same Unicode
 codepoint.
 
 Sure, but internal UA uses aren't use cases for the Web.
 
 The use cases Peter gave over the weekend are valid, though.

The fact browsers use the effective language for font selection is very 
relevant in HTML editing. For example, consider the following document:

<!DOCTYPE html>
<html lang=ja>
<body>
<section lang=zh>
<p id=source>$BANV}$(D7q(B</p>
</section>
<blockquote>
<p id=destination></p>
</blockquote>
</body>
</html>

If you were to get the innerHTML of #source and insert it into #destination, 
the effective language changes from Chinese to Japanese and the three 
characters change shape because browsers will use different fallback 
fonts.
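
A script that moves the content could avoid the change by carrying the source language along. This is a minimal sketch, assuming the caller already knows the effective language of the source (here passed in as sourceLang); the function name is hypothetical, and a span wrapper is only appropriate when the copied content is inline.

```javascript
// Copy the contents of `source` into `destination`, wrapped in a span
// that records the language the contents had at their original position.
// `sourceLang` must be determined by the caller (an assumption here).
function copyContentsPreservingLanguage(source, destination, sourceLang) {
  const wrapper = destination.ownerDocument.createElement("span");
  if (sourceLang) {
    wrapper.setAttribute("lang", sourceLang);
  }
  wrapper.innerHTML = source.innerHTML;
  destination.appendChild(wrapper);
  return wrapper;
}
```

In the document above this would be called with the #source and #destination elements and "zh", so the copied characters keep a Chinese fallback font inside the Japanese page.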

- R. Niwa



Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-07-16 Thread Ian Hickson
On Tue, 16 Jul 2013, Takayoshi Kochi (河内 隆仁) wrote:

 IIUC WebKit uses internally node's language to determine which font to use
 to render text,
 e.g for Han unification (https://en.wikipedia.org/wiki/Han_unification)
 WebKit has to choose
 a proper glyph depending on its lang attribute for the same Unicode
 codepoint.

Sure, but internal UA uses aren't use cases for the Web.

The use cases Peter gave over the weekend are valid, though.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-07-15 Thread 河内 隆仁
IIUC WebKit uses internally node's language to determine which font to use
to render text,
e.g for Han unification (https://en.wikipedia.org/wiki/Han_unification)
WebKit has to choose
a proper glyph depending on its lang attribute for the same Unicode
codepoint.

Matt (falken@) knows more about this.


On Sun, Jul 14, 2013 at 12:39 AM, Ian Hickson i...@hixie.ch wrote:

 On Fri, 12 Jul 2013, Peter Occil wrote:
 
  Well, my true hope is that such a DOM attribute like language will be
  specified in the HTML or DOM spec.  Especially since it's not currently
  possible to get the language of a node through JavaScript methods alone.

 What's the use case for having this in HTML?

 --
 Ian Hickson   U+1047E)\._.,--,'``.fL
 http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
 Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'




-- 
Takayoshi Kochi


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-07-15 Thread Matt Falkenhagen
(resending from correct address)

On Tue, Jul 16, 2013 at 10:18 AM, Takayoshi Kochi (河内 隆仁)
ko...@google.com wrote:
 IIUC WebKit uses internally node's language to determine which font to use
 to render text,
 e.g for Han unification (https://en.wikipedia.org/wiki/Han_unification)
 WebKit has to choose
 a proper glyph depending on its lang attribute for the same Unicode
 codepoint.

Yes, WebKit does this using -webkit-locale which Elliott mentioned
above.

Firefox and IE also use language for font selection, but I'm not
familiar with their implementation.


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-07-13 Thread Ian Hickson
On Fri, 12 Jul 2013, Peter Occil wrote:

 Well, my true hope is that such a DOM attribute like language will be 
 specified in the HTML or DOM spec.  Especially since it's not currently 
 possible to get the language of a node through JavaScript methods alone. 

What's the use case for having this in HTML?

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-07-13 Thread Peter Occil

There are several use cases:

- Localization of form controls in languages where browser support
is lacking, such as some minor languages.
- Localization of HTML elements, especially date formatting of span
 and div elements in the page's default language, see especially [1].

When it comes to retrieving the language of an element, a JavaScript 
implementation can do everything except retrieve the value of the 
Content-Language header of the document, so even providing a DOM attribute 
like contentLanguage will resolve this issue.
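
If the Content-Language value were available to script, reducing it to a single fallback language might look like the sketch below. It is loosely modeled on how the HTML spec processes the content-language pragma (a value containing a comma yields no default language) and is an assumption, not the spec's exact algorithm; defaultLanguageFrom is a hypothetical name.

```javascript
// Derive a single default language from a Content-Language value
// (HTTP header or <meta http-equiv="content-language"> content).
// Returns "" (unknown) when the value is missing, empty, or lists
// several languages. defaultLanguageFrom is a hypothetical helper.
function defaultLanguageFrom(contentLanguage) {
  if (typeof contentLanguage !== "string") return "";
  if (contentLanguage.includes(",")) return "";
  const tokens = contentLanguage.trim().split(/\s+/);
  return tokens[0] || "";
}
```

A script could try to obtain the header by re-requesting the page and reading the response headers, but that costs an extra request and may not return the same value the browser saw when it loaded the document.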


[1]: 
http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#input-impl-notes





Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-07-12 Thread Ian Hickson
On Wed, 24 Apr 2013, Peter Occil wrote:
 
 Well in my case, I have written an HTML parser in Java and C# [1][2], 
 which parses HTML documents and returns an object that implements a 
 subset of the DOM, so far.  As far as possible, I included only methods 
 and attributes that were specified in the DOM or HTML specification, 
 such as the characterSet attribute (which is called getCharacterSet on 
 my DOM's IDocument interface), and more recently the innerHTML attribute 
 (which is called getInnerHTML on my DOM's IElement interface)
 
 However, when I decided to implement an RDFa processor based on my HTML 
 parser, I had need to include a method that returns the language of a 
 node (see, for example, section 3.3 of reference [3]). As a result, I 
 included a method called getLanguage on my DOM's INode interface (which 
 may correspond to a possible--future--DOM attribute called language on 
 the Node interface).  I feel uneasy having to include this extension to 
 what ought to be a subset of the HTML DOM.

Implementations of HTML and the DOM are allowed to have internal methods 
to do things like this. There's no reason to limit yourself to the API 
visible to JavaScript.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-07-12 Thread Peter Occil
Well, my true hope is that such a DOM attribute like language will be 
specified in the HTML or DOM spec.  Especially since it's not currently 
possible to get the language of a node through JavaScript methods alone. 
See [1] and its replies.


--Peter

[1]: http://lists.w3.org/Archives/Public/public-rdfa-wg/2013May/0064.html




Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-05-01 Thread Elliott Sprehn
On Wed, Apr 24, 2013 at 9:22 AM, Peter Occil pocci...@gmail.com wrote:


 I have no objection to the name baseLang rather than language as the
 name of the DOM attribute.

 But if there isn't more interest or you decide not to add this DOM
 attribute, I encourage you to at least:


fwiw WebKit (and Blink) implement this through CSS inheritance since you
need to know the lang for all kinds of things and walking up the DOM
repeatedly would be expensive.

-webkit-locale is inherited by default and contains the enclosing @lang
value. You can query it through getComputedStyle(node).webkitLocale. That
doesn't help your custom parser though.
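
For illustration, querying that property could be wrapped as below. This is a sketch under assumptions: the property is WebKit/Blink-only, the computed value may serialize as a quoted string or as "auto", and the getStyle parameter exists only so the function can be exercised outside a browser. localeFromStyle is a hypothetical name.

```javascript
// Read the inherited -webkit-locale value for a node, or null when it is
// unavailable. `getStyle` defaults to the browser's getComputedStyle;
// passing it in is only for illustration and testing.
function localeFromStyle(node, getStyle = globalThis.getComputedStyle) {
  if (typeof getStyle !== "function") return null;
  const raw = getStyle(node).webkitLocale;
  if (typeof raw !== "string" || raw === "" || raw === "auto") return null;
  // The value may serialize as a quoted string; strip the quotes if present.
  return raw.replace(/^["']|["']$/g, "");
}
```

A null result means the caller has to fall back to another method, such as walking up the tree looking for lang attributes.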

- E


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-05-01 Thread Elliott Sprehn
On Wed, May 1, 2013 at 1:49 AM, Robert O'Callahan rob...@ocallahan.org wrote:

 On Wed, May 1, 2013 at 8:01 PM, Elliott Sprehn espr...@chromium.org wrote:

 fwiw WebKit (and Blink) implement this through CSS inheritance since you
 need to know the lang for all kinds of things and walking up the DOM
 repeatedly would be expensive.

 -webkit-locale is inherited by default and contains the enclosing @lang
 value. You can query it through getComputedStyle(node).webkitLocale. That
 doesn't help your custom parser though.


 Interesting. What does body:lang(en) { -webkit-locale:fr; } do? :-)


Nothing sensible. :) We still walk up the tree when doing selector
matching, but you'll confuse some render tree operations. Looking through
the code again we don't use -webkit-locale as much as I thought, though
perhaps we should and prevent user CSS from tampering with it.

- E


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-05-01 Thread Anne van Kesteren
On Wed, May 1, 2013 at 9:01 AM, Elliott Sprehn espr...@chromium.org wrote:
 fwiw WebKit (and Blink) implement this through CSS inheritance since you
 need to know the lang for all kinds of things and walking up the DOM
 repeatedly would be expensive.

 -webkit-locale is inherited by default and contains the enclosing @lang
 value. You can query it through getComputedStyle(node).webkitLocale. That
 doesn't help your custom parser though.

What mechanism did you use for base URLs? Or is the cost there deemed
acceptable?

For direction you need to have a similar mechanism available. (And
you'll have the same problem roc pointed out with :lang() if you solve
that through CSS as :dir(ltr) and :dir(rtl) exist these days.)

If browsers already implement some kind of API for these things it
might be worth to expose them in the same way.


--
http://annevankesteren.nl/


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-04-24 Thread Anne van Kesteren
On Wed, Apr 24, 2013 at 6:49 AM, Peter Occil pocci...@gmail.com wrote:
 While a language attribute on Node may also be useful to
 HTML+RDFa processors in JavaScript, I have no plans to implement
 such a processor in JavaScript, though.

There's https://www.w3.org/Bugs/Public/show_bug.cgi?id=16489 fwiw.
Interest thus far seems fairly low, prolly due to what Kenny mentioned
with people not doing much language tagging at all.


--
http://annevankesteren.nl/


Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-04-24 Thread Peter Occil


I have no objection to the name baseLang rather than language as the 
name of the DOM attribute.


But if there isn't more interest or you decide not to add this DOM 
attribute, I encourage you to at least:


* define a DOM attribute to get the value of the document's Content-Language 
header (or other fallback language), or if that's not acceptable,
* change the definition of the language of an HTML node to remove the 
requirement to use the document's Content-Language header, or if that's not 
acceptable,
* change section 3.3 of HTML+RDFa to say, for example, that only information 
in the lang and xml:lang attributes are relevant for determining 
language for the purposes of HTML+RDFa.


With any of these three alternatives, I can live with doing tree traversal 
for finding the language of a node, both in my Java/C# DOM and (if I get 
around to it) in JavaScript.  Especially the last two may be viable if it 
turns out that people use the Content-Language header even less than people 
use the lang and xml:lang attributes on HTML and XHTML documents.


-

In the meantime, is it advisable for me to extend my Java/C# DOM by adding a 
getLanguage method?  (See my previous message for context.)


--Peter




Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-04-23 Thread Kang-Hao (Kenny) Lu
(13/04/23 16:44), Peter Occil wrote:
 I believe there should be a DOM attribute that returns the language
 of a node, as defined in section 3.2.3.3 The lang and xml:lang
 attributes.

What's your use case? If you want to style a particular language then
there's the CSS :lang() pseudo-class.

Use cases are important because otherwise I think there are very few
pages with multiple lang attributes...

 While there is a lang DOM attribute, it's inadequate because it's
 only affected by the element's lang content attribute.

That's true. However, if the case isn't important, we can do tree
traversal (modulo HTTP Content-Language header and pragma) or exhaust
element.matchesSelector(":lang(xxx)").
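
The tree traversal could be sketched as follows. It is a simplification, not the full spec algorithm: it checks xml:lang in the XML namespace before lang on each ancestor, does not verify that the element is an HTML element, and takes the header/pragma fallback as a caller-supplied parameter because script cannot read it. languageOf and fallbackLanguage are hypothetical names.

```javascript
const XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace";

// Walk from `node` up through its ancestors and return the first language
// found. `fallbackLanguage` stands in for the Content-Language header or
// pragma, which this sketch cannot determine on its own.
function languageOf(node, fallbackLanguage = "") {
  for (let current = node; current; current = current.parentNode) {
    // Text nodes and documents have no attributes; skip them.
    if (typeof current.getAttributeNS !== "function") continue;
    const xmlLang = current.getAttributeNS(XML_NAMESPACE, "lang");
    if (xmlLang !== null) return xmlLang;
    const lang = current.getAttribute("lang");
    if (lang !== null) return lang;
  }
  return fallbackLanguage;
}
```

An empty lang attribute is returned as "" (explicitly unknown) rather than falling through to the parent. The cost is one walk per query, which is the expense that implementing this through CSS inheritance avoids inside browsers.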

 Also, I don't see a way to get the language of a node otherwise,
 especially since it depends not only on lang and xml:lang, but
 also on the HTTP Content-Language header, which may not be possible
 to retrieve with existing JavaScript methods, as far as I can tell.

Indeed.


Cheers,
Kenny
-- 
Web Specialist, Opera Sphinx Game Force, Oupeng Browser, Beijing
Try Oupeng: http://www.oupeng.com/