Re: [whatwg] Sentence structure

2013-01-11 Thread Vipul S. Chawathe
From: Ian Hickson [mailto:i...@hixie.ch] 

On Thu, 10 Jan 2013, Thomas A. Fine wrote:
 
> Use Cases:
>   4. Clarifying sentence boundaries would be an aid in machine
>      translation software.

Do you have any evidence supporting this? I've spoken with engineers who
work on machine translation software and while they've certainly had
requests (whence the translate attribute), they've never asked for a way
to mark up sentences.


I'm doing some related work that requires machine translation along the
lines of exporting and importing HTML snippets. Human-language content
boundaries at the sentence level are determined directly by the author's
grammar and punctuation skills. HTML is all tied up with GUI web browsers,
so machine translation, screen readers, and so forth are supported through
other living standards (GRDDL, XSLT, RDFa) that work with HTML as one of
several possible host languages. However, their dependency on the XML
serialization for proper functioning might lead browser engine makers to
promote sticking to microdata, unless someday we get a Google
SilverFlash.java Safari plug-in so that one size fits all. As HTML is a host
language in widespread use (my apologies for lacking statistics, which I
compensate for by deriving statements from common sense), perhaps this is a
starting point for raising concerns that may be redirected into other specs
too. It's the only opening for those rare use cases, as in the story of the
Emperor's New Clothes.

Getting back to business: for larger content fragments there's the p
element. An immediate example is search results whose content previews are
cut off into abrupt fragments. To improve on such fragment indices they've
come up with the schema.org vocabulary, which I just had to mention here.
They've got a provision for specializing their general predefined types, so
Thing>WebPageElement can be used to get
Thing>WebPageElement>Paragraph>Sentence. This can be expressed using the
HTML5 microdata itemtype attribute as:

<span itemscope="itemscope" itemtype="http://www.schema.org/thing/webpage/webpageelement/paragraph/sentence">One whole sentence!</span>

HTML5 without the XML serialization allows skipping ="itemscope" too! That
saves 12 characters, a saving comparable to those recommended by
minifying. :-)



Re: [whatwg] Sentence structure

2013-01-11 Thread Ian Hickson
On Sat, 12 Jan 2013, Vipul S. Chawathe wrote:
 
> I'm doing some related work that requires machine translation along the
> lines of exporting and importing HTML snippets. Human-language content
> boundaries at the sentence level are determined directly by the author's
> grammar and punctuation skills.

Sure, but if the author isn't competent enough to use punctuation, I think
we're probably not going to be able to rely on them using <sentence>
correctly either, at the end of the day.


> HTML is all tied up with GUI web browsers, so machine translation,
> screen readers, and so forth are supported through other living
> standards (GRDDL, XSLT, RDFa) that work with HTML as one of several
> possible host languages. However, their dependency on the XML
> serialization for proper functioning might lead browser engine makers
> to promote sticking to microdata, unless someday we get a Google
> SilverFlash.java Safari plug-in so that one size fits all. As HTML is a
> host language in widespread use (my apologies for lacking statistics,
> which I compensate for by deriving statements from common sense),
> perhaps this is a starting point for raising concerns that may be
> redirected into other specs too. It's the only opening for those rare
> use cases, as in the story of the Emperor's New Clothes.
>
> Getting back to business: for larger content fragments there's the p
> element. An immediate example is search results whose content previews
> are cut off into abrupt fragments. To improve on such fragment indices
> they've come up with the schema.org vocabulary, which I just had to
> mention here. They've got a provision for specializing their general
> predefined types, so Thing>WebPageElement can be used to get
> Thing>WebPageElement>Paragraph>Sentence. This can be expressed using
> the HTML5 microdata itemtype attribute as:
>
> <span itemscope="itemscope" itemtype="http://www.schema.org/thing/webpage/webpageelement/paragraph/sentence">One whole sentence!</span>
>
> HTML5 without the XML serialization allows skipping ="itemscope" too!
> That saves 12 characters, a saving comparable to those recommended by
> minifying. :-)

I'm sorry, but I've no idea what you're saying here.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Sentence structure

2013-01-10 Thread Ian Hickson
On Thu, 10 Jan 2013, Thomas A. Fine wrote:
 
> Use Cases:
>   1. Formatting sentence spacing to approximate the look of
>      almost all books in English from 1650-1950.

This is possible today, using <span class="sentence">. Unless
approximating the formatting of a small minority of old books becomes much
more common than it is now, this use case probably doesn't justify using a
dedicated element.


>   2. Formatting sentence spacing because it is very likely an
>      aid to scanning text, and there are some indications that it
>      is helpful for new readers, readers learning a new language,
>      and readers with visual scanning issues and other learning
>      disabilities.

Browsers can do this without markup (sentences are detectable by some 
relatively simple heuristics), so this wouldn't justify adding a 
markup-level feature.
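
As a rough illustration of the kind of heuristic meant here, a minimal
Python sketch follows. The function name split_sentences and the small
abbreviation list are assumptions made up for this example; this is not
taken from any browser, and real text needs many more cases than it
handles.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for this
# sketch); a usable detector would need a much larger, per-language one.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "prof.", "e.g.", "i.e.", "etc.", "vs."}

# Candidate boundary: terminal punctuation, optional closing quote/bracket,
# whitespace, then something that looks like the start of a new sentence.
_BOUNDARY = re.compile(r'([.!?]["\')\]]*)\s+(?=["\'(\[]?[A-Z0-9])')


def split_sentences(text):
    """Split text into sentences using simple punctuation heuristics."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        candidate = text[start:match.end(1)]
        last_word = candidate.split()[-1].lower()
        if last_word in ABBREVIATIONS:
            # "Dr. Smith" and the like: not a sentence end, keep scanning.
            continue
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Calling split_sentences("Dr. Smith arrived. He sat down!") would give the
two sentences "Dr. Smith arrived." and "He sat down!", with the period
after "Dr" skipped because of the abbreviation list.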

Incidentally, do you have any research to support this claim? My 
understanding is that in practice the double-spacing at the end of 
sentences is considered an antiquated practice that doesn't actually help 
with reading much, certainly not as much as slightly increased line 
spacing, clear punctuation, and the like.


>   3. Formatting sentence spacing because I like it that way.

This is possible today, using <span class="sentence">. Unless your
preference here becomes much more common than it is now, this use case
probably doesn't justify using a dedicated element.


>   4. Clarifying sentence boundaries would be an aid in machine
>      translation software.

Do you have any evidence supporting this? I've spoken with engineers who 
work on machine translation software and while they've certainly had 
requests (whence the translate attribute), they've never asked for a way 
to mark up sentences.


>   5. Clarifying sentence boundaries would be an aid to screen
>      readers to help provide correct inflection.

Screen readers must have excellent sentence-ending detection regardless
of what features we provide, because most Web pages (and there are
trillions already) don't include such markup. So adding an element would
not solve this problem.


Since the use cases do not currently support adding an element for this 
purpose, I have not added the element to the language.

-- 
Ian Hickson


Re: [whatwg] Sentence structure

2013-01-10 Thread Thomas A. Fine

I guess I was just way too long-winded.

Buried in there were some good ideas, and I'm no longer strictly 
advocating just a sentence tag.  I read more about how things are 
supposed to work, and I focused on what is needed in general terms, and 
then as many different possible solutions and their pros and cons.


I still think a sentence tag is a good idea, but I would now really 
favor an approach that allows CSS to interpret a pair of spaces 
following terminal punctuation directly as a sentence break, and then 
provide a mechanism to format that directly.  If I had to narrow things 
down to just one choice rather than a spectrum of available approaches 
it would be that one.


It's practical for content developers, straightforward to implement, can 
be easily applied to previously generated content, and does not ugly 
up the HTML (in fact the HTML wouldn't even change at all, only a tiny 
bit of CSS would be added).  It's not ideal for semantic sentence 
detection, but is at least a significant improvement there.
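
To make the idea concrete, here is a minimal Python sketch of the
detection rule only: terminal punctuation followed by two or more spaces
counts as a sentence break. The function names sentence_breaks and
split_on_double_space are made up for this example, and nothing here is
an existing CSS feature; it only shows what a style engine would have to
recognize in the source text before whitespace is collapsed.

```python
import re

# Terminal punctuation, an optional closing quote or bracket, then a run of
# two or more spaces. Only the space run (group 1) is treated as the break.
_DOUBLE_SPACE_BREAK = re.compile(r'[.!?]["\')\]]*( {2,})')


def sentence_breaks(text):
    """Return (start, end) offsets of each space run treated as a break."""
    return [match.span(1) for match in _DOUBLE_SPACE_BREAK.finditer(text)]


def split_on_double_space(text):
    """Split text at the breaks found by sentence_breaks()."""
    pieces = []
    start = 0
    for break_start, break_end in sentence_breaks(text):
        pieces.append(text[start:break_start])
        start = break_end
    if start < len(text):
        pieces.append(text[start:])
    return pieces
```

With this rule a single space after "Dr." is never mistaken for a
sentence break, which is the main advantage over punctuation-only
heuristics, but it depends entirely on the author typing the two spaces.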


 tom


Re: [whatwg] Sentence structure

2013-01-10 Thread Ian Hickson
On Thu, 10 Jan 2013, Thomas A. Fine wrote:

> I guess I was just way too long-winded.
>
> Buried in there were some good ideas, and I'm no longer strictly
> advocating just a sentence tag.  I read more about how things are
> supposed to work, and I focused on what is needed in general terms, and
> then as many different possible solutions and their pros and cons.
>
> I still think a sentence tag is a good idea, but I would now really
> favor an approach that allows CSS to interpret a pair of spaces
> following terminal punctuation directly as a sentence break, and then
> provide a mechanism to format that directly.  If I had to narrow things
> down to just one choice rather than a spectrum of available approaches
> it would be that one.
>
> It's practical for content developers, straightforward to implement, can
> be easily applied to previously generated content, and does not ugly
> up the HTML (in fact the HTML wouldn't even change at all, only a tiny
> bit of CSS would be added).  It's not ideal for semantic sentence
> detection, but is at least a significant improvement there.

I don't know if the use cases justify adding a feature to CSS, but I'll 
let the CSS editors and browser vendors be the judges of that. :-)

The CSS spec is discussed on the www-st...@w3.org list.

HTH,
-- 
Ian Hickson


Re: [whatwg] Sentence structure

2013-01-10 Thread Thomas A. Fine

On 1/10/13 11:36 PM, Ian Hickson wrote:

> I don't know if the use cases justify adding a feature to CSS, but I'll
> let the CSS editors and browser vendors be the judges of that. :-)
>
> The CSS spec is discussed on the www-st...@w3.org list.


Sorry then, I was under the impression that WHATWG covered a broader 
spectrum than just the HTML piece.


 tom



Re: [whatwg] Sentence structure

2013-01-10 Thread Ian Hickson
On Thu, 10 Jan 2013, Thomas A. Fine wrote:
> On 1/10/13 11:36 PM, Ian Hickson wrote:
> > I don't know if the use cases justify adding a feature to CSS, but
> > I'll let the CSS editors and browser vendors be the judges of that.
> > :-)
> >
> > The CSS spec is discussed on the www-st...@w3.org list.
>
> Sorry then, I was under the impression that WHATWG covered a broader
> spectrum than just the HTML piece.

We currently cover the following specs:

   http://whatwg.org/specs

Cheers,
-- 
Ian Hickson