Re: [whatwg] Hyphenation

2009-02-10 Thread Smylers
Markus Ernst writes:

> Ian Hickson wrote:
> 
> > I don't think this is a big enough problem to deserve solutions more
> > complicated than the soft hyphen at this time.
> 
> Jukka Korpela stated that the intention of the soft hyphen is not 
> actually a hyphenation hint:
> http://www.cs.tut.fi/~jkorpela/shy.html

He claims that there are multiple standards that contradict each other.
So whatever is implemented is bound to contravene at least one of them.

However he mentions that:

* HTML 4 defines it as a hyphenation hint.

* Unicode defines it as a hyphenation hint.

* Recent browsers are now treating it as a hyphenation hint.

* The contradictory standard (ISO-8859) only defines a soft hyphen when
  used at the end of a line, namely that it should be rendered like a
  hyphen.  Since that standard doesn't envisage the character being used
  elsewhere, it is silent on how it should be rendered.

It seems to me that choosing to render invisibly a soft hyphen which
isn't at the end of a line doesn't contradict the text of ISO-8859
(though it could be argued to contradict its spirit).
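
To make that rendering rule concrete, here is a minimal Python sketch (not
any browser's actual line breaker). It assumes the soft hyphen is U+00AD and
uses a greedy fixed-width wrap; the function name `wrap` is made up for
illustration. A soft hyphen shows up as "-" only where a line is actually
broken at it, and is invisible everywhere else.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def wrap(text, width):
    """Greedy wrap: a soft hyphen is rendered only when a line breaks at it."""
    lines, line = [], ""
    for word in text.split():
        parts = word.split(SHY)  # fragments between permitted break points
        while parts:
            prefix = line + " " if line else ""
            whole = "".join(parts)  # the word with its soft hyphens invisible
            if len(prefix) + len(whole) <= width:
                line = prefix + whole
                parts = []
                continue
            # Try the longest run of leading fragments that fits with a hyphen.
            for n in range(len(parts) - 1, 0, -1):
                head = "".join(parts[:n]) + "-"
                if len(prefix) + len(head) <= width:
                    lines.append(prefix + head)
                    line = ""
                    parts = parts[n:]
                    break
            else:
                if line:
                    # Nothing fits on this line: start a new one and retry.
                    lines.append(line)
                    line = ""
                else:
                    # No usable break point even on an empty line: overflow.
                    line = whole
                    parts = []
    if line:
        lines.append(line)
    return lines
```

With a wide enough column the word comes out unbroken and without any
visible hyphen, which is the behaviour ISO-8859 is silent about.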

> (Anyway I don't really understand the difference between a normal
> hyphen and a soft hyphen then...)

Suppose you are reflowing some text (perhaps because you are quoting
it); words which were broken over lines in the original may want
rejoining into a single word in your version (that is, the soft hyphen
disappears); but hyphens (non-soft) between two words need to remain.  
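
A small Python sketch of that reflow, under the assumption that the source
marks a word broken across lines with U+00AD at the line end and uses an
ordinary "-" for real hyphens (the function name `rejoin` is hypothetical):

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def rejoin(lines):
    """Join wrapped lines into one string for re-wrapping elsewhere."""
    out = ""
    for raw in lines:
        line = raw.strip()
        if out.endswith(SHY):
            # Soft hyphen at a line end: drop it and rejoin the word.
            out = out[:-1] + line
        elif out.endswith("-"):
            # Hard hyphen between two words: keep it, add no space.
            out += line
        elif out:
            out += " " + line
        else:
            out = line
    # Soft hyphens inside a line were never visible, so remove them too.
    return out.replace(SHY, "")
```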

Smylers


Re: [whatwg] Hyphenation

2009-02-10 Thread Ian Hickson
On Tue, 10 Feb 2009, Markus Ernst wrote:
> > 
> > While I appreciate the problems faced by Swedish, German, and others, I
> > don't think this is a big enough problem to deserve solutions more 
> > complicated than the soft hyphen at this time.
> 
> Jukka Korpela stated that the intention of the soft hyphen is not 
> actually a hyphenation hint: http://www.cs.tut.fi/~jkorpela/shy.html

As far as I can tell this is a non-issue; HTML5 defers to Unicode for the 
semantics of its characters, and Unicode is clear here. HTML5 doesn't 
support ISO 8859-1 (it always treats content labeled as such as a Win1252 
mapping to Unicode).


> The wish for an in-text hyphenation mechanism is of course motivated by
> the habit of how we do it in office and layout software, where text and
> presentation are not separated. I totally agree that the appropriate
> place for it is presentation, thus CSS, and the CSS3 draft looks quite 
> reasonable: http://www.w3.org/TR/2007/WD-css3-gcpm-20070205/#hyphenation
> 
> Anyway I don't find anything about the format of the hyphenation 
> dictionary. To replace in-text hyphenation hints it is necessary to have 
> several levels of hyphenation quality - the German word for "hyphenation
> mechanism" for example, "Trennungsmechanismus", you might want to have
> hyphenated at any possible place inside body text, but only at
> "Trennungs-mechanismus" in a headline. I see that this list is not the
> appropriate place for suggestions about CSS3 properties - maybe someone 
> can point me to the appropriate place?

www-st...@w3.org is the appropriate place. See the "Status of this 
document" section of the draft you cite above.

Cheers,
-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Hyphenation

2009-02-10 Thread Markus Ernst

Ian Hickson wrote:
> On Tue, 9 Jan 2007, Øistein E. Andersen wrote:
> > Hyphenation does not seem to have been discussed on this list so far,
> > and I think it should be.
> >
> > Old proposal:
> > [2] http://www.nada.kth.se/i18n/html/hyph.html
>
> While I appreciate the problems faced by Swedish, German, and others, I
> don't think this is a big enough problem to deserve solutions more
> complicated than the soft hyphen at this time.


Jukka Korpela stated that the intention of the soft hyphen is not 
actually a hyphenation hint:

http://www.cs.tut.fi/~jkorpela/shy.html

(Anyway I don't really understand the difference between a normal hyphen 
and a soft hyphen then...)


> Given that Unicode provides soft hyphen semantics and CSS provides the
> rendering rules, I don't think there is anything much for HTML5 to say on
> the matter at this time.
>
> This thread included many further e-mails discussing the subject. I agree
> with most of the points made. There did not seem to be a consensus that
> this is something that HTML5 should do anything about. If hyphenation
> dictionaries are to be used, it seems CSS would be the best place for
> them. I haven't done anything in HTML5 to handle them.


The wish for an in-text hyphenation mechanism is of course motivated by
the habit of how we do it in office and layout software, where text and
presentation are not separated. I totally agree that the appropriate
place for it is presentation, thus CSS, and the CSS3 draft looks quite
reasonable:

http://www.w3.org/TR/2007/WD-css3-gcpm-20070205/#hyphenation

Anyway I don't find anything about the format of the hyphenation
dictionary. To replace in-text hyphenation hints it is necessary to have
several levels of hyphenation quality - the German word for "hyphenation
mechanism" for example, "Trennungsmechanismus", you might want to have
hyphenated at any possible place inside body text, but only at
"Trennungs-mechanismus" in a headline. I see that this list is not the
appropriate place for suggestions about CSS3 properties - maybe someone
can point me to the appropriate place?


Re: [whatwg] Hyphenation

2009-02-10 Thread Ian Hickson
On Tue, 9 Jan 2007, Øistein E. Andersen wrote:
>
> Hyphenation does not seem to have been discussed on this list so far, 
> and I think it should be.
> 
> Old proposal:
> [2] http://www.nada.kth.se/i18n/html/hyph.html

While I appreciate the problems faced by Swedish, German, and others, I
don't think this is a big enough problem to deserve solutions more 
complicated than the soft hyphen at this time.

Given that Unicode provides soft hyphen semantics and CSS provides the 
rendering rules, I don't think there is anything much for HTML5 to say on 
the matter at this time.


This thread included many further e-mails discussing the subject. I agree 
with most of the points made. There did not seem to be a consensus that 
this is something that HTML5 should do anything about. If hyphenation 
dictionaries are to be used, it seems CSS would be the best place for 
them. I haven't done anything in HTML5 to handle them.

As usual, please let me know if there is something I missed.

Cheers,
-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] Hyphenation

2007-01-11 Thread Øistein E . Andersen
On 11 Jan 2007, at 5:33PM, Håkon Wium Lie wrote:

> The term "hyphenation dictionary" is quite common, but I see your
> point. What would be a better name for the property?

>  hyphenation-pattern
>  hyphenation-list
>  hyphenation-resource

Liang's paper `Word Hy-phen-a-tion by Com-put-er', in which the concept
was first introduced, used the term `hyphenation patterns'. Unsurprisingly,
Liang's supervisor, Knuth, used the same term in the TeXbook, and this
expression seems to have become the generally accepted one amongst TeX users.

`Hyphenation dictionary' is also common, but this tends to mean something
slightly different. To exemplify, the first five lines of what I would call a
hyphenation dictionary look like this:
> a cap·pel·la
> a for·ti·o·ri
> a go·go
> a pos·te·ri·o·ri
> a pri·o·ri

[Interestingly, this particular dictionary contains multi-word expressions, but
most hyphenation engines, as well as spelling checkers, cannot take advantage of
these, as each word (according to some definition) is typically treated in
isolation.]

In contrast, the first five hyphenation patterns in TeX82 are the following:
> .ach4
> .ad4der
> .af1t
> .al3t
> .am5at

I think it is useful to keep the distinction and would suggest renaming the
property in question `hyphenation-patterns'. (TeX's exception dictionary
falls within this narrower definition of a hyphenation dictionary.)
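
To illustrate how such patterns differ from a word list, here is a minimal
Python sketch of Liang-style pattern matching. It is only a sketch under
stated assumptions: it loads nothing but the five TeX82 patterns quoted
above, so its output is not what a full pattern set would give, and the
function names are made up. Digits sit between letters; for each
inter-letter position the highest digit from any matching pattern wins, an
odd result permits a break and an even one forbids it. The defaults of two
letters before and three after a break mirror TeX's \lefthyphenmin and
\righthyphenmin.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as '.af1t' into its letters and inter-letter digits."""
    letters = re.sub(r"\d", "", pattern)
    points = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            points[pos] = int(ch)  # value for the gap before letters[pos]
        else:
            pos += 1
    return letters, points


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the pieces of `word` between the break points the patterns allow."""
    table = dict(parse_pattern(p) for p in patterns)
    work = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(work) + 1)
    for i in range(len(work)):
        for j in range(i + 1, len(work) + 1):
            points = table.get(work[i:j])
            if points is not None:
                for k, value in enumerate(points):
                    scores[i + k] = max(scores[i + k], value)
    pieces, start = [], 0
    for pos in range(left_min, len(word) - right_min + 1):
        # scores[pos + 1] is the gap before word[pos] (shifted by the leading dot).
        if scores[pos + 1] % 2 == 1:
            pieces.append(word[start:pos])
            start = pos
    pieces.append(word[start:])
    return pieces


# The five patterns quoted above; a real pattern file has thousands.
TEX82_SAMPLE = [".ach4", ".ad4der", ".af1t", ".al3t", ".am5at"]
```

With this sample, `hyphenate("after", TEX82_SAMPLE)` yields the pieces "af"
and "ter", while the even digit in `.ad4der` keeps "adder" in one piece.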

http://computing-dictionary.thefreedictionary.com/hyphenation says:
> HYPHENATION: Breaking words that extend beyond the right margin.
> Software hyphenates words by matching them against a hyphenation
> dictionary or by using a built-in set of rules, or both.

http://www.answers.com/topic/hyphenation-dictionary is more specific:
> HYPHENATION DICTIONARY: A word file with predefined hyphen locations.

http://www.computeruser.com/resources/dictionary/definition.html?lookup=2188
gives a more generic definition:
> A file, usually in a word processing or desktop publishing program,
> which defines where hyphens will be placed for common words.

Google returns about 21,200 results for /hyphenation dictionar(y|ies)/ and
148,100 for /hyphenation patterns?/, so the latter should also be fairly common.

To me, a `hyphenation list' suggests something rather like a hyphenation
dictionary, whereas `hyphenation resource' probably should be reserved
for a more comprehensive source of hyphenation information — unless
the same property is supposed to be able to refer to different kinds
of hyphenation data.


>> [In TeX], hyphenation can [also] be indicated locally.
>> This is needed in order to hyphenate words like
>> rec-ord/re-cord and is the only level that deals with
>> spelling changes.

> &shy; is probably the best way to encode this. However, it can be done
> through CSS as well:
>
> Don't wait for record
> companies,
> record yourself.

Right, I did not get your point at first. This does indeed cover the first
reason to use explicit mark-up in TeX.

Concerning spelling changes, Petr Sojka's `Notes on Compound Word
Hyphenation in TeX' [1], section 3.2, describes how a minimally extended
version of the TeX algorithm can deal with irregular hyphenation without any
extraneous mark-up, i.e., without any unnecessary burden on the author.
Perhaps an idea for Prince7?

Anyway, the preliminary conclusion seems to be that a  element in HTML
is unnecessary, so this discussion should probably continue somewhere else.

[1] http://www.fi.muni.cz/usr/sojka/papers/tug95.pdf

-- 
Øistein E. Andersen


Re: [whatwg] Hyphenation

2007-01-11 Thread Håkon Wium Lie
Also sprach Øistein E. Andersen:

 > (By the way, the term
 > `dictionary' used to designate a set of hyphenation patterns that
 > are not, in general, words, is quite confusing.)

The term "hyphenation dictionary" is quite common, but I see your
point. What would be a better name for the property?

  hyphenation-pattern
  hyphenation-list
  hyphenation-resource

or, perhaps:

  hyphenation­pattern

:-)

 > >> [In TeX], hyphenation can [also] be indicated locally.
 > >> This is needed in order to hyphenate words like
 > >> rec-ord/re-cord and is the only level that deals with
 > >> spelling changes.
 > 
 > > This can be done by supplying your own dictionary through the
 > > 'hyphenate-dictionary' property.
 > 
 > You seem to have misinterpreted the intended meaning of
 > `locally'. The two problems are as follows:
 > 
 > 1) Given the following sentence: `Don't wait for record companies,
 > record records yourselves.' In order to hyphenate
 > this correctly, explicit hyphenation points (\- in TeX) must
 > be inserted locally, i.e., as part of the words, as follows:
 > `Don't wait for rec\-ord companies, re\-cord rec\-ords yourselves.'

&shy; is probably the best way to encode this. However, it can be done
through CSS as well:

  Don't wait for record
  companies, record
  yourself.

-h&kon
  Håkon Wium Lie  CTO °þe®ª
[EMAIL PROTECTED]  http://people.opera.com/howcome



Re: [whatwg] Hyphenation

2007-01-11 Thread Øistein E . Andersen
On 11 Jan 2007, at 1:49PM, Håkon Wium Lie wrote:

> Prince doesn't support exception dictionaries. Is it not
> possible to encode exceptions in the hyphenation dictionary?

Yes, that should be possible, actually. The encoding of certain
words in a default exception dictionary seems to be a design
choice in TeX rather than a requirement. (By the way, the term
`dictionary' used to designate a set of hyphenation patterns that
are not, in general, words, is quite confusing.)

> DSSSL has an 'hyphenation-exceptions' property which takes a
> list of strings. I'm unsure if it has been implemented, though.

Interesting. This would be useful for authors who wanted to
indicate a few exceptions without specifying a complete set of
hyphenation patterns. (TeX includes 4,447 patterns, and two or
several sets cannot easily be merged.)

>> [In TeX], hyphenation can [also] be indicated locally.
>> This is needed in order to hyphenate words like
>> rec-ord/re-cord and is the only level that deals with
>> spelling changes.

> This can be done by supplying your own dictionary through the
> 'hyphenate-dictionary' property.

You seem to have misinterpreted the intended meaning of
`locally'. The two problems are as follows:

1) Given the following sentence: `Don't wait for record companies,
record records yourselves.' In order to hyphenate
this correctly, explicit hyphenation points (\- in TeX) must
be inserted locally, i.e., as part of the words, as follows:
`Don't wait for rec\-ord companies, re\-cord rec\-ords yourselves.'

2) TeX's hyphenation patterns cannot encode spelling changes;
neither can its exception dictionary.
Therefore, spelling changes like backen -> bak-ken must be
indicated explicitly each time the word occurs.
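
For illustration, point 2 can be sketched in Python along the lines of TeX's
\discretionary{k-}{k}{ck}: the break point carries three texts, one for the
end of the line, one for the start of the next line and one for the unbroken
word. The class and function names here are hypothetical, not part of any
proposal in this thread.

```python
from dataclasses import dataclass


@dataclass
class Discretionary:
    pre: str      # text at the end of the line if the break is taken
    post: str     # text at the start of the next line if the break is taken
    nobreak: str  # text if the word stays in one piece


def unbroken(pieces):
    """Spell the word as it appears when no break is taken."""
    return "".join(p if isinstance(p, str) else p.nobreak for p in pieces)


def broken_at(pieces, index):
    """Return (line_end, next_line_start) when breaking at pieces[index]."""
    d = pieces[index]
    head = unbroken(pieces[:index]) + d.pre
    tail = d.post + unbroken(pieces[index + 1:])
    return head, tail


# backen -> bak-ken, written as ba\discretionary{k-}{k}{ck}en in TeX
BACKEN = ["ba", Discretionary("k-", "k", "ck"), "en"]
```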

>> There are a few additional caveats. For instance, it is not entirely
>> obvious what should be considered to be a `word' or which characters
>> should be allowed in a `word'
>> [... lots of less important points ...]
>> How does Prince deal with these issues?

> Prince6 doesn't try to go beyond TeX.

Fair enough. I realise that my question ended up rather too far away from the 
most important issue. I suppose Prince relies on Unicode character classes to 
identify letters (which is better than Plain TeX's default [unaccented English 
letters only], but less flexible) and uses a special rule to treat hyphens. Is 
this a correct assumption? Can I find more information on such details 
somewhere?

-- 
Øistein E. Andersen


Re: [whatwg] Hyphenation

2007-01-11 Thread Håkon Wium Lie
Also sprach Øistein E. Andersen:

 > > Prince6 (www.princexml.com) supports these properties:
 > > 
 > >   hyphenate: none | auto
 > >   hyphenate-dictionary: none | url(...)
 > >   hyphenate-before: 
 > >   hyphenate-after: 
 > >   hyphenate-lines: none | 
 > 
 > From http://www.princexml.com/howcome/2006/p6/p6demo2.html:
 > 
 > > Prince can read the hyphenation format pioneered by TeX and reused by many
 > > other applications. OpenOffice hosts a number of hyphenation dictionaries
 > > that are reusable in Prince6.

 ...
 
 > This is, however, only one part of TeX's hyphenation system. The next level
 > is a hyphenation exception dictionary, a list of fully hyphenated words that
 > would not otherwise be hyphenated correctly.

Prince doesn't support exception dictionaries. Is it not possible to
encode exceptions in the hyphenation dictionary?

DSSSL has an 'hyphenation-exceptions' property which takes a list of
strings. I'm unsure if it has been implemented, though.

http://dsssl.netfolder.com/paragraph-flow-object.htm

 > In addition to this, hyphenation can be indicated locally. This is needed in
 > order to hyphenate words like rec-ord/re-cord and is the only level that
 > deals with spelling changes.

This can be done by supplying your own dictionary through the
'hyphenate-dictionary' property.

 > There are a few additional caveats. For instance, it is not entirely obvious
 > what should be considered to be a `word' or which characters should be
 > allowed in a `word' (given that only `words' can be hyphenated using this
 > kind of algorithm). TeX uses `category codes' to define letters, and
 > Unicode's character classes give a good approximation, but they cannot be
 > redefined to deal with specific issues. In Italian, for instance, dell'opera
 > should be hyphenated dell'o-pera, but opera should not be hyphenated o-pera.
 > (The particular example may be wrong, but the principle is correct.) Unless
 > the apostrophe is considered to be a `letter' (a constituent of a `word'),
 > correct patterns do not help, as `dell'opera' will not be considered as one
 > unit during hyphenation-point look-up.
 > 
 > Another example worth mentioning is that Polish and a few other languages
 > apparently require a hyphenated word like xxx-yyy to be hyphenated xxx-
 > -yyy (with an extra hyphen carried over). A truly flexible system would
 > allow one to specify, e.g., which non-letters to treat as part of words and
 > which to give special treatment. (As we all know, TeX hyphenates xxx-yyy as xxx-
 > yyy; in addition, the hyphen prohibits xxx and yyy from being hyphenated,
 > which may or may not be suitable depending on, e.g., column width.)
 > 
 > How does Prince deal with these issues?

Prince6 doesn't try to go beyond TeX.

-h&kon
  Håkon Wium Lie  CTO °þe®ª
[EMAIL PROTECTED]  http://people.opera.com/howcome



Re: [whatwg] Hyphenation

2007-01-11 Thread Øistein E . Andersen
Thanks for all the interesting comments so far.


On 9 Jan 2007, at 10:23AM, Anne van Kesteren wrote:

> "[...] simple cases could also be handled with a `soft hyphen' (&shy;), if
> browsers would only support it." which is of course not an excuse to go
> around and introduce a new element!

Indeed. Browser support has improved since that document was written, though.
Today, all major browsers except Firefox support the soft hyphen, and the 
purpose of a new element would be to enable more complex cases to be handled
properly, not to replace the soft hyphen.


On 9 Jan 2007, at 1:3PM, Leons Petrazickis wrote:

> Hyphenation is a presentational problem. [...] We should
> avoid embedding presentational hyphenation tags in the actual text.

Yes, if possible. The verb record is supposed to be hyphenated re-cord,
whilst the correct hyphenation of the noun is rec-ord. For this reason, TeX
never hyphenates record (unless the author writes rec\-ord or re\-cord).

This problem may be more common in other languages, but expecting authors
to hard-code hyphenation points in particular words is probably futile.

> I would suggest that the first priority is getting a naive hyphenator into
> browsers.

This would probably have to be language-specific, though. See comments on Prince
below.

> [To handle special cases,] I would suggest a hyphenation dictionary in the
>  of the document.

Not a bad idea. The problem is that the words requiring special attention will
depend on the particular «naïve» algorithm implemented, i.e., the browser...


On 9 Jan 2007, at 1:15PM, Alexey Feldgendler wrote:

> In some typographical traditions, non-full-justified text is sometimes
> hyphenated.

In the mechanical-typewriter era, a typist would certainly choose to hyphenate
when the bell sounded in the middle of a long word.


On 9 Jan 2007, at 1:37PM, Håkon Wium Lie wrote:

> Prince6 (www.princexml.com) supports these properties:
> 
>   hyphenate: none | auto
>   hyphenate-dictionary: none | url(...)
>   hyphenate-before: 
>   hyphenate-after: 
>   hyphenate-lines: none | 

From http://www.princexml.com/howcome/2006/p6/p6demo2.html:

> Prince can read the hyphenation format pioneered by TeX and reused by many
> other applications. OpenOffice hosts a number of hyphenation dictionaries that
> are reusable in Prince6.

This is a great step forward. I hope something along these lines will find its
way into desktop browsers as well.

It should be noted, though, that — unless I have misunderstood something —
the `hyphenation dictionaries' are really patterns from which hyphenation
points can be computed. The particular method used in TeX was discovered by
Frank M. Liang about 25 years ago and implemented in TeX soon thereafter.
According to the TeXbook, the original US-English patterns find about 90%
of the hyphenation points given in a dictionary or about 95% of the permissible
hyphenation points in a typical text (where common words are more frequent)
without making any mistakes.

This is, however, only one part of TeX's hyphenation system. The next level is a
hyphenation exception dictionary, a list of fully hyphenated words that would
not otherwise be hyphenated correctly. (Plain) TeX contains a list of fourteen
words including `present' (which cannot be hyphenated without knowing whether it
is a noun or a verb, so TeX does not try) and `ta-ble' (a common word that would
otherwise not be hyphenated at all), and the author can add words at any time
using the \hyphenation command.
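
A rough Python sketch of that second level, assuming exceptions are written
the way \hyphenation takes them (hyphens mark the allowed breaks, a word
without hyphens is never broken). The function names and the
`pattern_hyphenate` callback are hypothetical stand-ins for whatever pattern
engine is in use.

```python
def load_exceptions(entries):
    """Map each plain word to its list of pieces, e.g. 'ta-ble' -> ['ta', 'ble']."""
    table = {}
    for entry in entries:
        entry = entry.lower()
        table[entry.replace("-", "")] = entry.split("-")
    return table


def hyphenate(word, exceptions, pattern_hyphenate):
    """Consult the exception list first; fall back to the patterns otherwise."""
    pieces = exceptions.get(word.lower())
    if pieces is not None:
        return pieces  # note: exception entries come back in lower case
    return pattern_hyphenate(word)


# Two of the words mentioned above: 'present' is listed without a hyphen,
# so it is never broken; 'ta-ble' gets the break the patterns would miss.
EXCEPTIONS = load_exceptions(["present", "ta-ble"])
```

Since an exception entry is just a fully hyphenated word, an author can add
a handful of them without touching the pattern set.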

In addition to this, hyphenation can be indicated locally. This is needed in
order to hyphenate words like rec-ord/re-cord and is the only level that deals
with spelling changes. If The New Yorker were using TeX and wanted preëmptive to
hyphenate as pre-emptive, this rule could not be incorporated into either the
patterns or the exception dictionary. From an i18n perspective, the patterns and
(at the very least) the exception dictionary ought to allow not only insertion
of hyphens, but also spelling changes to be specified. The examples given so far
in this thread may not be convincing, but if it is true that l·l should in
general hyphenate as l-l in Catalan, this certainly is an important problem for
that language, and there are probably many similar issues in other languages
that we just do not know about.

It seems that Prince currently uses TeX patterns, but no exception dictionary,
and allows local encoding of hyphenation points (&shy;), but not spelling
changes.

There are a few additional caveats. For instance, it is not entirely obvious
what should be considered to be a `word' or which characters should be allowed
in a `word' (given that only `words' can be hyphenated using this kind of
algorithm).
TeX uses `category codes' to define letters, and Unicode's character classes
give a good approximation, but they cannot be redefined to deal with specific
issues. In Italian, for instance, dell'opera should be hyphenated dell'o-
pera, but opera should n

Re: [whatwg] Hyphenation [Correction concerning Opera]

2007-01-11 Thread Øistein E . Andersen
On 8 Jan 2007, at 11:2PM, I wrote:

> Opera currently seems not to render [the soft hyphen]
> in accordance with Unicode

The problem I observed (soft hyphens failing to render as visible glyphs at
line-breaks) is limited to a particular build of Opera (not the latest one) and
probably only affects the Macintosh platform.

With newer builds and on other platforms, Opera handles &shy; just as correctly
as Safari and IE do, and I clearly should have checked this before posting.

Sorry for the misinformation.

-- 
Øistein E. Andersen


Re: [whatwg] Hyphenation

2007-01-10 Thread Sander Tekelenburg
At 02:19 +0100 UTC, on 2007-01-11, Håkon Wium Lie wrote:

> Also sprach Sander Tekelenburg:
>
>  > FWIW, my feeling is that it would be best if there'd be a defined format
>  > for hyphenation rules, and browsers would accept such description files [...]
>
> This format exists. It was pioneered by TeX

Cool! (I couldn't find a spec though. Could you point to it?)

[...]

> I agree that browsers should read these dictionaries.

OK, so given your position ;) that raises the question of why they don't yet
:) Just a matter of "so many things to do, so little time"? Or does this
require something to be specced first?

> However, the
> dictionaries don't have to ship with browsers -- they can be web
> resources just like style sheets and images are.

I'm not sure they should. I think this is the sort of thing that users should
have easy control over and Web publishers shouldn't be burdened with (they're
unlikely to be hyphenation specialists, after all). So my thought was more
that users could themselves create such files (or, more likely for most
users, download someone else's creation) and install it, to have the browser
apply them to all content in that language. I think that's the only way to
ensure users can get the hyphenation that they consider correct. (Obviously
the browser would have to allow multiple hyphenation files, for multiple
languages, to be installed.)

The only reason I suggested that browsers could even ship with them is that
doing so would more quickly reach more users; get them aware of and used to
such a nicety. Not a necessity. More a means of 'evangelisation', if you
will. No doubt the first browser to offer this will generate some buzz ;)


-- 
Sander Tekelenburg
The Web Repair Initiative: 


Re: [whatwg] Hyphenation

2007-01-10 Thread Håkon Wium Lie
Also sprach Sander Tekelenburg:

 > FWIW, my feeling is that it would be best if there'd be a defined format for
 > hyphenation rules, and browsers would accept such description files as a
 > plug-in. This would allow each language's specialist to write their rules,
 > and share them, without putting that burden on browser authors. (Browsers
 > could of course still be shipped with such rulesets.)

This format exists. It was pioneered by TeX and is now widely used by
other applications. Here is the OpenOffice repository:

  http://wiki.services.openoffice.org/wiki/Dictionaries

You can plug these into Prince as per:

  http://www.princexml.com/howcome/2006/p6/p6demo2.html

I agree that browsers should read these dictionaries. However, the
dictionaries don't have to ship with browsers -- they can be web
resources just like style sheets and images are.

-h&kon
  Håkon Wium Lie  CTO °þe®ª
[EMAIL PROTECTED]  http://people.opera.com/howcome



Re: [whatwg] Hyphenation

2007-01-10 Thread Sander Tekelenburg
At 20:22 +0200 UTC, on 2007-01-09, Henri Sivonen wrote:

[...]

>   * Not knowing Dutch, the example makes me guess that the diaeresis
> in Dutch has the same meaning as in French (indicate that vowels
> don't form a diphthong). If this is the case, the interaction of the
> diaeresis with hyphenation may even be a generalizable rule that
> could be hard-coded in Dutch-aware hyphenating browsers. Is it a
> generalizable rule?

I don't think you can generalize it like this, because, like many other
languages, Dutch borrows from other languages (notable in this case would be
German). So there are Dutch words where the umlaut has a different function
and thus the hyphenation rule would be different.

[But note that, although I speak Dutch, that doesn't make me a specialist...]

[...]

>   * Not having a language-specific dictionary available in a browser
> doesn't make things worse than the status quo, so it isn't that big a
> deal.

That's assuming status quos aren't bad :) (I wouldn't want to be a language
teacher in this day and age, where, due to computers' restrictions, your
students will constantly see bad examples.)

FWIW, my feeling is that it would be best if there'd be a defined format for
hyphenation rules, and browsers would accept such description files as a
plug-in. This would allow each language's specialist to write their rules,
and share them, without putting that burden on browser authors. (Browsers
could of course still be shipped with such rulesets.)


-- 
Sander Tekelenburg, 


Re: [whatwg] Hyphenation

2007-01-10 Thread James Graham

Kornel Lesinski wrote:
> On Tue, 09 Jan 2007 23:47:46 -, James Graham <[EMAIL PROTECTED]> wrote:
>
> > FWIW this all makes just as much sense with "dictionary" replaced by
> > "stylesheet" (stylesheets need to be kept in sync as new elements,
> > classes and ids are used rather than new words).
>
> Not entirely. The layout and structure of the documents is not as
> variable as their content.

But it is much more variable than the hyphenation rules for a particular
language. Any sensible site would add all the special hyphenations they wanted
to use to a site-wide dictionary. 99% of the time the dictionary would be pulled
from cache and all the supposed problems would disappear.


--
"Eternity's a terrible thought. I mean, where's it all going to end?"
 -- Tom Stoppard, Rosencrantz and Guildenstern are Dead


Re: [whatwg] Hyphenation

2007-01-10 Thread Kornel Lesinski

On Tue, 09 Jan 2007 23:47:46 -, James Graham <[EMAIL PROTECTED]> wrote:

> FWIW this all makes just as much sense with "dictionary" replaced by
> "stylesheet" (stylesheets need to be kept in sync as new elements,
> classes and ids are used rather than new words).

Not entirely. The layout and structure of the documents is not as variable
as their content.

And the solution that works for stylesheets (an external file) has a problem:
it delays initial display or causes FOUC (a flash of unstyled content). With an
external dictionary there could be another FOUC - Flash of Unhyphenated Content.


--
regards, Kornel Lesiński


Re: [whatwg] Hyphenation

2007-01-09 Thread James Graham

Kornel Lesinski wrote:
> Hyphenation dictionary supplied by the page seems like a good idea, but
> having it in  might cause some headaches in dynamic systems:
>
> * in some template systems adding anything to  is difficult
> * author may want to compose page from several independent fragments,
> possibly each having its own dictionary. Merging these dictionaries
> would either require some extra logic or cause duplicate entries (and
> authors won't like that waste).
> * One would have to keep in sync dictionaries and text (in practice
> there will be cases when dictionary lacks some words or contains words
> which aren't present in text any more)
> * syntax proposed is verbose and with entire dictionary repeated on
> every page that adds up to a substantial traffic

FWIW this all makes just as much sense with "dictionary" replaced by
"stylesheet" (stylesheets need to be kept in sync as new elements,
classes and ids are used rather than new words).



--
"The universe doesn't care what you believe. The wonderful thing about 
science is that it doesn't ask for your faith, it just asks for your 
eyes" --- http://xkcd.com/c154.html


Re: [whatwg] Hyphenation

2007-01-09 Thread Kornel Lesinski
On Tue, 09 Jan 2007 13:03:04 -, Leons Petrazickis  
<[EMAIL PROTECTED]> wrote:



> I would suggest that the first priority is getting a naive hyphenator
> into browsers. Since you only ever need hyphenation when
> full-justifying

I disagree. It's also needed in narrow columns, even if they're
left-justified, and may be useful for very long words in general.

> Once that is in place, we can start thinking about special cases. I
> would suggest a hyphenation dictionary in the  of the document.


A hyphenation dictionary supplied by the page seems like a good idea, but
having it in  might cause some headaches in dynamic systems:

* in some template systems adding anything to  is difficult
* the author may want to compose a page from several independent fragments,
possibly each having its own dictionary. Merging these dictionaries would
either require some extra logic or cause duplicate entries (and authors
won't like that waste).
* one would have to keep dictionaries and text in sync (in practice there
will be cases when the dictionary lacks some words or contains words which
aren't present in the text any more)
* the syntax proposed is verbose, and with the entire dictionary repeated on
every page that adds up to substantial traffic


And this problem can't be solved by using an external file for the dictionary,
as it will either delay initial display of the page until the dictionary is
loaded or will require reflow of the entire page.



Therefore I suggest something possibly a bit more difficult for UAs: learn
from the text in the document.


Given a document with:
The zoë-ven hypertext must ab-stain from grooming monkeys in an
indefatigably questionable fashion. The zoëven hypertext must abstain
from grooming monkeys in an indefatigably questionable fashion.

The UA would make note of words with soft hyphens and replace further
non-hyphenated occurrences with hyphenated ones:

The zoë-ven hypertext must ab-stain from grooming monkeys in an
indefatigably questionable fashion. The zoë-ven hypertext must ab-stain
from grooming monkeys in an indefatigably questionable fashion.

(the decision whether this is visible in the DOM or not is probably
best left to the implementation).
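
A minimal sketch of this idea in Python, just to make the two passes
concrete. It assumes soft hyphens are literal U+00AD characters in the
text, that matching is case-sensitive, and that the first hyphenated
form seen for a word wins. The names learn_hyphenations and
apply_hyphenations are made up for illustration; they are not part of
any proposal or browser API.

```python
import re

SHY = "\u00ad"  # U+00AD SOFT HYPHEN

# A run of letters, optionally joined by soft hyphens.
WORD = re.compile(r"[^\W\d_]+(?:\u00ad[^\W\d_]+)*")


def learn_hyphenations(text):
    """Pass 1: map each plain word to the soft-hyphenated form seen in text."""
    learned = {}
    for match in WORD.finditer(text):
        token = match.group(0)
        if SHY in token:
            # Keep the first hyphenated spelling encountered for this word.
            learned.setdefault(token.replace(SHY, ""), token)
    return learned


def apply_hyphenations(text, learned):
    """Pass 2: give unhyphenated occurrences the learned break points."""
    def replace(match):
        token = match.group(0)
        if SHY in token:
            return token  # already carries the author's own hints
        return learned.get(token, token)
    return WORD.sub(replace, text)


sample = ("The zo\u00ebven hypertext must abstain. "
          "The zo\u00eb\u00adven hypertext must ab\u00adstain.")
hinted = apply_hyphenations(sample, learn_hyphenations(sample))
```

Note that this sketch scans the whole text before applying anything,
so it also covers occurrences that come before the hinted one. Whether
a UA should do that, or only affect later occurrences, is an open
design choice.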


As for the exceptions in hyphenation, I'm in favor of  element.


--
pozdrawiam, Kornel Lesiński


Re: [whatwg] Hyphenation

2007-01-09 Thread Henri Sivonen

On Jan 9, 2007, at 01:02, Øistein E. Andersen wrote:

> In summary, hyphenation is a hard problem: breaking points cannot in
> general be established algorithmically; hyphenation dictionaries are
> not always available and typically do not contain long/rare/complex
> words (the ones that really need to be hyphenated); furthermore,
> distinct words may be spelt identically, but still need to be
> hyphenated differently; and several languages require spelling
> changes when words are hyphenated ([3] mentions Dutch, German (alte
> Rechtschreibung), Spanish, Norwegian, Swedish and Hungarian).


My initial thoughts:

 * Prince seems to be doing exactly the right thing: control overall  
hyphenation with CSS, honor soft hyphens and support TeX-compatible  
language-specific dictionaries.


 * The Swedish and Dutch examples given in this thread seem to be  
addressable with language-specific dictionaries.


 * Not knowing Dutch, the example makes me guess that the diaeresis  
in Dutch has the same meaning as in French (indicate that vowels  
don't form a diphthong). If this is the case, the interaction of the  
diaeresis with hyphenation may even be a generalizable rule that  
could be hard-coded in Dutch-aware hyphenating browsers. Is it a  
generalizable rule?


 * Knowing a bit of Swedish, I really have a hard time taking seriously  
the notion of Swedish requiring new markup to be introduced to HTML.  
The sky won't fall if a browser doesn't know how to hyphenate Swedish  
chewing gum in the absence of a hyphenation dictionary. (Besides, it  
looks like the Swedish rule is generalizable so that a hyphenator  
wouldn't even need a list of all possible compound words but a  
dictionary of simple words that can be part of a compound would  
suffice.)


 * Not having a language-specific dictionary available in a browser  
doesn't make things worse than the status quo, so it isn't that big a  
deal.


 * Hand-coders wouldn't bother to type hyphenation data for  
everything every time. (TeX users run the typesetting step themselves  
whereas HTML is rendered elsewhere. TeX users only tend to  
micromanage the words that they see didn't typeset nicely.)


 * It is unlikely that authoring tools would opt to dump their  
hyphenation data in documents even if their data was in a format  
suitable for dumping in whatever format was required.


 * All the languages cited as requiring spelling changes are written  
using the Latin script. The Latin script has a long cultural  
tradition of adapting to writing technology: from chiseled marble to  
quills to movable type to typewriters to computer displays.  
Therefore, I don't find it unreasonable to suggest adapting to the  
limitations of the medium here.


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] Hyphenation

2007-01-09 Thread Håkon Wium Lie
Also sprach Alexey Feldgendler:

 > > I would suggest that the first priority is getting a naive hyphenator
 > > into browsers. Since you only ever need hyphenation when
 > > full-justifying, I would suggest:
 > >
 > > align: hyphenated;
 > 
 > In some typographical traditions, non-full-justified text is
 > sometimes hyphenated. I believe that hyphenation should be a
 > separate property, orthogonal to text-align. Also, there are some
 > common hyphenation options (like the maximum number of consecutive
 > hyphenated lines allowed) that are also worth CSS properties.

Prince6 (www.princexml.com) supports these properties:

  hyphenate: none | auto
  hyphenate-dictionary: none | url(...)
  hyphenate-before: 
  hyphenate-after: 
  hyphenate-lines: none | 

(with a "prince-" prefix)

You can see the properties in use here:

  http://www.princexml.com/howcome/2006/p6/p6demo2.html

Currently, Prince will only hyphenate paragraphs with 'text-align:
justify'. I agree that hyphenation is useful in other cases as well.

-h&kon
  Håkon Wium Lie  CTO °þe®ª
[EMAIL PROTECTED]  http://people.opera.com/howcome



Re: [whatwg] Hyphenation

2007-01-09 Thread Alexey Feldgendler
On Tue, 09 Jan 2007 14:03:04 +0100, Leons Petrazickis
<[EMAIL PROTECTED]> wrote:


> I would suggest that the first priority is getting a naive hyphenator
> into browsers. Since you only ever need hyphenation when
> full-justifying, I would suggest:
>
> align: hyphenated;


In some typographical traditions, non-full-justified text is sometimes
hyphenated. I believe that hyphenation should be a separate property,
orthogonal to text-align. Also, there are some common hyphenation
options (like the maximum number of consecutive hyphenated lines
allowed) that are also worth CSS properties.


--
Alexey Feldgendler <[EMAIL PROTECTED]>
[ICQ: 115226275] http://feldgendler.livejournal.com


Re: [whatwg] Hyphenation

2007-01-09 Thread Leons Petrazickis

On 1/8/07, Øistein E.  Andersen <[EMAIL PROTECTED]> wrote:

> Currently, hyphenation and justification are scarce on the Web

Is there any browser support for automatic hyphenation?

> and the average
> blogger hardly misses these features.


Hyphenation is a presentational problem. When you copy hyphenated
text, you want the non-hyphenated version in the clipboard. We should
avoid embedding presentational hyphenation tags in the actual text.
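
One way to picture the split between text and presentation is the
following Python sketch. It assumes break opportunities are already
marked with U+00AD (whether by an author or by a hyphenator) and uses
a simple greedy line filler; wrap and clipboard_text are hypothetical
names, not an existing API. The visible hyphen exists only in the
rendered lines, never in the text that gets copied.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN, an invisible break opportunity


def wrap(text: str, width: int) -> list[str]:
    """Greedy line filling; a hyphen is shown only where a break is taken."""
    lines, current = [], ""
    for word in text.split():
        parts = word.split(SHY)
        while parts:
            lead = current + " " if current else ""
            whole = "".join(parts)
            if len(lead) + len(whole) <= width:
                current, parts = lead + whole, []
                continue
            # Longest run of leading fragments that fits with a trailing "-".
            fit = 0
            for n in range(len(parts) - 1, 0, -1):
                if len(lead) + len("".join(parts[:n])) + 1 <= width:
                    fit = n
                    break
            if fit:
                lines.append(lead + "".join(parts[:fit]) + "-")
                current, parts = "", parts[fit:]
            elif current:
                lines.append(current)  # start the word on a fresh line
                current = ""
            else:
                current, parts = whole, []  # nothing fits: let it overflow
    if current:
        lines.append(current)
    return lines


def clipboard_text(text: str) -> str:
    """What a copy operation would hand over: no break hints, no hyphens."""
    return text.replace(SHY, "")
```

With the input "must ab\u00adstain from", a width of 8 gives the lines
"must ab-", "stain", "from", while a width of 20 gives the single line
"must abstain from" with no hyphen shown at all.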

I would suggest that the first priority is getting a naive hyphenator
into browsers. Since you only ever need hyphenation when
full-justifying, I would suggest:

align: hyphenated;

Once that is in place, we can start thinking about special cases. I
would suggest a hyphenation dictionary in the <head> of the document.



 Monkey grooming

 
 
 
 
 



The zoëven hypertext must abstain from grooming monkeys in an
indefatigably questionable fashion. The zoëven hypertext must abstain
from grooming monkeys in an indefatigably questionable fashion. The
zoëven hypertext must abstain from grooming monkeys in an
indefatigably questionable fashion.




Thus, when people dislike the naive hyphenation of a word, they can
specify one of their liking in the header.

--
Leons Petrazickis


Re: [whatwg] Hyphenation

2007-01-09 Thread Anne van Kesteren
On Tue, 09 Jan 2007 00:02:54 +0100, Øistein E. Andersen  
<[EMAIL PROTECTED]> wrote:
> The controversy surrounding the meaning of &shy; (U+00AD) is probably
> over, although Opera currently seems not to render this character in
> accordance with Unicode (IE7 and Safari seem to do the right thing;
> Firefox does not hyphenate at all).


So as I understand it from
http://www.w3.org/International/O-HTML-hyphenation.html, &shy; is not
enough because you have (theoretical) cases like:


  zoëven -> zo-e-ven

I doubt this is really important to web pages though.

The W3C page also mentions "Of course, the simple cases could also be
handled with a `soft hyphen' (&shy;), if browsers would only support it."
which is of course not an excuse to go around and introduce a new element!



--
Anne van Kesteren




Re: [whatwg] Hyphenation

2007-01-09 Thread Mikko Rantalainen

Øistein E. Andersen wrote:

> Hyphenation does not seem to have been discussed on this list so far,
> and I think it should be.
>
> General discussion:
> [1] http://www.w3.org/International/O-HTML-hyphenation.html
>
> Old proposal:
> [2] http://www.nada.kth.se/i18n/html/hyph.html
> [...]
> The proposal [2] suggests the addition of a new  element, modelled
> after TeX's \discretionary command (with a possibly superfluous
> addition), that permits to specify which characters to render
> before/after a line break if the word is broken.
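
For readers who don't know TeX: \discretionary takes three pieces of
text, one shown before the break, one shown after it, and one shown
when no break is taken. A rough Python model of that behaviour
follows. The Discretionary class and render function are invented
here for illustration and say nothing about the syntax in [2]. The
two sample words assume the commonly cited breaks "zo-even" for Dutch
zoëven and "tugg-gummi" for Swedish tuggummi.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discretionary:
    pre: str      # shown at the end of the line if the break is taken
    post: str     # shown at the start of the next line if the break is taken
    nobreak: str  # shown in place if the break is not taken


def render(pieces, break_at=None):
    """Render a word made of plain strings and Discretionary items.

    break_at is the index (into pieces) of the one discretionary to break
    at, or None to render the word unbroken. Returns a list of lines.
    """
    lines = [""]
    for index, piece in enumerate(pieces):
        if isinstance(piece, str):
            lines[-1] += piece
        elif index == break_at:
            lines[-1] += piece.pre
            lines.append(piece.post)
        else:
            lines[-1] += piece.nobreak
    return lines


# Dutch: the diaeresis is dropped when the word is broken after "zo".
zoeven = ["zo", Discretionary(pre="-", post="e", nobreak="\u00eb"), "ven"]
# Swedish: the consonant dropped in the compound comes back at a break.
tuggummi = ["tug", Discretionary(pre="g-", post="g", nobreak="g"), "ummi"]
```

Here render(zoeven) yields ["zoëven"] and render(zoeven, break_at=1)
yields ["zo-", "even"], which is the kind of spelling change a plain
soft hyphen cannot express.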


I think I like the style suggested by Mirsad Todorovac [1]. Even though 
I'm very familiar with SGML/XML style markup I find this


zoëven

much harder to understand than this

zoëven

I would prefer  instead for the latter case.

--
Mikko