[Bug 18443] auto-insert of non-breaking whitespace where appropriate

2012-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=18443

Bawolff  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||DUPLICATE

--- Comment #12 from Bawolff  2012-07-29 17:53:06 UTC ---
Yes. Since that bug is older, let's continue the discussion over there.

*** This bug has been marked as a duplicate of bug 13619 ***

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 18443] auto-insert of non-breaking whitespace where appropriate

2012-07-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=18443

--- Comment #11 from Nemo_bis  2012-07-27 21:12:49 UTC ---
Is bug 13619 a duplicate?



[Bug 18443] auto-insert of non-breaking whitespace where appropriate

2012-07-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=18443

--- Comment #10 from seth  2012-07-27 19:52:08 UTC ---
(In reply to comment #9)
> Hence we'd want to make
> the rules have effectively no false positives.

I fully agree with that.
(And actually that was one of the reasons why I asked for a management system
where admins can quickly change regexps, because it's quite easy to overlook
such false-positive cases a priori.)

However, cases like "123 %" have shown that we don't have to fear false
positives too much.



[Bug 18443] auto-insert of non-breaking whitespace where appropriate

2012-07-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=18443

Bawolff  changed:

   What|Removed |Added

   Keywords||i18n
 CC||niklas.laxst...@gmail.com

--- Comment #9 from Bawolff  2012-07-27 19:16:41 UTC ---
I imagine we'd want to change these rules so they're handled in the i18n files
instead of in the parser itself (since we'd want them to vary per language).
CC'ing Niklas to see if he has any thoughts on the i18n aspects.

> The typographic rules[1] in Germany are quite complicated:

One of the scary things about this type of scheme is that it's invisible to the
user. If there are exceptions to the rules, the user cannot override them
(well, maybe they could do things like insert  , but it's not obvious to the
user how to do that, and it would be very difficult for them). Hence we'd want
to make the rules have effectively no false positives.
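
The two ideas above (rules that vary per language, and no effect where no rule
is known to be safe) can be sketched as a lookup table keyed by language code.
This is only an illustration in Python, not MediaWiki code: the names
RULES_BY_LANGUAGE and insert_nbsp are hypothetical, and the "fr" entry is
limited to the guillemet spacing for brevity.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE

# Hypothetical per-language rule table. A language without an entry gets
# no rules at all, so its text is never touched (no false positives).
RULES_BY_LANGUAGE = {
    "fr": [
        (re.compile("(«) "), r"\1" + NBSP),  # space after an opening guillemet
        (re.compile(" (?=»)"), NBSP),        # space before a closing guillemet
    ],
}


def insert_nbsp(text, lang):
    """Apply the rules registered for `lang`; unknown languages are a no-op."""
    for pattern, replacement in RULES_BY_LANGUAGE.get(lang, []):
        text = pattern.sub(replacement, text)
    return text
```

With this layout, insert_nbsp("« oui »", "fr") changes both spaces inside the
guillemets, while the same text with lang "en" comes back unchanged.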



[Bug 18443] auto-insert of non-breaking whitespace where appropriate

2012-07-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=18443

--- Comment #8 from seth  2012-07-27 19:11:22 UTC ---
(In reply to comment #5)

Yes, a hardcoded solution would be OK. But at least in the beginning there
should be an easy way for admins and devs to communicate about changes to
those hardcoded rules.

The typographic rules[1] in Germany are quite complicated.
There should be a _narrow_ _non-breaking_ space
* inside abbreviations (like 'z. B.', 'i. d. R.', 'u. a.')
* inside abbreviations with numbers (like '§ 315', 'Abs. 3', 'S. 78 ff')
* inside dates like '1. Mai'
* between numbers and units (like '100 m', '5 kg')

If I get an "ok" here, so that some dev would insert those hardcoded rules for
w:de (and probably for all other de projects, too), then I could create some
regexps.

[1] actually "rule" is not the right word here. "typographic sugar" would be a
better description.
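
As a rough idea of what such regexps could look like, here is a minimal Python
sketch covering only the four cases listed above. Everything in it is an
assumption made for illustration: the abbreviation, month and unit lists are
deliberately tiny, the names (apply_german_rules and the constants) are
hypothetical, and U+202F (NARROW NO-BREAK SPACE) is used as the narrow
non-breaking space.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# Illustrative, incomplete lists; a real rule set would need far more entries.
ABBREVIATIONS = ["z. B.", "i. d. R.", "u. a."]
MONTHS = ("Januar|Februar|März|April|Mai|Juni|Juli|August|"
          "September|Oktober|November|Dezember")
UNITS = "mm|cm|km|kg|m|g"

# Fixed multi-part abbreviations, matched literally at a word boundary.
ABBREVIATION_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(a) for a in ABBREVIATIONS) + ")"
)

RULES = [
    # '§ 315', 'Abs. 3', 'S. 78': marker followed by a number
    (re.compile(r"(§|\bAbs\.|\bS\.) (?=\d)"), r"\1" + NNBSP),
    # 'S. 78 ff': number followed by 'f' or 'ff'
    (re.compile(r"(\d) (?=ff?\b)"), r"\1" + NNBSP),
    # '1. Mai': day of month followed by a month name
    (re.compile(r"\b(\d{1,2}\.) (?=(?:" + MONTHS + r")\b)"), r"\1" + NNBSP),
    # '100 m', '5 kg': number followed by a known unit
    (re.compile(r"(\d) (?=(?:" + UNITS + r")\b)"), r"\1" + NNBSP),
]


def apply_german_rules(text):
    """Replace ordinary spaces with narrow no-break spaces in the cases above."""
    text = ABBREVIATION_RE.sub(
        lambda m: m.group(0).replace(" ", NNBSP), text
    )
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Anchoring every rule to a closed list (known abbreviations, month names, unit
symbols) is meant to keep false positives rare: '5 mal', for instance, is left
alone because 'mal' is not a listed unit.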



[Bug 18443] auto-insert of non-breaking whitespace where appropriate

2012-07-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=18443

Bawolff  changed:

   What|Removed |Added

Summary|auto-insert of whitespace |auto-insert of non-breaking whitespace where appropriate

--- Comment #7 from Bawolff  2012-07-27 17:35:13 UTC ---
(In reply to comment #6)
> (In reply to comment #5)
> > (Note MW does have some rules for adding nbsp in certain contexts. The rules
> > just aren't all that complex)
> 
> What are they, by the way? I think this is not documented anywhere, but it
> would be important to keep it consistent if we add such a new rule.
> Right now I can remember only the separators for digits, used by formatnum,
> which is defined in the MessagesXx files and can be modified only there.
> 
> Moreover, some such rules are defined by the [[International System of Units]]
> itself IIRC, and are not that easy to find, but may be included in some
> library already? The reporter/voters should probably do some investigation.

They're run towards the end of the parsing process (the original proposal
linked in comment 0 actually refers to them).

Specifically, they are:

    # Clean up special characters, only run once, next-to-last before doBlockLevels
    $fixtags = array(
        # french spaces, last one Guillemet-left
        # only if there is something before the space
        '/(.) (?=\\?|:|;|!|%|\\302\\273)/' => '\\1&#160;',
        # french spaces, Guillemet-right
        '/(\\302\\253) /' => '\\1&#160;',
        '/&#160;(!\s*important)/' => ' \\1', # Beware of CSS magic word !important, bug #11874.
    );
    $text = preg_replace( array_keys( $fixtags ), array_values( $fixtags ), $text );

In English, they say:

* If you have a character (any character except a newline, spaces included),
followed by a space, followed by any of the following characters: ? : ; ! % or
» (U+00BB), the space gets replaced with a non-breaking space.
* If you have a « (U+00AB) followed by a space, that space is replaced by a
non-breaking space.
* As an exception to these rules, if you have a non-breaking space followed by
"!important", the non-breaking space is turned back into a normal breaking
space. This is to prevent messing up CSS style attributes. (This isn't perfect;
there's an open bug somewhere about CSS styles being messed up by this in edge
cases.)
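
To make the three rules easy to try out, here is a small Python port of them.
It is a sketch for experimenting, not the parser code: the function name
fix_special_spaces is hypothetical, and the port writes the U+00A0 character
itself instead of an HTML entity.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE

# Ordered (pattern, replacement) pairs, one per rule described above.
FIXTAGS = [
    # A space that has some character before it and ? : ; ! % or » after it.
    (re.compile("(.) (?=[?:;!%»])"), r"\1" + NBSP),
    # A space directly after «.
    (re.compile("(«) "), r"\1" + NBSP),
    # Exception: turn the no-break space in front of "!important" back
    # into an ordinary space.
    (re.compile(NBSP + r"(!\s*important)"), r" \1"),
]


def fix_special_spaces(text):
    """Apply the rules one after another, each on the previous result."""
    for pattern, replacement in FIXTAGS:
        text = pattern.sub(replacement, text)
    return text
```

The order matters: the first rule does turn the space in "red !important" into
a no-break space, and only the third rule, run afterwards, reverts it.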

Based on the guillemet characters, I imagine this is meant for the typographic
rules of French.
