[Bug 18443] auto-insert of non-breaking whitespace where appropriate
https://bugzilla.wikimedia.org/show_bug.cgi?id=18443

Bawolff changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE

--- Comment #12 from Bawolff 2012-07-29 17:53:06 UTC ---
Yes. Since that bug is older, let's continue the discussion over there.

*** This bug has been marked as a duplicate of bug 13619 ***

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l

[Bug 18443] auto-insert of non-breaking whitespace where appropriate
https://bugzilla.wikimedia.org/show_bug.cgi?id=18443

--- Comment #11 from Nemo_bis 2012-07-27 21:12:49 UTC ---
Is bug 13619 a duplicate?

[Bug 18443] auto-insert of non-breaking whitespace where appropriate
https://bugzilla.wikimedia.org/show_bug.cgi?id=18443

--- Comment #10 from seth 2012-07-27 19:52:08 UTC ---
(In reply to comment #9)
> Hence we'd want to make
> the rules have effectively no false positives.

I fully agree with that. (Actually, that was one of the reasons why I asked for a management system where admins can quickly change regexps: it's quite easy to overlook such false-positive cases a priori.)
However, cases like "123 %" have shown that we don't have to fear false positives too much.

[Bug 18443] auto-insert of non-breaking whitespace where appropriate
https://bugzilla.wikimedia.org/show_bug.cgi?id=18443

Bawolff changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |i18n
                 CC|                            |niklas.laxst...@gmail.com

--- Comment #9 from Bawolff 2012-07-27 19:16:41 UTC ---
I imagine we'd want to change these rules so they're handled in the i18n files instead of in the parser itself (since we'd want them to vary per language). CC'ing Niklas to see if he has any thoughts on the i18n aspects.

> The typographic rules[1] in Germany are quite complicated:

One of the scary things about this type of scheme is that it's invisible to the user. If there are exceptions to the rules, the user cannot override them (well, maybe they could do things like insert &nbsp;, but how to do that is not obvious to the user and is very difficult for them). Hence we'd want to make the rules have effectively no false positives.

[Bug 18443] auto-insert of non-breaking whitespace where appropriate
https://bugzilla.wikimedia.org/show_bug.cgi?id=18443

--- Comment #8 from seth 2012-07-27 19:11:22 UTC ---
(In reply to comment #5)
Yes, a hardcoded solution would be ok. But at least in the beginning there should be an easy way of communication (between admins and devs) regarding changes to those hardcoded rules.

The typographic rules[1] in Germany are quite complicated: there should be a _narrow_ _non-breaking_ space
* inside abbreviations (like 'z. B.', 'i. d. R.', 'u. a.')
* inside abbreviations with numbers (like '§ 315', 'Abs. 3', 'S. 78 ff')
* inside dates like '1. Mai'
* between numbers and units (like '100 m', '5 kg')

If I get an "ok" here, so that some dev would insert those hardcoded rules for w:de (and probably for all other de-projects, too), then I could create some regexps.

[1] Actually, "rule" is not the right word here. "Typographic sugar" would be a better description.

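The four cases listed in comment #8 could be expressed as regexps roughly as follows. This is only a minimal sketch in Python, not anything taken from MediaWiki: the names `ABBREVIATIONS`, `MONTHS`, `UNITS`, `SPACE_RULES` and `apply_german_rules` are hypothetical, the abbreviation, month and unit lists are deliberately tiny, and U+202F (NARROW NO-BREAK SPACE) is assumed to be the character meant by "narrow non-breaking space".

```python
import re

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE (assumed target character)

# Hypothetical, deliberately short lists; a real deployment would need more.
ABBREVIATIONS = ["z. B.", "i. d. R.", "u. a."]
MONTHS = ("Januar|Februar|März|April|Mai|Juni|Juli|August|"
          "September|Oktober|November|Dezember")
UNITS = "mm|cm|km|m|kg|g"

# Matches a whole known abbreviation that does not start inside a word.
ABBR_RE = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")"
)

# Each rule captures the text before one plain space and looks ahead at
# what follows, so only that single space is replaced.
SPACE_RULES = [
    re.compile(r"(?<!\w)(§|Abs\.|S\.) (?=\d)"),                   # '§ 315', 'Abs. 3', 'S. 78'
    re.compile(r"(\d) (?=ff\b)"),                                 # '78 ff'
    re.compile(r"(?<!\d)(\d{1,2}\.) (?=(?:" + MONTHS + r")\b)"),  # '1. Mai'
    re.compile(r"(\d) (?=(?:" + UNITS + r")\b)"),                 # '100 m', '5 kg'
]


def apply_german_rules(text):
    """Replace plain spaces with NNBSP in the four contexts from comment #8."""
    # Inside a known abbreviation, every space becomes a narrow no-break space.
    text = ABBR_RE.sub(lambda m: m.group(0).replace(" ", NNBSP), text)
    for rule in SPACE_RULES:
        text = rule.sub(r"\1" + NNBSP, text)
    return text
```

Keeping the lists explicit rather than matching, say, any letter followed by a period is one way to keep false positives low, at the cost of missing abbreviations and units that are not listed.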
[Bug 18443] auto-insert of non-breaking whitespace where appropriate
https://bugzilla.wikimedia.org/show_bug.cgi?id=18443

Bawolff changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|auto-insert of whitespace   |auto-insert of non-breaking
                   |                            |whitespace where
                   |                            |appropriate

--- Comment #7 from Bawolff 2012-07-27 17:35:13 UTC ---
(In reply to comment #6)
> (In reply to comment #5)
> > (Note MW does have some rules for adding nbsp in certain contexts. The rules
> > just aren't all that complex)
>
> What are they, by the way? I think this is not documented anywhere, but it
> would important to keep it consistent if we add such a new rule.
> Right now I can remember only the separators for digits, used by formatnum,
> which is defined in the MessagesXx files and can be modified only there.
>
> Moreover, some such rules are defined by the [[International System of Units]]
> itself IIRC, and are not that easy to find, but may be included in some
> library already? The reporter/voters should probably do some investigation.

They're run towards the end of the parsing process (the original proposal linked in comment 0 actually refers to them). Specifically, they are:

    # Clean up special characters, only run once, next-to-last before doBlockLevels
    $fixtags = array(
        # french spaces, last one Guillemet-left
        # only if there is something before the space
        '/(.) (?=\\?|:|;|!|%|\\302\\273)/' => '\\1&#160;',
        # french spaces, Guillemet-right
        '/(\\302\\253) /' => '\\1&#160;',
        '/&#160;(!\s*important)/' => ' \\1', # Beware of CSS magic word !important, bug #11874.
    );
    $text = preg_replace( array_keys( $fixtags ), array_values( $fixtags ), $text );

In English they say:
* If you have a character (any character, including a space), followed by a space, followed by any of the characters ?, :, ;, !, % or » (U+BB), the space gets replaced with a non-breaking space.
* If you have a « (U+AB) followed by a space, that space is replaced by a non-breaking space.
* As an exception to these rules, if you have a non-breaking space followed by "!important", the non-breaking space is turned back into a normal, breaking space. This is to prevent messing up CSS style attributes. (This isn't perfect; there's an open bug somewhere about CSS styles being messed up by this in edge cases.)

Based on the guillemet characters, I imagine this is meant for the typographic rules of French.
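
To make the three rules above easier to experiment with, here is a minimal sketch that ports them to Python. It is not MediaWiki code: the names `NBSP`, `FIX_TAGS` and `fix_french_spaces` are hypothetical, and it assumes the non-breaking space is written as the entity `&#160;`. The patterns are applied one after another, the way `preg_replace` handles an array of patterns.

```python
import re

NBSP = "&#160;"  # assumption: the non-breaking space is emitted as this entity

# (pattern, replacement) pairs, applied in this order.
FIX_TAGS = [
    # any character, a space, then one of ? : ; ! % or » (U+00BB)
    (re.compile(r"(.) (?=[?:;!%\u00bb])"), r"\1" + NBSP),
    # « (U+00AB) followed by a space
    (re.compile(r"(\u00ab) "), r"\1" + NBSP),
    # undo the first rule in front of the CSS keyword "!important"
    (re.compile(re.escape(NBSP) + r"(!\s*important)"), r" \1"),
]


def fix_french_spaces(text):
    """Apply the three space-fixing rules in sequence."""
    for pattern, replacement in FIX_TAGS:
        text = pattern.sub(replacement, text)
    return text
```

With this sketch, `fix_french_spaces("123 %")` gives `123&#160;%`, while `color: red !important` comes back unchanged because the third rule reverts what the first one did.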