[Bug 35628] #switch or #ifeq: checks should be HTML escaped

bugzilla-daemon Fri, 06 Apr 2012 12:25:44 -0700

https://bugzilla.wikimedia.org/show_bug.cgi?id=35628


Philippe Verdy <verd...@wanadoo.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |verd...@wanadoo.fr

--- Comment #7 from Philippe Verdy <verd...@wanadoo.fr> 2012-04-06 19:25:39 UTC ---
A #switch should not distinguish between a character written as a numeric
character reference and the same character written natively.

So if a template is written like this:

{{#switch:{{{1|}}}|&#x40;=yes|#default=no}}

or like this:

{{#switch:{{{1|}}}|@=yes|#default=no}}

it should behave identically when passed the parameter 1=@, 1=&#x40; or
1=&#64;.
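
The comparison I have in mind can be sketched as follows. This is only an
illustration in Python, not MediaWiki code: the switch() helper and its
signature are hypothetical, and html.unescape() from the Python standard
library decodes every HTML5 named reference, which is broader than the five
standard ones discussed here.

```python
import html


def switch(value, cases, default=""):
    """Hypothetical stand-in for #switch: compare after decoding references."""
    needle = html.unescape(value.strip())
    for key, result in cases:
        # Decode the case label too, so "&#x40;" and "@" are the same key.
        if html.unescape(key.strip()) == needle:
            return result
    return default


# The template may spell the case either way...
encoded_cases = [("&#x40;", "yes")]
native_cases = [("@", "yes")]

# ...and the caller may pass the parameter either way.
for param in ("@", "&#x40;", "&#64;"):
    print(param, switch(param, encoded_cases, "no"), switch(param, native_cases, "no"))
```

With both sides decoded before the test, the spelling chosen in the template
and the spelling chosen by the caller no longer matter.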

All numeric character references (plus the well-known named character
references that are guaranteed to be supported everywhere in XML and HTML,
i.e. the 5 standard ones: &amp; &lt; &gt; &quot; &apos;) should be treated
everywhere as counting for 1 Unicode character (exactly like the UTF-8
byte sequence representing that character). All valid syntaxes for numeric
character references should be accepted (decimal and hexadecimal), as long as
they designate a valid Unicode code point (in the valid numeric range from
U+0000 to U+10FFFF), and that code point is assigned to a valid character
(excluding code points assigned to surrogates, and code points assigned to
non-characters like U+FFFE), and that character can be part of a valid HTML
document (so, excluding most C0 and C1 controls, and converting the few
acceptable controls to either SPACE U+0020 or LINEFEED U+000A, after unifying
CR+LF into a single linefeed).
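
The validity conditions above could be checked roughly like this. It is a
sketch under assumptions: the function names are hypothetical, TAB, LF and CR
are my guess at "the few acceptable controls", DEL (U+007F) is treated as a
control, and the test for unassigned code points is left out because it needs
the Unicode character database.

```python
def is_noncharacter(cp):
    # U+FDD0..U+FDEF, plus the last two code points of every plane
    # (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFE, U+10FFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_acceptable_code_point(cp):
    """Return True if a character reference to cp should be honoured."""
    if not 0 <= cp <= 0x10FFFF:
        return False              # outside the Unicode code space
    if 0xD800 <= cp <= 0xDFFF:
        return False              # surrogates
    if is_noncharacter(cp):
        return False
    if cp in (0x09, 0x0A, 0x0D):
        return True               # assumption: TAB, LF, CR are the allowed controls
    if cp < 0x20 or 0x7F <= cp <= 0x9F:
        return False              # remaining C0 controls, DEL, C1 controls
    return True


for cp in (0x40, 0x0A, 0x01, 0x85, 0xD800, 0xFFFE, 0x110000):
    print("U+%04X" % cp, is_acceptable_code_point(cp))
```

Folding the accepted controls into SPACE or LINEFEED, and unifying CR+LF,
would be a separate step applied to whole strings and is not shown here.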

This should be a simple way to escape any character, deprecating the use of
"nowiki", except as an easy way to avoid writing character references in the
source.

But character references should be usable EVERYWHERE a valid UTF-8 sequence
representing a single character is usable and is not strictly required by the
syntactic lexer/parser (so including in the names of parser functions and
magic keywords, meaning that "{{#&#105;f:x|y}}" will be treated equivalently
to "{{#if:x|y}}"). This would make the wiki syntax more compatible with
various character encodings, including via imports/exports to external files.

This also means that only a few characters should NOT be representable as
character references, these are:
  { }
only where they are used as separators for the recognized wiki template call
and parameters syntax, and:
  | =
only within template (or parser function) parameters in the wiki syntax, and:
  : ; *
only where they are recognized at the beginning of lines for lists in the wiki
syntax, and:
  | !
where they are recognized within wiki tables for delimiting cells/rows, and:
  < " ' >
where they are used as separators for the recognized markup syntax of HTML
elements or special elements like "<nowiki ... />", "<includeonly ... />" and
"<gallery ... />".

In this latter case, character entities should be usable as the universal way
of escaping the special handling given by the wiki syntax parser.

To keep things simple, the lexer used in MediaWiki should normalize all input
characters (whether they are encoded as UTF-8 sequences or as numeric or named
character entities) into a single format, even before starting to parse the
content: only the special characters needed for a given step should be treated
specially and kept in their syntactic form; all others will be normalized by
NOT using any of these special characters (if they remain present in the
source, the normalized form should be the shortest decimal numeric character
reference). This would also avoid the unnecessary complexity caused by
"nowiki". All parser functions should be revisited to make sure they use this
"character uniformizer"...

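A minimal sketch of such a "character uniformizer", in Python rather than in
MediaWiki's PHP. Everything named here (SPECIAL, uniformize) is hypothetical,
and the set of special characters is just the union of the ones listed above;
a real lexer would pick the set per parsing step. I also add "&" to that set
on my own assumption, so that a decoded ampersand cannot start a new
reference. Only numeric references and the 5 standard named ones are handled.

```python
import re

# Characters with syntactic meaning in some wiki context (see the lists above).
# "&" is an added assumption: it keeps "&#38;#64;" from turning into "&#64;".
SPECIAL = set("{}|=:;*!<>\"'&")

_NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}
_REF = re.compile(r"&(?:#[xX]([0-9A-Fa-f]+)|#([0-9]+)|(amp|lt|gt|quot|apos));")


def uniformize(text, special=SPECIAL):
    """Rewrite every character reference into a single canonical form."""
    def repl(match):
        hex_digits, dec_digits, name = match.groups()
        if hex_digits:
            cp = int(hex_digits, 16)
        elif dec_digits:
            cp = int(dec_digits)
        else:
            cp = ord(_NAMED[name])
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return match.group(0)      # not a usable code point: leave as is
        ch = chr(cp)
        # Escaped special characters stay escaped, as the shortest decimal
        # reference; every other reference becomes the native character.
        return "&#%d;" % cp if ch in special else ch
    return _REF.sub(repl, text)


print(uniformize("{{#switch:{{{1|}}}|&#x40;=yes|#default=no}}"))
print(uniformize("a &#x7C; b &lt;nowiki&gt;"))
```

Special characters typed natively are left alone, so they keep their
syntactic role, while the same characters written as references stay inert
data. After this pass a parser function only ever sees one spelling of each
character.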
