https://bugzilla.wikimedia.org/show_bug.cgi?id=22555

--- Comment #16 from Philippe Verdy <verd...@wanadoo.fr> 2012-03-26 16:06:51 UTC ---
In summary:

* some characters may be represented in the input as numeric character
references or as known named character references:
** Those references may be converted early to UTF-8, EXCEPT IF they represent a
character that may have a contextual meaning in the wiki or HTML syntax; all of
these are ASCII punctuation characters, i.e. those in
{ < > & ' " = * # : ; - | }
** Those exceptional characters should remain encoded as character references
if they appear in that form, but these references can safely be unified as
decimal character references, because the characters treated specially in the
wiki syntax are all part of ASCII.
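Here is a minimal Python sketch of this first pass. It rests on several assumptions. The names decode_early and _decode_ref and the regular expression are illustrative only; this is not MediaWiki code. Named references are looked up in Python's html.entities.html5 table. A reference to DEL itself (&#127;) is deliberately left encoded so that decoding cannot put a DEL back into the text; that rule is an added assumption and does not come from the comment.

```python
import re
from html.entities import html5

# ASCII characters listed as having a contextual meaning in wiki or HTML syntax.
SPECIAL = set("<>&'\"=*#:;-|")

# Decimal, hexadecimal and named character references (illustrative pattern).
_REF = re.compile(r"&(?:#([0-9]+)|#[xX]([0-9A-Fa-f]+)|([A-Za-z][A-Za-z0-9]*));")


def _decode_ref(m):
    dec, hexa, name = m.groups()
    if dec is not None:
        cp = int(dec)
    elif hexa is not None:
        cp = int(hexa, 16)
    else:
        value = html5.get(name + ";")
        if value is None or len(value) != 1:
            return m.group(0)  # unknown or multi-code-point entity: leave as-is
        cp = ord(value)
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return m.group(0)  # not a valid Unicode scalar value: leave as-is
    if cp == 0x7F:
        return m.group(0)  # assumption: never decode a reference into a DEL
    ch = chr(cp)
    if ch in SPECIAL:
        return f"&#{cp};"  # keep it encoded, unified as a decimal reference
    return ch  # safe to store as plain text (UTF-8 once encoded)


def decode_early(text: str) -> str:
    """First-pass decoding of character references (hypothetical helper)."""
    return _REF.sub(_decode_ref, text)
```

For example, decode_early("caf&eacute; &#x3C;b&#x3E;") gives "café &#60;b&#62;". The accented letter becomes plain text, while the angle brackets stay encoded in the unified decimal form.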

* in the first pass (before any processing of the wiki source), you must reject
any DEL character present in the input.

* you never need any UNIQ identifier sequence when converting <nowiki>
sections.
** you can just generate a single DEL character at start and at end.
** all parser functions can safely pass over those DEL characters, and can
safely discard them before processing, but only if the parameter itself is not
used within the output of the parser function
** otherwise, if the parameter is used partly or fully in the output, the
parser function must make sure that there will be an **even** number of DEL
characters in the final output
** you can safely replace all sequences of 3 or more DELs by dropping them in
pairs, e.g. replace 3 DELs by 1 DEL, 4 DELs by 2 DELs, 5 DELs by 1 DEL
** if, at the end of generating the output of the parser function, an odd
number of DELs remains, append one more DEL to the output.
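The DEL handling described above can be sketched in Python as follows. This is a hedged illustration, not the parser's actual code. The helper names check_input, wrap_nowiki, strip_dels and normalize_dels are hypothetical. Raising ValueError is only one possible way to "reject" a DEL found in the input.

```python
import re

DEL = "\x7f"


def check_input(source: str) -> None:
    """First pass: reject any DEL already present in the wiki source."""
    if DEL in source:
        raise ValueError("DEL character (U+007F) not allowed in wiki input")


def wrap_nowiki(content: str) -> str:
    """Mark a <nowiki> section with one DEL at each end instead of a UNIQ marker."""
    return DEL + content + DEL


def strip_dels(param: str) -> str:
    """For a parameter that is NOT copied into the output: discard its DELs."""
    return param.replace(DEL, "")


def normalize_dels(output: str) -> str:
    """For output that reuses a parameter: make the DEL count even."""
    # Drop runs of 3+ DELs in pairs, keeping parity: 3 -> 1, 4 -> 2, 5 -> 1.
    output = re.sub(DEL + "{3,}", lambda m: DEL * (2 - len(m.group(0)) % 2), output)
    # If an odd number of DELs remains, append one more.
    if output.count(DEL) % 2:
        output += DEL
    return output
```

Suppose a parser function returns a truncated parameter such as "x" + DEL, where one DEL of a nowiki pair has been cut off. In that case normalize_dels appends a DEL, which makes the count even again.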

* continue processing other parser functions.
* continue by recognizing the special wiki syntax, as well as HTML <elements>
and special <elements>
* when you have completely processed the wiki syntax, and just before
generating the HTML, you can strip out all the remaining DELs, which were only
there to prevent the wiki parser from acting on the enclosed text.
* at the same time as you drop the remaining DELs, you may replace the
remaining numeric character references for ASCII punctuation that plays no
role in HTML, i.e. the characters in { ; : * # | - }, but not those in
{ < ' & " > }, which play a special role in HTML, either within a text element
or within an attribute value.
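The following Python sketch shows this final step. It assumes the first pass already unified the special punctuation as decimal references (&#NN;), so this step only needs to match that form. The name finalize is hypothetical, and the set of characters it decodes mirrors the list above.

```python
import re

DEL = "\x7f"

# ASCII punctuation listed as playing no role in HTML; < ' & " > stay encoded.
SAFE_IN_HTML = set(";:*#|-")

# Assumption: the first pass unified special characters as decimal references.
_DEC_REF = re.compile(r"&#([0-9]+);")


def _decode_if_safe(m):
    cp = int(m.group(1))
    # Only decode references to the listed ASCII punctuation; keep everything else.
    if cp < 128 and chr(cp) in SAFE_IN_HTML:
        return chr(cp)
    return m.group(0)


def finalize(text: str) -> str:
    """Last step before emitting HTML (hypothetical helper)."""
    # The DELs only shielded text from the wiki parser; drop them all.
    text = text.replace(DEL, "")
    return _DEC_REF.sub(_decode_if_safe, text)
```

For example, a former nowiki section DEL + "&#42;x&#42;" + DEL ends up as "*x*". A reference such as &#60; stays encoded because "<" is significant in HTML.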

_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
