https://bugzilla.wikimedia.org/show_bug.cgi?id=22555
--- Comment #16 from Philippe Verdy <verd...@wanadoo.fr> 2012-03-26 16:06:51 UTC ---

In summary:

* Some characters may be represented in the input as numeric character references or known named character references:
** Those references may be converted early to UTF-8, EXCEPT IF they represent a character that may have a contextual meaning in the wiki or HTML syntax; all of these are ASCII punctuation characters, i.e. those in { < > & ' " = * # : ; - | }.
** Those exceptional characters should remain encoded as character references if they arrive in that form, but the references can safely be unified into decimal character references, because the characters treated specially in the wiki syntax are all part of ASCII.
* You must reject, in the first pass (before any processing of the wiki source), any DEL character present in the input.
* You never need any UNIQ identifier sequence when converting <nowiki> sections.
** You can just generate a single DEL character at the start and at the end.
** All parser functions can safely pass over those DEL characters, and can safely discard them before processing, but only if the parameter itself is not used within the output of the parser function.
** Otherwise, if the parameter is used partly or fully in the output, the parser function must make sure that there will be an **even** number of DEL characters in the final output.
** You can safely shorten any sequence of 3 or more DELs by dropping them in pairs, e.g. replace 3 DELs by 1 DEL, 4 DELs by 2 DELs, 5 DELs by 1 DEL.
** If, at the end of generating the output of the parser function, an odd number of DELs remains, then append an additional DEL to the output.
* Continue processing the other parser functions.
* Continue by recognizing the special wiki syntax, HTML <elements>, or special <elements>.
* When you have completely processed the wiki syntax, and just before generating the HTML, you can strip out all the remaining DELs, which were only there to prevent the wiki parser from acting on the protected text.
* At the same time as you drop the remaining DELs, you may decode the remaining numeric character references for ASCII punctuation that plays no role in HTML, i.e. the characters in { ; : * # | - }, but not those in < ' & " >, which play a special role in HTML, either within a text element or within an attribute value.