Jared Williams wrote:
> The problem is the ambiguity with italics, (''italics''). So the
> current parser doesn't really make its final decision on
> what should be bold or what should be italic until it hits a
> newline.  If there are an even number of both bold and italics
> then it assumes it interpreted the line correctly.
[SNIP]
> I think this is part of what makes wikitext undescribable
> in a formal grammar.

And he also wrote:
> Problem is quotes are also valid as part of the textual content, so
> you could have italics immediately before or after an apostrophe. As in
>
> L'''arc de triomphe''
>
> Which the current parser resolves to L'<i>arc de triomphe</i>

There lies one of the main problems with parsing wikitext: it uses a 
wide range of standard text characters for its markup.  In HTML, 
there are basically two (< and >) plus an escape character (&).  Therefore 
HTML can in theory[1] consist of "Any text you like, with <, > and & 
replaced by &lt; &gt; and &amp; respectively" with two special markup 
symbols (<markup goes here> and &escaped_entity;).  No room for ambiguity 
there, and only minimal translation required to convert plain-text to a 
format suitable for use in an HTML document.
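To show how little translation that really is, here is a minimal Python sketch of the plain-text-to-HTML step (escape_text is just a name I've made up for illustration; the standard library's html.escape(text, quote=False) does the same job):

```python
def escape_text(text: str) -> str:
    """Make arbitrary plain text safe to drop into HTML element content."""
    # '&' has to be replaced first, otherwise the '&' introduced by
    # '&lt;' and '&gt;' would get escaped a second time.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


print(escape_text("L'arc <de> triomphe & co"))
# L'arc &lt;de&gt; triomphe &amp; co
```

Note that the apostrophe passes straight through: in HTML content it is never anything but text.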

In MediaWiki, just taking that single ' character as an example, it could be 
one of several punctuation symbols (apostrophe, single-quote, prime, etc.) 
or it could be part of an opening italic tag, a closing italic tag, an 
opening bold tag, or a closing bold tag.  As far as I understand, it is 
impossible to deal effectively with this massive overloading of the 
apostrophe character without the kind of special logic we have in place 
already (as described by Jared).  To take his example one step further, 
here's something to really throw a formal grammar-based parser, but which 
our parser handles just fine: '''Photo of L'''arc de triomphe'' by 'John''''
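To make the apostrophe problem concrete, here is a rough Python sketch of the kind of whole-line logic Jared describes. It is NOT MediaWiki's actual code: the function names are hypothetical, it only looks at one line, and the tie-break (prefer reinterpreting a ''' that follows a one-letter word, then one after a longer word, then one after a space or at line start) plus the handling of 4 and 6+ apostrophes are my assumptions about how the real parser decides.

```python
import re


def _toggle(tag: str, stack: list, out: list) -> None:
    """Open `tag`, or close it while keeping the HTML well nested."""
    if tag not in stack:
        stack.append(tag)
        out.append(f"<{tag}>")
        return
    # Close any tags opened after `tag`, close `tag`, then reopen the others.
    reopen = []
    while stack[-1] != tag:
        inner = stack.pop()
        out.append(f"</{inner}>")
        reopen.append(inner)
    stack.pop()
    out.append(f"</{tag}>")
    for inner in reversed(reopen):
        stack.append(inner)
        out.append(f"<{inner}>")


def _pick_bold_to_demote(parts: list):
    """Choose which ''' run to read as an apostrophe plus '' (assumed tie-break)."""
    multi = space = None
    for i in range(1, len(parts), 2):
        if parts[i] != "'''":
            continue
        before = parts[i - 1]
        if not before or before.endswith(" "):
            if space is None:
                space = i
        elif len(before) == 1 or before[-2] == " ":
            return i  # one-letter word before it, e.g. the L in L'''arc
        elif multi is None:
            multi = i
    return multi if multi is not None else space


def render_quotes(line: str) -> str:
    """Turn '' and ''' runs in one line of wikitext into <i>/<b> tags."""
    # re.split with a capture group gives [text, run, text, run, ..., text].
    parts = re.split(r"('{2,})", line)
    for i in range(1, len(parts), 2):
        n = len(parts[i])
        if n == 4:  # assumption: '''' is a literal ' followed by '''
            parts[i - 1] += "'"
            parts[i] = "'''"
        elif n > 5:  # assumption: only the last five are markup
            parts[i - 1] += "'" * (n - 5)
            parts[i] = "'''''"
    runs = parts[1::2]
    italics = sum(r in ("''", "'''''") for r in runs)
    bolds = sum(r in ("'''", "'''''") for r in runs)
    # Odd counts of both: guess that one ''' was really an apostrophe plus ''.
    if italics % 2 == 1 and bolds % 2 == 1:
        i = _pick_bold_to_demote(parts)
        if i is not None:
            parts[i - 1] += "'"
            parts[i] = "''"
    out, stack = [], []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            out.append(part)  # plain text between runs
        elif part == "''":
            _toggle("i", stack, out)
        elif part == "'''":
            _toggle("b", stack, out)
        else:  # ''''' toggles both, innermost open tag first
            first = stack[-1] if stack else "i"
            _toggle(first, stack, out)
            _toggle("b" if first == "i" else "i", stack, out)
    while stack:  # end of line: close whatever is still open
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

With those assumptions the sketch turns L'''arc de triomphe'' into L'<i>arc de triomphe</i>, and my example above into <b>Photo of L'<i>arc de triomphe</i> by 'John'</b>. The key point is that no run of apostrophes can be classified until the counts for the whole line are known, which is exactly the non-local decision a formal grammar struggles to express.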

- Mark Clements (HappyDog)

[1] I'm ignoring all the document-structure requirements, plus 
character-encoding issues, etc. that complicate things a bit. 



_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l