On 5/2/11 5:28 PM, Tim Starling wrote:
>How many wikitext parsers does the world really need?

That's a tricky question. What MediaWiki calls parsing, the rest of the
world calls:

1. Parsing
2. Expansion (templates, magic words)
3. Applying local state and context ($n parameters, user preferences)
4. Emitting

And phases 2 and 3 depend heavily upon the state of the local wiki at
the time the parse is requested. If you've ever tried to set up a test
wiki that works like Wikipedia or Wikimedia Commons, you'll know what
I'm talking about.
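
To make the distinction concrete, here's a toy sketch in JavaScript.
None of these functions exist in MediaWiki, and the "wikitext" is a
deliberately tiny made-up dialect (just '''bold''', {{SITENAME}} and
$n parameters), but it shows which phase needs what:

  // 1. Parsing: wikitext -> structure, no HTML yet, no wiki needed
  function parse( text ) {
      return text.split( /('''.*?''')/ ).filter( Boolean )
          .map( function ( tok ) {
              return /^'''/.test( tok )
                  ? { type: 'bold', content: tok.slice( 3, -3 ) }
                  : { type: 'text', content: tok };
          } );
  }

  // 2. Expansion: templates and magic words; this needs the live wiki
  function expand( ast, wiki ) {
      return ast.map( function ( node ) {
          return {
              type: node.type,
              content: node.content
                  .replace( /\{\{SITENAME\}\}/g, wiki.sitename )
          };
      } );
  }

  // 3. Local state: $1, $2 ... parameters, user prefs, request context
  function localize( ast, params ) {
      return ast.map( function ( node ) {
          return {
              type: node.type,
              content: node.content.replace( /\$(\d+)/g, function ( m, n ) {
                  return params[ n - 1 ];
              } )
          };
      } );
  }

  // 4. Emitting: HTML here, but it could just as well be LaTeX or PDF
  function emitHtml( ast ) {
      return ast.map( function ( node ) {
          return node.type === 'bold'
              ? '<b>' + node.content + '</b>'
              : node.content;
      } ).join( '' );
  }

  emitHtml( localize(
      expand( parse( "Welcome to '''{{SITENAME}}''', $1!" ),
          { sitename: 'Wikipedia' } ),
      [ 'Neil' ] ) );
  // => "Welcome to <b>Wikipedia</b>, Neil!"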

As for whether the rest of the world needs another wikitext parser:
well, they keep writing them, so there must be some reason why this
keeps happening. It's true that language chauvinism plays a part, but
the inflexibility of the current approach is probably a big factor as
well. The current system mashes parsing and emitting to HTML together,
very intimately, and a lot of people would like those phases to be
separate. For example:

   - if they're doing research or stats, and want a more "pure", more
normalized form than HTML or wikitext;

   - if they're Google, and they want to get all the city infobox data
and reuse it (this is a real request we've gotten);

   - if they're OpenStreetMap, and they want the same thing;

   - if they're emitting to a different format (PDF, LaTeX, books);

   - if they're emitting to HTML but with different needs (like mobile).

And then there's the stuff which you didn't know you wanted, but which 
becomes easy once you have a more flexible parser.

A couple of months ago I wrote a mini PEG-based wikitext parser in
JavaScript, which Special:UploadWizard is using today, live on Commons:

 
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/UploadWizard/resources/mediawiki.language.parser.js?view=markup

http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/UploadWizard/resources/mediawiki.language.parser.peg?view=markup

While it was a bit of a heavy download (7K compressed), this gave me
the ability to do pluralization in the frontend (e.g. "3 out of 5
uploads complete") even for difficult languages like Arabic. Great!

But the unexpected benefit was that it also made it a snap to add very
complicated interface behaviour to our message strings. Actually, right
now, with this library plus the ingenious way that wikitext does i18n,
we may have one of the best libraries out there for internationalized
user interfaces. I'm considering splitting it off; it could be useful
for any project that uses translatewiki.
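
Here's a taste of what I mean by interface behaviour in messages;
again, this is a made-up sketch rather than the library's real API.
The idea is that a message parameter can be a live DOM node, so
translators keep control of wording and word order while the code
supplies the widget:

  // Translators see an ordinary message with a $1 placeholder:
  var errMsg = 'Something went wrong with this file. $1';

  // Made-up helper: splice DOM nodes in wherever $n parameters appear.
  function renderWithWidgets( message, params ) {
      var span = document.createElement( 'span' );
      message.split( /(\$\d+)/ ).forEach( function ( tok ) {
          var m = /^\$(\d+)$/.exec( tok );
          span.appendChild( m
              ? params[ m[ 1 ] - 1 ]
              : document.createTextNode( tok ) );
      } );
      return span;
  }

  // The calling code hands in a live link with a click handler:
  var retry = document.createElement( 'a' );
  retry.href = '#';
  retry.textContent = 'Retry this upload';
  retry.onclick = function () {
      // re-queue the upload here
      return false;
  };

  document.body.appendChild( renderWithWidgets( errMsg, [ retry ] ) );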

But I don't actually want to use JavaScript for anything but the final
rendering stages (I'd rather move most of this parser to PHP), so stay
tuned.

Anyway, I think it's obviously possible for us to do some RTE
(rich-text editing), and some of this stuff, with the current parser.
But I'm optimistic that a new parsing strategy will be a huge benefit
to our community, to our partners, and to partners we didn't even know
we could have. Imagine doing RTE with an implementation in a JS
frontend that is generated from some of the same sources the PHP
backend uses.

For what it's worth: whenever I meet with Wikia employees, the topic is
always about what MediaWiki and the WMF can do to make their RTE hacks
obsolete. That doesn't mean that their RTE isn't the right way forward,
but the people who wrote it don't seem to be very strong advocates for
it. But I don't want to put words in their mouths; maybe one of them
can add more to this thread?

-- 
Neil Kandalgaonkar     <ne...@wikimedia.org>
