On 5/2/11 5:28 PM, Tim Starling wrote:
> How many wikitext parsers does the world really need?
That's a tricky question. What MediaWiki calls parsing, the rest of the world calls:

1. Parsing
2. Expansion (i.e. templates, magic words)
3. Applying local state, preferences, context (i.e. $n, prefs)
4. Emitting

And phases 2 and 3 depend heavily on the state of the local wiki at the time the parse is requested. If you've ever tried to set up a test wiki that works like Wikipedia or Wikimedia Commons, you'll know what I'm talking about. (There's a sketch of what separating these phases could look like at the end of this mail.)

As for whether the rest of the world needs another wikitext parser: well, people keep writing them, so there must be some reason why this keeps happening. It's true that language chauvinism plays a part, but the inflexibility of the current approach is probably a big factor as well.

The current system mashes parsing and emitting to HTML together, very intimately, and a lot of people would like those to be separate:

- if they're doing research or stats, and want a more "pure", more normalized form than HTML or wikitext;
- if they're Google, and they want to get all the city infobox data and reuse it (this is a real request we've gotten);
- if they're OpenStreetMap, and want the same thing;
- if they're emitting to a different format (PDF, LaTeX, books);
- if they're emitting to HTML but with different needs (like mobile).

And then there's the stuff you didn't know you wanted, but which becomes easy once you have a more flexible parser. A couple of months ago I wrote a mini PEG-based wikitext parser in JavaScript, which Special:UploadWizard is using, today, live on Commons:

http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/UploadWizard/resources/mediawiki.language.parser.js?view=markup
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/UploadWizard/resources/mediawiki.language.parser.peg?view=markup

While it was a bit of a heavy download (7K compressed), it gave me the ability to do pluralizations in the frontend (e.g. "3 out of 5 uploads complete"), even for difficult languages like Arabic. Great! But the unexpected benefit was that it also made it a snap to add very complicated interface behaviour to our message strings. (A toy example of frontend pluralization is at the end of this mail.)

Actually, right now, with this library plus the ingenious way that wikitext does i18n, we may have one of the best libraries out there for internationalized user interfaces. I'm considering splitting it off; it could be useful for any project that uses translatewiki. But I don't actually want to use JavaScript for anything except the final rendering stages (I'd rather move most of this parser to PHP), so stay tuned.

Anyway, I think it's obviously possible for us to do some RTE, and some of this stuff, with the current parser. But I'm optimistic that a new parsing strategy will be a huge benefit to our community, to our partners, and to partners we didn't even know we could have. Imagine doing RTE with an implementation in a JS frontend that is generated from some of the same sources the PHP backend uses.

For what it's worth: whenever I meet with Wikia employees, the topic is always what MediaWiki and the WMF can do to make their RTE hacks obsolete. That doesn't mean their RTE isn't the right way forward, but the people who wrote it don't seem to be very strong advocates for it. I don't want to put words in their mouth, though; maybe one of them can add more to this thread?

--
Neil Kandalgaonkar <ne...@wikimedia.org>
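
P.S. To make the phase separation concrete, here is a rough sketch in JavaScript. None of these functions exist in MediaWiki today; the names are invented purely to illustrate the architecture I'm arguing for, where each phase is its own composable step and new emitters reuse the earlier phases unchanged:

    // Hypothetical, simplified stages; nothing like this exists yet.
    function parse(wikitext)      { /* 1. wikitext -> syntax tree */ }
    function expand(tree, wiki)   { /* 2. resolve templates, magic words */ }
    function localize(tree, user) { /* 3. apply prefs, context, $n */ }
    function emitHtml(tree)       { /* 4. serialize tree to HTML */ }
    function emitLatex(tree)      { /*    same tree, different emitter */ }

    // Google, OSM, PDF, mobile, etc. all hang off the same tree:
    var tree = localize(expand(parse(source), wiki), user);
    var html = emitHtml(tree);
    var tex  = emitLatex(tree);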
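
P.P.S. And here is a toy illustration of the frontend pluralization trick. The real parser is PEG-based and selects plural rules per language (Arabic has six forms); this regex version fakes the English rule only, and the message text is made up, just to show what evaluating a {{PLURAL:}} message in the browser looks like:

    // Toy message renderer: substitutes $1, $2... and picks a plural form.
    // English-only rule; the real library has per-language rules.
    function plural(n, forms) {
        return forms[n === 1 ? 0 : forms.length - 1];
    }
    function render(msg, params) {
        return msg
            .replace(/\{\{PLURAL:\$(\d+)\|([^}]*)\}\}/g, function (all, i, forms) {
                return plural(Number(params[i - 1]), forms.split('|'));
            })
            .replace(/\$(\d+)/g, function (all, i) { return params[i - 1]; });
    }

    render('Finished $1 of {{PLURAL:$2|one upload|$2 uploads}}', [3, 5]);
    // -> "Finished 3 of 5 uploads"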