Trevor & I talked with him extensively about this. BTW, around here, 
he's just Ward. :)

He too was disappointed that his team wrote rules to directly transform 
wikitext into HTML.

The parse-everything-in-Wikipedia thing isn't quite what it sounds like. 
If I recall correctly it works like this:

As part of his job at AboutUs, he was really looking for patterns of 
wikitext that he could use to snag business information. One target was 
the Infobox on Wikipedia. So, the tool was a way of cataloging the 
various ways that people structure an Infobox template.

Because he wrote this in C, he added rules to the grammar that discard 
information in favor of keeping a data structure of constant size. 
That's mostly what the <<< >>> markers in the grammar mean. Anyway, this 
then serves as a sampling of the majority of the structures one is 
interested in. The more rules you write, the more "unknown" stuff falls 
into the fixed-size structures that go unparsed. IIRC he agreed it 
might not be so useful if you were writing a grammar for PHP or JS (I 
assume the same is true for Python).



On 7/11/11 5:24 PM, Erik Rose wrote:
> On Jul 11, 2011, at 5:17 PM, Brion Vibber wrote:
>> We are however producing a different sort of intermediate structure rather 
>> than going straight to HTML output, so things won't be an exact match 
>> (especially where we do template stuff).
>
> Nor are we going straight to HTML, which is one reason we didn't steal this 
> stuff. :-)
> _______________________________________________
> Wikitext-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitext-l

-- 
Neil Kandalgaonkar  |) <[email protected]>
