On Thu, Feb 21, 2008 at 01:12:34PM +1100, Steve Bennett wrote:
> On 2/21/08, Jay R. Ashworth <[EMAIL PROTECTED]> wrote:
> > On Thu, Feb 21, 2008 at 01:16:22AM +1100, Steve Bennett wrote:
> >  > Time to take this grammar and do something with it.
> >
> >
> > Build a parser with it, run it against the corpus, and see how often
> >  each individual rule pukes?
> 
> Ok. I've actually done a bit of that, but I guess I should ramp up the
> scale. It can be hard to detect pukage without actually generating
> XHTML and comparing it, though.
> 
> Generally, though, the answer is "not often". Flip through some random
> wikitext. You'll find that a very small number of rules account for the
> vast majority of actual use, though that may change once I have to
> contend with the body of templates. People don't use tables much. They
> don't use HTML tags or entities much. They almost never use magic
> links (especially PMID; wtf is that about?). They almost never use
> horizontal rules or HTML comments, and rarely even extensions like <ref>.

I don't know if you remember it at this point, Steve, but one of the
reasons I threw "won't someone *please* build us a grammar-driven
parser" up in the air (and thanks, BTW :-) was precisely to get a
fairly reliable count of how often each possible bit o' grammar appears
in, say, en.wp, so as to get a feeling for what will break if the
syntax is restricted slightly...

That is to say, I concur with your instinct: I'd guess the 90/10 rule
applies here.
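For illustration only, a rough sketch of the per-construct tally discussed above, assuming the corpus is available as a plain Python list of page texts (e.g. extracted from a dump). The regexes are hypothetical approximations of a few constructs Steve mentions (tables, entities, magic links, horizontal rules, HTML comments, <ref>). They are not rules from his grammar; a real grammar-driven parser would count actual rule matches, which is what makes the numbers trustworthy.

```python
import re
from collections import Counter

# Hypothetical regex approximations of a few wikitext constructs.
# A grammar-driven parser would count real rule matches instead.
CONSTRUCT_PATTERNS = {
    "table_start": re.compile(r"^\{\|", re.MULTILINE),
    "horizontal_rule": re.compile(r"^-{4,}", re.MULTILINE),
    "html_comment": re.compile(r"<!--.*?-->", re.DOTALL),
    "ref_extension": re.compile(r"<ref[\s>/]", re.IGNORECASE),
    "html_entity": re.compile(r"&(?:[A-Za-z]+|#\d+|#x[0-9A-Fa-f]+);"),
    "magic_link": re.compile(r"\b(?:ISBN|RFC|PMID)\s+\d"),
}


def count_constructs(pages):
    """Return (construct -> total occurrences, construct -> pages using it)."""
    totals = Counter()
    pages_using = Counter()
    for text in pages:
        for name, pattern in CONSTRUCT_PATTERNS.items():
            n = len(pattern.findall(text))
            totals[name] += n
            if n:
                pages_using[name] += 1
    return totals, pages_using


if __name__ == "__main__":
    sample = [
        "Intro.<ref>Smith 2007</ref>\n----\nMore.",
        "{| class=\"wikitable\"\n| cell\n|}\nSee PMID 12345 &amp; RFC 2100.",
        "Plain prose.",
    ]
    totals, pages_using = count_constructs(sample)
    # Rank constructs by how many pages would be affected if restricted.
    for name, n in pages_using.most_common():
        print(f"{name}: {n}/{len(sample)} pages, {totals[name]} total")
```

Ranking by pages using a construct, rather than raw totals, is one way to estimate how much of en.wp a syntax restriction would touch; that choice is an assumption of this sketch, not something proposed in the thread.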

Cheers,
-- jra
-- 
Jay R. Ashworth                   Baylink                      [EMAIL PROTECTED]
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com                     '87 e24
St Petersburg FL USA      http://photo.imageinc.us             +1 727 647 1274

             Those who cast the vote decide nothing.
             Those who count the vote decide everything.
               -- (Joseph Stalin)


_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
