2009/8/12 dan nessett <dness...@yahoo.com>:
> I am investigating how to write a comprehensive parser regression test. What 
> I mean by this is something you wouldn't normally run frequently, but rather 
> something that we could use to get past the "known to fail" tests now 
> disabled. The problem is that no one understands the parser well enough to 
> have confidence that fixing one of these tests will not break something 
> else.
>
> So, I thought, how about using the guts of DumpHTML to create a comprehensive 
> parser regression test. The idea is to have two versions of phase3 + 
> extensions, one without the change you make to the parser to fix a 
> known-to-fail test (call this Base) and one with the change (call this 
> Current). Modify DumpHTML to first visit a page through Base, saving the 
> HTML, then visit the same page through Current and compare the results. Do this 
> for every page in the database. If there are no differences, the change in 
> Current works.
>
> Sitting here I can see the eyeballs of various developers bulging from their 
> faces. "What?" they say. "If you ran this test on, for example, Wikipedia, it 
> could take days to complete." Well, that is one of the things I want to find 
> out. The key to making this test useful is getting the code in the loop 
> (rendering the page twice and testing the results for equality) very 
> efficient. I may not have the skills to do this, but I can at least develop 
> an upper bound on the time it would take to run such a test.
>
I read this paragraph first, then read the paragraph above and
couldn't help saying "WHAT?!?". Using a huge set of pages is a poor
replacement for decent tests. Also, how would you handle intentional
changes to the parser output, especially when they're non-trivial?
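For reference, the dual-render comparison loop from the quoted proposal could
be sketched roughly as below. This is only an illustration: `render_base` and
`render_current` are hypothetical stand-ins for whatever actually fetches the
rendered HTML from the two installs (e.g. DumpHTML output directories), and
the toy dictionaries stand in for real page data.

```python
# Hypothetical stand-ins for rendering a page title through the Base
# and Current installs; a real version would invoke DumpHTML (or the
# parser directly) on each install and return the resulting HTML.
def render_base(title):
    return BASE_HTML[title]

def render_current(title):
    return CURRENT_HTML[title]

def find_regressions(titles):
    """Return titles whose rendered HTML differs between the two installs."""
    return [t for t in titles if render_base(t) != render_current(t)]

# Toy data standing in for two wiki installs; "Bar" renders differently.
BASE_HTML = {"Foo": "<p>foo</p>", "Bar": "<p>bar</p>"}
CURRENT_HTML = {"Foo": "<p>foo</p>", "Bar": "<p>BAR</p>"}

print(find_regressions(["Foo", "Bar"]))  # ['Bar']
```

Note this sketch also shows why Roan's objection about intentional changes
bites: any deliberate change to parser output shows up as a "regression" here,
so every hit would need manual triage.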

Roan Kattouw (Catrope)

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l