2009/8/12 dan nessett <dness...@yahoo.com>:
> I am investigating how to write a comprehensive parser regression test. What
> I mean by this is something you wouldn't normally run frequently, but rather
> something that we could use to get past the "known to fail" tests now
> disabled. The problem is that no one understands the parser well enough to
> have confidence that if you fix one of these tests, you will not break
> something else.
>
> So, I thought, how about using the guts of DumpHTML to create a
> comprehensive parser regression test? The idea is to have two versions of
> phase3 + extensions, one without the change you make to the parser to fix a
> known-to-fail test (call this Base) and one with the change (call this
> Current). Modify DumpHTML to first visit a page through Base, saving the
> HTML, then visit the same page through Current and compare the two results.
> Do this for every page in the database. If there are no differences, the
> change in Current works.
>
> Sitting here I can see the eyeballs of various developers bulging from
> their faces. "What?" they say. "If you ran this test on, for example,
> Wikipedia, it could take days to complete." Well, that is one of the things
> I want to find out. The key to making this test useful is getting the code
> in the loop (rendering the page twice and testing the results for equality)
> very efficient. I may not have the skills to do this, but I can at least
> develop an upper bound on the time it would take to run such a test.

I read this paragraph first, then read the paragraph above and couldn't help
saying "WHAT?!?". Using a huge set of pages is a poor replacement for decent
tests. Also, how would you handle intentional changes to the parser output,
especially when the changes are non-trivial?
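For what it's worth, the inner loop of the proposal (render the same page through Base and Current, then test the two results for equality) could be sketched roughly like this. This is just an illustration in Python, not DumpHTML code; the `normalize` step is an assumption about what "equality" should ignore (e.g. parser-cache timestamp comments), which is exactly where intentional output changes would bite:

```python
import difflib
import re

def normalize(html):
    """Strip HTML comments (e.g. parser cache timestamps) and collapse
    whitespace, so only substantive rendering differences remain.
    Which differences to ignore is an open question in the proposal."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    return re.sub(r"\s+", " ", html).strip()

def compare_renderings(title, base_html, current_html):
    """Return None if the Base and Current renderings of a page match
    after normalization, otherwise a word-level unified diff."""
    a, b = normalize(base_html), normalize(current_html)
    if a == b:
        return None
    return "\n".join(difflib.unified_diff(
        a.split(), b.split(),
        fromfile="%s (Base)" % title,
        tofile="%s (Current)" % title,
        lineterm=""))
```

Run over every page in the database, any non-None result is a regression candidate; the open problem Roan raises is distinguishing those from intentional changes.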
Roan Kattouw (Catrope)

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l