Re: [Tutor] duplication in unit tests

spir Wed, 09 Dec 2009 03:28:33 -0800

Serdar Tumgoren <zstumgo...@gmail.com> wrote:

> Hi everyone,
> I'm trying to apply some lessons from the recent list discussions on
> unit testing and Test-Driven Development, but I seem to have hit a
> sticking point.
> 
> As part of my program, I'm planning to create objects that perform
> some initial data clean-up and then parse and database the cleaned
> data. Currently I'm expecting to have FileCleaner and Parser
> classes. Using the TDD approach, I've so far come up with the below:
> 
> class FileCleaner(object):
>     def __init__(self, datastring):
>         self.source = datastring
> 
>     def convertEmDashes(self):
>         """Convert unicode emdashes to minus signs"""
>         self.datastring = self.source.replace(u'\u2014','-')
> 
>     def splitLines(self):
>         """Generate and store a list of cleaned, non-empty lines"""
>         self.data = [x.strip() for x in
>                      self.datastring.strip().split('\n') if x.strip()]
> 
> 
> My confusion involves the test code for the above class and its
> methods. The only way I can get splitLines to pass its unit test is by
> first calling the convertEmDashes method, and then splitLines.
> 
> class TestFileCleaner(unittest.TestCase):
>     def setUp(self):
>         self.sourcestring = u"""This    line   has an em\u2014dash.\n
>                 So   does this  \u2014\n."""
>         self.cleaner = FileCleaner(self.sourcestring)
> 
>     def test_convertEmDashes(self):
>         """convertEmDashes should replace em dashes in datastring with minus signs"""
>         teststring = self.sourcestring.replace(u'\u2014','-')
>         self.cleaner.convertEmDashes()
>         self.assertEqual(teststring, self.cleaner.datastring)
> 
>     def test_splitLines(self):
>         """splitLines should create a list of cleaned lines"""
>         teststring = self.sourcestring.replace(u'\u2014','-')
>         data = [x.strip() for x in teststring.strip().split('\n')
>                 if x.strip()]
>         self.cleaner.convertEmDashes()
>         self.cleaner.splitLines()
>         self.assertEqual(data, self.cleaner.data)
> 
> Basically, I'm duplicating the steps from the first test method in the
> second test method (and this duplication will accrue as I add more
> "cleaning" methods).
> 
> I understand that TestCase's setUp method is called before each test
> is run (and therefore the FileCleaner object is created anew), but
> this coupling of a test to other methods of the class under test seems
> to violate the principle of testing methods in isolation.
> 
> So my questions -- Am I misunderstanding how to properly write unit
> tests for this case? Or perhaps I've structured my program
> incorrectly, and that's what this duplication reveals? I suspected,
> for instance, that perhaps I should group these methods
> (convertEmDashes, splitLines, etc.) into a single larger function or
> method.
> 
> But that approach seems to violate the "best practice" of writing
> small methods. As you can tell, I'm a bit at sea on this.  Your
> guidance is greatly appreciated!!
> 
> Regards,
> Serdar
> 
> ps - recommendations on cleaning up and restructuring code are also welcome!


Hello,

I guess you're first of all confused at the design level of your app. Testing and 
design both require you to clearly express your expectations. Here, the cleanup 
phase may be written as follows (I don't mean it's particularly good, just an 
example):

plain source data = input   -->   output = ready-to-process data

As you see, this requirement is, conceptually speaking, a purely functional 
one, in the plain sense of the word "function". At least, this is the way I see 
it.
Building an object to implement it is imo a wrong interpretation of OO design. 
(It's also writing Java in Python ;-) I would rather choose to write it as a 
method of a higher-level object. Possibly, this method would be split into 
smaller ones if needed.
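
To make the "input --> output" view concrete, here is a minimal sketch. The names convert_em_dashes, split_lines and clean are hypothetical (they are not from Serdar's code), and I write them as plain functions rather than methods only to keep the example short. Each step takes a string and returns a value, so no step relies on an attribute left behind by a previous call:

```python
def convert_em_dashes(text):
    """Return text with unicode em dashes replaced by minus signs."""
    return text.replace(u'\u2014', '-')


def split_lines(text):
    """Return a list of stripped, non-empty lines."""
    return [line.strip() for line in text.split('\n') if line.strip()]


def clean(text):
    """Plain source data in, ready-to-process data out."""
    return split_lines(convert_em_dashes(text))
```

The same bodies could just as well sit as methods on a higher-level object; the point is only that each one can be called with its own input.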

Then, expressing your tests is in a sense translating the requirement above 
into code: feeding the piece of code to be tested with raw input data and 
checking that the output is as expected. As Kent put it well, you should test 
with typical, edge, *and wrong* input; in the latter case the code under test 
is expected to fail, and the test checks that it does.
You will have to hand-write or automatically produce input strings for each 
test. If the func is split, then you will have to do it for each mini-func to 
be tested. This can be rather unpleasant, especially in cases like yours where 
the funcs appear to operate logically in sequence, but there is no way around it. 
Actually, the several cleanup tasks (translating special chars, skipping blank 
lines, etc.) are rather orthogonal: they don't need to be tested in sequence.
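
As a rough illustration of testing the cleanup steps independently, the sketch below assumes two hypothetical functions, convert_em_dashes and split_lines, that each take a string and return a result (they are not Serdar's methods). Every test builds its own input by hand, so the split_lines tests never need to call convert_em_dashes first:

```python
import unittest


# Hypothetical functions under test, defined here so the sketch is complete.
def convert_em_dashes(text):
    return text.replace(u'\u2014', '-')


def split_lines(text):
    return [line.strip() for line in text.split('\n') if line.strip()]


class TestCleanup(unittest.TestCase):
    def test_convert_em_dashes_typical(self):
        self.assertEqual(convert_em_dashes(u'em\u2014dash'), 'em-dash')

    def test_convert_em_dashes_edge(self):
        # Edge input: an empty string comes back unchanged.
        self.assertEqual(convert_em_dashes(''), '')

    def test_split_lines_typical(self):
        # Input is written by hand; no em-dash conversion is involved.
        self.assertEqual(split_lines('  one \n\n two\n'), ['one', 'two'])

    def test_split_lines_wrong_input(self):
        # Wrong input: the function is expected to fail, the test to pass.
        self.assertRaises(AttributeError, split_lines, None)
```

Saved in a test module, this could be run with "python -m unittest" followed by the module name.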


Denis
________________________________

la vita e estrany

http://spir.wikidot.com/