On Tue, Dec 8, 2009 at 6:02 PM, Serdar Tumgoren <zstumgo...@gmail.com> wrote:
> As part of my program, I'm planning to create objects that perform > some initial data clean-up and then parse and database the cleaned > data. Currently I'm expecting to have a FileCleaner and Parser > classes. Using the TDD approach, I've so far come up with the below: > > class FileCleaner(object): > def __init__(self, datastring): > self.source = datastring > > def convertEmDashes(self): > """Convert unicode emdashes to minus signs""" > self.datastring = self.source.replace(u'\u2014','-') > > def splitLines(self): > """Generate and store a list of cleaned, non-empty lines""" > self.data = [x.strip() for x in > self.datastring.strip().split('\n') if x.strip()] > > > My confusion involves the test code for the above class and its > methods. The only way I can get splitLines to pass its unit test is by > first calling the convertEmDashes method, and then splitLines. > > class TestFileCleaner(unittest.TestCase): > def setUp(self): > self.sourcestring = u"""This line has an em\u2014dash.\n > So does this \u2014\n.""" > self.cleaner = FileCleaner(self.sourcestring) > > def test_convertEmDashes(self): > """convertEmDashes should remove minus signs from datastring > attribute""" > teststring = self.sourcestring.replace(u'\u2014','-') > self.cleaner.convertEmDashes() > self.assertEqual(teststring, self.cleaner.datastring) > > def test_splitLines(self): > """splitLines should create a list of cleaned lines""" > teststring = self.sourcestring.replace(u'\u2014','-') > data = [x.strip() for x in teststring.strip().split('\n') if x.strip()] > self.cleaner.convertEmDashes() > self.cleaner.splitLines() > self.assertEqual(data, self.cleaner.data) > > Basically, I'm duplicating the steps from the first test method in the > second test method (and this duplication will accrue as I add more > "cleaning" methods). I see a few problems with this. You are confused about what splitLines() does. It does not create a list of cleaned lines, it just splits the lines. Because of your confusion about splitLines(), your test is not just testing splitLines(), it is testing convertEmDashes() and splitLines(). That is why you have code duplication. test_splitLines() could look like this: def test_splitLines(self): """splitLines should create a list of split lines""" teststring = self.sourcestring data = [x.strip() for x in teststring.strip().split('\n') if x.strip()] self.cleaner.splitLines() self.assertEqual(data, self.cleaner.data) Your tests are not very good. They don't really test anything because they use the same code that you are trying to test. What if str.replace() or split() or strip() does not work the way you expect? Your tests would not discover this. You should just hard-code the expected result strings. I would write test_splitLines() like this: def test_splitLines(self): """splitLines should create a list of split lines""" data = [u"This line has an em\u2014dash.", u"So does this \u2014", u"."] self.cleaner.splitLines() self.assertEqual(data, self.cleaner.data) You probably don't want to hard-code the source string in the setup method. Typically you want to test a function with multiple inputs so you can check its behaviour with typical values, edge cases and invalid input. For example test_splitLines() doesn't verify that splitLines() removes blank lines; that would be a good test case. You might make a list of pairs of (input value, expected result) and pass each one to splitLines(). > So my questions -- Am I misunderstanding how to properly write unit > tests for this case? Or perhaps I've structured my program > incorrectly, and that's what this duplication reveals? I suspected, > for instance, that perhaps I should group these methods > (convertEmDashes, splitLines, etc.) into a single larger function or > method. Yes, your tests are revealing a problem with the structure. You should probably have a single process() method that does all the cleanup methods and the split. Then you could also have a test for this. There is really no need for a class here. You could write separate functions for each cleanup and for the split, then another function that puts them all together. This would be easier to test, too. Kent _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor