On 19 November 2012 11:02, Albert-Jan Roskam <fo...@yahoo.com> wrote: > Hi, > > I have a function that converts a date value, expressed as the number of > seconds sinds start of the gregorian calendar, into a human-readable format > (typically an iso-date). So if a record contains x date values, and a data > set contains y records, the number of function calls are x * y. Imagine a > data set with 1M records with dob and enrollment_date in in it: the number > of function calls is huge (well, 2M).
The number of function calls is one factor that affects whether memoisation is worthwhile. The more important questions are: How often do you call the function with the same input values? How expensive is the function compared with the table lookup? Memoisation can, for a cheap function, actually slow it down. Before you do anything like this you need to profile your code and ensure that the function in question is actually a bottleneck. Otherwise you're wasting memory and coding time without getting much of a speed up. > I was reading about memoize decorators the other day and I realized that > this function might benefit from memoizing, or a lookup table. On the other > hand, it might complicate the code too much, so it might be better to Keep > It Simple (KIS). Is the code below a sound approach? I believe that, in > effect, it uses a memoization approach (as it is a slowly growing lookup > table). This function is using memoisation but why aren't you just using the memoisation decorators that you read about? > import datetime > > class Test(object): > > def __init__(self): > self.isoDateLookup = {} > self.lookupCount = 0 > > def spss2strDate(self, gregorianDate, fmt="%Y-%m-%d", > recodeSysmisTo=""): > """ This function converts internal SPSS dates (number of seconds > since midnight, Oct 14, 1582 (the beginning of the Gregorian > calendar)) > to a human-readable format """ > MAXLOOKUP = 10**6 > try: > if not hasattr(self, "gregorianEpoch"): > self.gregorianEpoch = datetime.datetime(1582, 10, 14, 0, 0, > 0) > if fmt == "%Y-%m-%d" and len(self.isoDateLookup) <= MAXLOOKUP: > try: > result = self.isoDateLookup[gregorianDate] > self.lookupCount += 1 > except KeyError: > theDate = self.gregorianEpoch + > datetime.timedelta(seconds=gregorianDate) > result = datetime.datetime.strftime(theDate, fmt) > self.isoDateLookup[gregorianDate] = result > return result > else: > theDate = self.gregorianEpoch + > datetime.timedelta(seconds=gregorianDate) > return datetime.datetime.strftime(theDate, fmt) > except OverflowError: > return recodeSysmisTo > except TypeError: > return recodeSysmisTo > except ValueError: > return recodeSysmisTo The advantage of using a decorator for this are that you don't need to complicate the internal code of the function that is memoised and it is easy to enable and disable memoisation. I would just use a decorator. You don't need to write one yourself. Since Python 3.2 the standard library contains a memoisation decorator that you can use: http://docs.python.org/dev/library/functools.html#functools.lru_cache Presumably the place where you read about them would have listed some example decorators that you can use for memoisation. Here's a quick example that works for hashable inputs: def memo(func): table = {} def wrapper(inputarg): try: return table[inputarg] except KeyError: table[inputarg] = val = func(inputarg) return val return wrapper @memo def square(x): print('Calling square()') return x ** 2 print('2**2 is %i' % square(2)) # cache miss print('3**2 is %i' % square(3)) # cache miss print('2**2 is %i' % square(2)) # cache hit Output: $ python tmp.py Calling square() 2**2 is 4 Calling square() 3**2 is 9 2**2 is 4 Oscar _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor