So, q1 is now always faster than q2... it seems that the bits you were missing are in fact the ones that auto-set the cache key based on the query string.
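For reference, the "auto-set cache key from the query string" behaviour can be sketched roughly as follows. This is a minimal, hypothetical stand-in (the `auto_cache_key` name and the `"db"` prefix are invented for illustration; web2py's DAL derives its real key internally when `cacheable=True` is set), but it shows why the same SQL always hits the same cache slot while Q1's extra `elapsed_time < 86400` clause gets a slot of its own:

```python
import hashlib

def auto_cache_key(sql, prefix="db"):
    # Hypothetical sketch: hash the SQL the query compiles to, so the
    # same query string always maps to the same cache slot, and a query
    # with an extra clause (e.g. the elapsed_time filter) maps elsewhere.
    digest = hashlib.md5(sql.encode("utf-8")).hexdigest()
    return "%s:%s" % (prefix, digest)

key_q1 = auto_cache_key("SELECT ... WHERE elapsed_time < 86400")
key_q2 = auto_cache_key("SELECT ...")
```

The extra string to hash is tiny, so key computation alone should only account for microseconds, not milliseconds.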
cache.ram is always faster than anything else, because it stores a pointer to the result, doesn't have to pickle/unpickle the result and can't have (by design) any network latency. The problem is that it's a locking cache and single-process, so YMMV.

On Monday, March 25, 2013 12:21:31 PM UTC+1, Paolo valleri wrote:
>
> Thanks for the very clear explanation.
> I made the following:
>
>     t0 = time.time()
>     rows = cache.memcache('q1', lambda: q1(query))
>     t1 = time.time()
>     rows2 = cache.memcache('q2', lambda: q2(query))
>     rows2 = [r for r in rows2 if (r.end_point.gathered_on -
>              r.start_point.gathered_on < datetime.timedelta(days=1))]
>     t2 = time.time()
>     print 'q1:', t1 - t0
>     print 'q2:', t2 - t1
>
> and I got from three different requests:
>
>     q1: 0.0104899406433
>     q2: 0.0106620788574
>     q1: 0.011244058609
>     q2: 0.0156059265137
>     q1: 0.010370016098
>     q2: 0.0133299827576
>
> Besides memcache, I tried with cache.ram, getting:
>
>     q1: 9.77516174316e-06
>     q2: 0.00217413902283
>     q1: 1.62124633789e-05
>     q2: 0.00206780433655
>     q1: 8.10623168945e-06
>     q2: 0.00120091438293
>
> Given that, in a local env cache.ram is faster than memcache.
>
> Paolo
>
> On Monday, March 25, 2013 11:54:07 AM UTC+1, Niphlod wrote:
>>
>> well, when you plan to use cache you do it because:
>> a) the additional roundtrip is negligible
>> b) using the cache means your database is not used
>> I'll explain it further....
>> A query that takes 0.18ms to get "computed" on the database side can very
>> well end up in 18sec as soon as your database gets "hit" by several
>> requests.
>> A cached query that takes 0.18ms to get stored and 0.18ms to get
>> retrieved (unpacked, decompressed, unpickled, choose your own "term" that
>> suits the job) will take, if the cache backend is "proper", 0.36ms the
>> first time and 0.18ms from the 2nd to the nth.
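The "stores a pointer vs. pickles the result" distinction above can be illustrated with a toy sketch. Both classes here are stand-ins invented for illustration, not web2py's actual cache implementations, but they reproduce the observable difference: a RAM cache hands back the very same object every time, while a pickling (network-style) cache hands back a freshly deserialized copy on every read, which is where the per-request cost comes from:

```python
import pickle

class ToyRamCache:
    """Stand-in for cache.ram: keeps a reference to the object itself."""
    def __init__(self):
        self._store = {}

    def __call__(self, key, fn):
        if key not in self._store:
            self._store[key] = fn()
        return self._store[key]

class ToyPickleCache:
    """Stand-in for a network cache: pickles on store, unpickles on fetch."""
    def __init__(self):
        self._store = {}

    def __call__(self, key, fn):
        if key not in self._store:
            self._store[key] = pickle.dumps(fn())
        return pickle.loads(self._store[key])

rows = [{"id": 1}, {"id": 2}]
ram, net = ToyRamCache(), ToyPickleCache()

same_object = ram("q", lambda: rows) is ram("q", lambda: rows)  # True: same pointer
fresh_copy = net("q", lambda: rows) is net("q", lambda: rows)   # False: unpickled copy
```

This also explains the thread's numbers: the cache.ram hit is basically a dict lookup (microseconds), while every memcache hit pays the unpickling cost, which grows with the number of fields retrieved.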
>> Even so, some complicated queries (or uncomplicated for you, but
>> complicated for the db backend) can be sped up just by requesting fewer
>> fields or computing in raw python something that would usually be done on
>> the backend.
>> This kind of optimization (i.e. relying on python's datetime operations
>> vs an extract(epoch from datefield) on the db) will become less and less
>> useful as the resultset's cardinality grows (i.e. the more rows you have,
>> the better the chance for the db to get the results faster than running a
>> basic query and processing the results in python).
>> Assuming your "python filter" cuts out 5% of the total rows, as the
>> number of rows grows you'll just waste time unpacking that additional 5%,
>> and the time difference will become more evident.
>> On a totally unrelated note, remember that Python is fast, but dbs are
>> fast too, especially when leveraged with all their features (e.g. you
>> could set an index on the "computed" epoch and that query would probably
>> gain a lot of speedup).
>> This is meant to explain why Q2 is faster than Q1.
>> As for why Q1 with cache isn't faster than Q2 with cache: since the time
>> difference is negligible, it's possible that the time difference gets
>> "spent" on calculating the automatic key for the cache of the query (the
>> first implementation needs to add the "elapsed time" part). Just to be
>> sure, try timing
>>
>>     def Q1():
>>         your first implementation, without cache
>>     def Q2():
>>         your second implementation, without cache, without the check in python
>>
>> with
>>
>>     rows1 = cache('q1', lambda: Q1())
>>
>> ..... and, separately
>>
>>     rows2 = cache('q2', lambda: Q2())
>>     rows2 = [r for r in rows2 ...]
>>
>> On Monday, March 25, 2013 11:12:50 AM UTC+1, Paolo valleri wrote:
>>>
>>> Hi Niphlod, thanks for the answer. I am not blaming web2py, I am just
>>> sharing all these results with you.
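The timing experiment suggested above can be sketched end to end. The dict-backed `cache` function below is a minimal stand-in for web2py's `cache(key, lambda: ...)` call signature, and the `time.sleep` stands in for the uncached query hitting the database; both are assumptions for illustration, not web2py internals:

```python
import time

_store = {}

def cache(key, fn):
    # minimal stand-in for web2py's cache('q1', lambda: Q1()) call
    if key not in _store:
        _store[key] = fn()
    return _store[key]

def Q1():
    time.sleep(0.05)  # pretend this is the slow uncached query
    return list(range(10))

t0 = time.time()
rows_first = cache('q1', lambda: Q1())   # miss: runs Q1
first = time.time() - t0

t0 = time.time()
rows_second = cache('q1', lambda: Q1())  # hit: dict lookup only
second = time.time() - t0
```

Timing the cached wrappers separately like this, with the python-side filter kept out of the cached call, isolates the cache retrieval cost from the filtering cost.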
>>> In both cases I have the time constraint; in one case it is in the
>>> query (Q1) while in the second it is explicit (Q2). With cache enabled,
>>> Q1 is no longer checking the constraint while Q2 is still checking the
>>> constraint for all rows. Moreover, computing .epoch() should take a
>>> well-defined amount of time for queries without cache; with cache, the
>>> computed result should be stored.
>>>
>>> To sum up, when cache is enabled, with the same amount of rows (nearly
>>> 300):
>>> - Q2 takes less time than Q1 even though it has to compute the time
>>> constraint explicitly
>>> - Q1 takes more time than Q2, even though it only has to retrieve from
>>> cache one field more than Q2, out of a dozen fields
>>>
>>> Given that, retrieving more data from memcache can be slower than
>>> re-computing it for each request.
>>>
>>> Paolo
>>>
>>> 2013/3/25 paolo....@gmail.com <paolo....@gmail.com>
>>>>
>>>> Paolo
>>>>
>>>> 2013/3/25 Niphlod <nip...@gmail.com>
>>>>>
>>>>> I didn't get the point. ....
>>>>> Q1 is slower (obviously) because of the epoch() function and the
>>>>> additional filtering.
>>>>> Q1 with cache shows more or less the same duration as Q2 with cache.
>>>>> The key point is "even when the cache is enabled the first query is
>>>>> taking longer" ... longer than what?
>>>>> Q1 with cache is speedier than:
>>>>> - Q1 without cache, and that's ok
>>>>> - Q2 without cache, and that's ok
>>>>> but it's slower than
>>>>> - Q2 with cache
>>>>> however, 3 ms (assuming those timings are in seconds) with all the
>>>>> moving parts of a memcache instance seems unfair to blame entirely on
>>>>> web2py
>>>>>
>>>>> On Monday, March 25, 2013 9:48:14 AM UTC+1, Paolo valleri wrote:
>>>>>>
>>>>>> Dear all, I want to share some results with you. I implemented a
>>>>>> query in two different ways.
>>>>>> Given the following common code:
>>>>>>
>>>>>>     start = db.record.with_alias('start_point')
>>>>>>     end = db.record.with_alias('end_point')
>>>>>>     elapsed_time = end.gathered_on.epoch() - start.gathered_on.epoch()
>>>>>>
>>>>>> The first query is (the constraint is in the query):
>>>>>>
>>>>>>     rows = db( query & (elapsed_time < 86400) ).select(
>>>>>>         start.ALL,
>>>>>>         end.ALL,
>>>>>>         start.gathered_on.epoch(),
>>>>>>         end.gathered_on.epoch(),
>>>>>>         elapsed_time,
>>>>>>         orderby=start.gathered_on.epoch(),
>>>>>>         left=start.on( (start.mac == end.mac) & (start.gathered_on <
>>>>>>                        end.gathered_on)),
>>>>>>         cache=(cache.memcache, 3600),
>>>>>>         cacheable=True
>>>>>>     )
>>>>>>
>>>>>> The second one is (the constraint is explicitly tested later):
>>>>>>
>>>>>>     rows = db( query ).select(
>>>>>>         start.ALL,
>>>>>>         end.ALL,
>>>>>>         start.gathered_on.epoch(),
>>>>>>         end.gathered_on.epoch(),
>>>>>>         elapsed_time,
>>>>>>         orderby=start.gathered_on.epoch(),
>>>>>>         left=start.on( (start.mac == end.mac) & (start.gathered_on <
>>>>>>                        end.gathered_on)),
>>>>>>         cache=(cache.memcache, 3600),
>>>>>>         cacheable=True
>>>>>>     )
>>>>>>     rows2 = [r for r in rows if (r.end_point.gathered_on -
>>>>>>              r.start_point.gathered_on < datetime.timedelta(days=1))]
>>>>>>
>>>>>> From the timing results I got that the second query is always faster,
>>>>>> with or without cache:
>>>>>>
>>>>>>     Q_1: 0.273243904114
>>>>>>     Q_1 with cache: 0.0182011127472
>>>>>>     Q_2: 0.250607967377
>>>>>>     Q_2 with cache: 0.0158171653748
>>>>>>
>>>>>> Besides the fact that they are just a few milliseconds apart and that
>>>>>> all the rows satisfy the constraint, what is not clear to me is why,
>>>>>> even when the cache is enabled, the first query takes longer. The
>>>>>> question that came to my mind is about computed columns: are they
>>>>>> cached?
>>>>>>
>>>>>> Paolo
>>>>>
>>>>> --
>>>>> ---
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "web2py-users" group.
>>>>> To unsubscribe from this topic, visit
>>>>> https://groups.google.com/d/topic/web2py/9RfFmNQc-tE/unsubscribe?hl=en.
>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>> web2py+un...@googlegroups.com.
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
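As a footnote to the thread: the SQL-side constraint `elapsed_time < 86400` and the python-side `timedelta(days=1)` filter are equivalent checks, which can be verified in isolation. The `within_a_day` helper and the sample timestamps below are made up for illustration; only the comparison logic mirrors the thread's code:

```python
import datetime

DAY_SECONDS = 86400  # the constant used in Q1's SQL-side constraint

def within_a_day(start, end):
    # python-side version of the constraint, as in the rows2 filter above
    return (end - start) < datetime.timedelta(days=1)

t0 = datetime.datetime(2013, 3, 25, 9, 0, 0)
ok = within_a_day(t0, t0 + datetime.timedelta(hours=23))
too_long = within_a_day(t0, t0 + datetime.timedelta(hours=25))
```

Since the two checks agree row by row, any timing difference between Q1 and Q2 comes from where the check runs (db vs. python) and from what gets stored in and unpickled from the cache, not from the check's semantics.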