Re: [web2py] Re: Cache performance with computed columns
so, q1 is now always faster than q2 ... it seems that the bits you were missing are in fact the ones that auto-set the cache key based on the query string.

cache.ram is always faster than anything else, because it stores a pointer to the result, doesn't have to pickle/unpickle the result, and can't (by design) have any network latency. The problem is that it's a locking cache and single-process, so YMMV.
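The point about cache.ram handing back a pointer while a memcache-style backend must pickle and unpickle on every hit can be sketched outside web2py with two toy cache classes (hypothetical stand-ins, not web2py's actual Cache implementation; Python 3 syntax, unlike the thread's Python 2 snippets):

```python
import pickle

class RamCache:
    """Toy in-process cache: stores and returns a reference, no serialization."""
    def __init__(self):
        self.storage = {}
    def __call__(self, key, func):
        if key not in self.storage:
            self.storage[key] = func()
        return self.storage[key]  # the very same object back, zero copying

class PickledCache:
    """Toy stand-in for a memcache-style backend: every hit pays an unpickle."""
    def __init__(self):
        self.storage = {}
    def __call__(self, key, func):
        if key not in self.storage:
            self.storage[key] = pickle.dumps(func(), pickle.HIGHEST_PROTOCOL)
        return pickle.loads(self.storage[key])  # a fresh deserialized copy

rows = [{'mac': 'aa:bb', 'epoch': 1364195294 + i} for i in range(300)]
ram, pickled = RamCache(), PickledCache()

assert ram('q1', lambda: rows) is rows          # identity: just a pointer
assert pickled('q1', lambda: rows) is not rows  # copied on every retrieval
assert pickled('q1', lambda: rows) == rows      # but equal in value
```

A real memcache hit additionally pays a network round trip, which is why the locking, single-process cache.ram still wins on latency in a local test.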
Re: [web2py] Re: Cache performance with computed columns
Thanks for the very clear explanation. I made the following:

    t0 = time.time()
    rows = cache.memcache('q1', lambda: q1(query))
    t1 = time.time()
    rows2 = cache.memcache('q2', lambda: q2(query))
    rows2 = [r for r in rows2 if (r.end_point.gathered_on -
             r.start_point.gathered_on < datetime.timedelta(days=1))]
    t2 = time.time()
    print 'q1:', t1-t0
    print 'q2:', t2-t1

and I got from three different requests:

    q1: 0.0104899406433
    q2: 0.0106620788574
    q1: 0.011244058609
    q2: 0.0156059265137
    q1: 0.010370016098
    q2: 0.0133299827576

Besides memcache, I tried with cache.ram, getting:

    q1: 9.77516174316e-06
    q2: 0.00217413902283
    q1: 1.62124633789e-05
    q2: 0.00206780433655
    q1: 8.10623168945e-06
    q2: 0.00120091438293

Given that, in a local env cache.ram is faster than memcache.

Paolo
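A caveat on methodology: single time.time() deltas in the microsecond range sit near the timer's resolution, so averaging many iterations (e.g. with the stdlib timeit module) gives steadier numbers. A minimal sketch, with a plain dict standing in for the cache.ram hit and a list comprehension standing in for Q2's extra Python-side filter (made-up data, Python 3 syntax):

```python
import timeit

cached = {'q1': list(range(300))}  # hypothetical cached 300-row resultset

# cache.ram-style hit: a dict lookup returning a stored reference
t_hit = timeit.timeit(lambda: cached['q1'], number=10000)

# the same hit plus a Q2-style Python-side filter over the rows
t_filtered = timeit.timeit(lambda: [r for r in cached['q1'] if r < 290],
                           number=10000)

print('hit: %.6fs   hit+filter: %.6fs' % (t_hit, t_filtered))
```

Averaged this way, the cost of the Python-side filter stands out clearly from the noise of a single wall-clock sample.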
Re: [web2py] Re: Cache performance with computed columns
well, when you plan to use cache you do it because:
a) the additional roundtrip is negligible
b) using the cache means your database is not used

I'll explain further. A query that takes 0.18ms to be "computed" on the database side can very well end up taking 18sec as soon as your database gets "hit" by several requests. A cached query that takes 0.18ms to get stored and 0.18ms to get retrieved (unpacked, decompressed, unpickled, choose the term that suits the job) will take, if the cache backend is "proper", 0.36ms the first time and 0.18ms from the 2nd time to the nth.

Even so, some complicated queries (or uncomplicated for you, but complicated for the db backend) can be sped up just by requesting fewer fields or by computing in raw Python something that would usually be done on the backend. This kind of optimization (i.e. relying on Python's datetime operations vs an extract(epoch from datefield) on the db) becomes less and less useful as the resultset's cardinality grows (i.e. the more rows you have, the better the chance that the db gets the results faster than a basic query followed by processing the results in Python). Assuming your "python filter" cuts out 5% of the total rows, as the number of rows grows you'll just waste time unpacking that additional 5%, and the time difference will become more pronounced.

On a totally unrelated note, remember that Python is fast, but dbs are fast too, especially when all their features are leveraged (e.g. you could set an index on the "computed" epoch, and that query would probably gain a lot of speedup).

This is meant to explain why Q2 is faster than Q1. As for why Q1 with cache isn't faster than Q2 with cache: since the time difference is negligible, it's possible that it gets "spent" calculating the automatic cache key for the query (which, for the first implementation, needs to include the "elapsed time" part). Just to be sure, try timing

    def Q1():
        ...  # your first implementation, without cache

    def Q2():
        ...  # your second implementation, without cache and without the Python check

with

    rows1 = cache('q1', lambda: Q1())

and, separately,

    rows2 = cache('q2', lambda: Q2())
    rows2 = [r for r in rows2 if ...]
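The aside about setting an index on the "computed" epoch refers to an expression (functional) index, which lets the database stop recomputing the epoch per row at query time. A sketch with sqlite3 (made-up table and column names; SQLite needs to be recent enough to allow date functions in index expressions, and on PostgreSQL the equivalent would index extract(epoch from ...)):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, "
             "mac TEXT, gathered_on TEXT)")
conn.execute("INSERT INTO record (mac, gathered_on) "
             "VALUES ('aa:bb:cc:dd:ee:ff', '2013-03-25 09:48:14')")

# Index the exact expression the query filters on, so it is computed
# once at write time instead of for every row on every select.
conn.execute("CREATE INDEX idx_record_epoch ON record "
             "(CAST(strftime('%s', gathered_on) AS INTEGER))")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM record "
    "WHERE CAST(strftime('%s', gathered_on) AS INTEGER) < 1364550000"
).fetchall()
print(plan)  # the planner may now use idx_record_epoch instead of a full scan
```

The WHERE expression must match the indexed expression for the planner to consider the index, which is why both spell out the same CAST(strftime(...)).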
Re: [web2py] Re: Cache performance with computed columns
Hi Niphlod, thanks for the answer. I am not blaming web2py, I am just sharing all these results with you. In both cases I have the time constraint; in one case it is in the query (Q1) while in the second it is explicit (Q2). With cache enabled, Q1 is no longer checking the constraint while Q2 is still checking the constraint for all rows. Moreover, computing the .epoch() should take a well-defined amount of time, but only for queries without cache; with cache, the computed result should be stored.

To sum up, when cache is enabled, with the same number of rows (nearly 300):
- Q2 takes less time than Q1, even though it has to compute the time constraint explicitly
- Q1 takes more time than Q2, even though it only has to retrieve one field more than Q2 from the cache, among a dozen fields

Given that, retrieving more data from memcached can be slower than re-computing it for each request.

Paolo
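The conclusion that fetching one extra field per row from memcached can cost more than recomputing it can be checked in isolation by pickling two versions of a 300-row resultset (plain dicts as hypothetical stand-ins for web2py Rows; Python 3 syntax):

```python
import pickle
import timeit

N = 300
base = [{'mac': 'aa:bb:cc:dd:ee:ff', 'gathered_on': '2013-03-25 09:48:14'}
        for _ in range(N)]
# the same rows with one extra computed field per row, as in Q1's select
extra = [dict(r, elapsed_time=86399.0) for r in base]

blob_base = pickle.dumps(base, pickle.HIGHEST_PROTOCOL)
blob_extra = pickle.dumps(extra, pickle.HIGHEST_PROTOCOL)
print('payload bytes:', len(blob_base), 'vs', len(blob_extra))

t_base = timeit.timeit(lambda: pickle.loads(blob_base), number=1000)
t_extra = timeit.timeit(lambda: pickle.loads(blob_extra), number=1000)
print('unpickle 1000x: %.4fs vs %.4fs' % (t_base, t_extra))
```

The larger payload costs bytes on the wire and time in loads(); whether that beats recomputing a timedelta per row depends on the row count, exactly as observed here.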
[web2py] Re: Cache performance with computed columns
I didn't get the point.
Q1 is slower (obviously) because of the epoch() function and the additional filtering.
Q1 with cache shows more or less the same duration as Q2 with cache.
The key point is "even when the cache is enabled the first query is taking longer" ... longer than what? Q1 with cache is faster than:
- Q1 without cache, and that's ok
- Q2 without cache, and that's ok
but it's slower than
- Q2 with cache
However, 3 ms (assuming those timings are in seconds), with all the moving parts of a memcache instance, seems unfair to blame entirely on web2py.

On Monday, March 25, 2013 9:48:14 AM UTC+1, Paolo valleri wrote:
>
> Dear all, I want to share some results with you. I implemented a query in
> two different ways. Given the following common code:
>
>     start = db.record.with_alias('start_point')
>     end = db.record.with_alias('end_point')
>     elapsed_time = end.gathered_on.epoch() - start.gathered_on.epoch()
>
> The first query is (the constraint is in the query):
>
>     rows = db( query & (elapsed_time < 86400) ).select(
>         start.ALL,
>         end.ALL,
>         start.gathered_on.epoch(),
>         end.gathered_on.epoch(),
>         elapsed_time,
>         orderby=start.gathered_on.epoch(),
>         left=start.on( (start.mac == end.mac) &
>                        (start.gathered_on < end.gathered_on)),
>         cache=(cache.memcache, 3600),
>         cacheable=True
>     )
>
> The second one is (the constraint is explicitly tested later):
>
>     rows = db( query ).select(
>         start.ALL,
>         end.ALL,
>         start.gathered_on.epoch(),
>         end.gathered_on.epoch(),
>         elapsed_time,
>         orderby=start.gathered_on.epoch(),
>         left=start.on( (start.mac == end.mac) &
>                        (start.gathered_on < end.gathered_on)),
>         cache=(cache.memcache, 3600),
>         cacheable=True
>     )
>     rows2 = [r for r in rows if (r.end_point.gathered_on -
>              r.start_point.gathered_on < datetime.timedelta(days=1))]
>
> From the timing results I got that the second query is always faster,
> with or without cache:
>
>     Q_1: 0.273243904114
>     Q_1 with cache: 0.0182011127472
>     Q_2: 0.250607967377
>     Q_2 with cache: 0.0158171653748
>
> Besides the fact that they are just a few milliseconds apart, and that
> all the rows satisfy the constraint, what is not clear to me is why the
> first query takes longer even when the cache is enabled. The question
> that came to my mind is about computed columns: are they cached?
>
> Paolo
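The Q1/Q2 split above, in miniature and outside web2py: filtering on the epoch difference in SQL versus fetching everything and filtering with timedelta in Python. A sqlite3 sketch with a made-up two-row table (the DAL generates comparable SQL from the elapsed_time expression):

```python
import sqlite3
import datetime

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE pairs (start_on TEXT, end_on TEXT)")
conn.executemany("INSERT INTO pairs VALUES (?, ?)", [
    ('2013-03-25 09:00:00', '2013-03-25 10:00:00'),   # 1 hour   -> kept
    ('2013-03-24 08:00:00', '2013-03-25 10:00:00'),   # 26 hours -> dropped
])

# Q1 style: the constraint lives in the query (epoch difference < 86400)
q1 = conn.execute(
    "SELECT * FROM pairs "
    "WHERE CAST(strftime('%s', end_on) AS INTEGER) - "
    "      CAST(strftime('%s', start_on) AS INTEGER) < 86400").fetchall()

# Q2 style: fetch all rows, then filter in Python with a timedelta
fmt = '%Y-%m-%d %H:%M:%S'
rows = conn.execute("SELECT * FROM pairs").fetchall()
q2 = [r for r in rows
      if (datetime.datetime.strptime(r[1], fmt) -
          datetime.datetime.strptime(r[0], fmt)) < datetime.timedelta(days=1)]

assert q1 == q2  # same surviving rows; only where the work happens differs
```

Both styles return the same rows; the thread's discussion is about which side (db or Python) should pay for the filtering, and how caching shifts that cost.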