So, q1 is now always faster than q2... it seems that the bits you were missing are in fact the ones that auto-set the cache key based on the query string.
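For reference, the "auto-set cache key from the query string" behaviour can be sketched roughly as follows. This is a minimal, hypothetical stand-in (the `auto_cache_key` name and the `"db"` prefix are invented for illustration; web2py's DAL derives its real key internally when `cacheable=True` is set), but it shows why the same SQL always hits the same cache slot while Q1's extra `elapsed_time < 86400` clause gets a slot of its own:

```python
import hashlib

def auto_cache_key(sql, prefix="db"):
    # Hypothetical sketch: hash the SQL the query compiles to, so the
    # same query string always maps to the same cache slot, and a query
    # with an extra clause (e.g. the elapsed_time filter) maps elsewhere.
    digest = hashlib.md5(sql.encode("utf-8")).hexdigest()
    return "%s:%s" % (prefix, digest)

key_q1 = auto_cache_key("SELECT ... WHERE elapsed_time < 86400")
key_q2 = auto_cache_key("SELECT ...")
```

The extra string to hash is tiny, so key computation alone should only account for microseconds, not milliseconds.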
cache.ram is always faster than anything else, because it stores a pointer to the result, doesn't have to pickle/unpickle the result and can't have (by design) any network latency. The problem is that it's a locking cache and single-process, so YMMV.

On Monday, March 25, 2013 12:21:31 PM UTC+1, Paolo valleri wrote:
>
> Thanks for the very clear explanation.
> I made the following:
>
>     t0 = time.time()
>     rows = cache.memcache('q1', lambda: q1(query))
>     t1 = time.time()
>     rows2 = cache.memcache('q2', lambda: q2(query))
>     rows2 = [r for r in rows2 if (r.end_point.gathered_on -
>              r.start_point.gathered_on < datetime.timedelta(days=1))]
>     t2 = time.time()
>     print 'q1:', t1 - t0
>     print 'q2:', t2 - t1
>
> and I got from three different requests:
>
>     q1: 0.0104899406433
>     q2: 0.0106620788574
>     q1: 0.011244058609
>     q2: 0.0156059265137
>     q1: 0.010370016098
>     q2: 0.0133299827576
>
> Besides memcache, I tried with cache.ram, getting:
>
>     q1: 9.77516174316e-06
>     q2: 0.00217413902283
>     q1: 1.62124633789e-05
>     q2: 0.00206780433655
>     q1: 8.10623168945e-06
>     q2: 0.00120091438293
>
> Given that, in a local env cache.ram is faster than memcache.
>
> Paolo
>
> On Monday, March 25, 2013 11:54:07 AM UTC+1, Niphlod wrote:
>>
>> well, when you plan to use cache you do it because:
>> a) the additional roundtrip is negligible
>> b) using the cache means your database is not used
>> I'll explain it further....
>> A query that takes 0.18ms to get "computed" on the database side can very
>> well end up in 18sec as soon as your database gets "hit" by several
>> requests.
>> A cached query that takes 0.18ms to get stored and 0.18ms to get
>> retrieved (unpacked, decompressed, unpickled, choose your own "term" that
>> suits the job) will take, if the cache backend is "proper", 0.36ms the
>> first time and 0.18ms from the 2nd to the nth.
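The "stores a pointer vs. pickles the result" distinction above can be illustrated with a toy sketch. Both classes here are stand-ins invented for illustration, not web2py's actual cache implementations, but they reproduce the observable difference: a RAM cache hands back the very same object every time, while a pickling (network-style) cache hands back a freshly deserialized copy on every read, which is where the per-request cost comes from:

```python
import pickle

class ToyRamCache:
    """Stand-in for cache.ram: keeps a reference to the object itself."""
    def __init__(self):
        self._store = {}

    def __call__(self, key, fn):
        if key not in self._store:
            self._store[key] = fn()
        return self._store[key]

class ToyPickleCache:
    """Stand-in for a network cache: pickles on store, unpickles on fetch."""
    def __init__(self):
        self._store = {}

    def __call__(self, key, fn):
        if key not in self._store:
            self._store[key] = pickle.dumps(fn())
        return pickle.loads(self._store[key])

rows = [{"id": 1}, {"id": 2}]
ram, net = ToyRamCache(), ToyPickleCache()

same_object = ram("q", lambda: rows) is ram("q", lambda: rows)  # True: same pointer
fresh_copy = net("q", lambda: rows) is net("q", lambda: rows)   # False: unpickled copy
```

This also explains the thread's numbers: the cache.ram hit is basically a dict lookup (microseconds), while every memcache hit pays the unpickling cost, which grows with the number of fields retrieved.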
>> Even so, some complicated queries (or uncomplicated for you, but
>> complicated for the db backend) can be sped up just by requesting fewer
>> fields or computing in raw python something that would usually be done on
>> the backend.
>> This kind of optimization (i.e. relying on python's datetime operations
>> vs an extract(epoch from datefield) on the db) will become less and less
>> useful as the resultset's cardinality grows (i.e. the more rows you have,
>> the better the chance for the db to get the results faster than running a
>> basic query and processing the results in python).
>> Assuming your "python filter" cuts out 5% of the total rows, as the
>> number of rows grows you'll just waste time unpacking that additional 5%,
>> and the time difference will become more evident.
>> On a totally unrelated note, remember that Python is fast, but dbs are
>> fast too, especially when leveraged with all their features (e.g. you
>> could set an index on the "computed" epoch and that query would probably
>> gain a lot of speedup).
>> This is meant to explain why Q2 is faster than Q1.
>> As for why Q1 with cache isn't faster than Q2 with cache: since the time
>> difference is negligible, it's possible that the time difference gets
>> "spent" on calculating the automatic key for the cache of the query (the
>> first implementation needs to add the "elapsed time" part). Just to be
>> sure, try timing
>>
>>     def Q1():
>>         your first implementation, without cache
>>     def Q2():
>>         your second implementation, without cache, without the check in python
>>
>> with
>>
>>     rows1 = cache('q1', lambda: Q1())
>>
>> ..... and, separately
>>
>>     rows2 = cache('q2', lambda: Q2())
>>     rows2 = [r for r in rows2 ...]
>>
>> On Monday, March 25, 2013 11:12:50 AM UTC+1, Paolo valleri wrote:
>>>
>>> Hi Niphlod, thanks for the answer. I am not blaming web2py, I am just
>>> sharing all these results with you.
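The timing experiment suggested above can be sketched end to end. The dict-backed `cache` function below is a minimal stand-in for web2py's `cache(key, lambda: ...)` call signature, and the `time.sleep` stands in for the uncached query hitting the database; both are assumptions for illustration, not web2py internals:

```python
import time

_store = {}

def cache(key, fn):
    # minimal stand-in for web2py's cache('q1', lambda: Q1()) call
    if key not in _store:
        _store[key] = fn()
    return _store[key]

def Q1():
    time.sleep(0.05)  # pretend this is the slow uncached query
    return list(range(10))

t0 = time.time()
rows_first = cache('q1', lambda: Q1())   # miss: runs Q1
first = time.time() - t0

t0 = time.time()
rows_second = cache('q1', lambda: Q1())  # hit: dict lookup only
second = time.time() - t0
```

Timing the cached wrappers separately like this, with the python-side filter kept out of the cached call, isolates the cache retrieval cost from the filtering cost.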
>>> In both cases I have the time constraint; in one case it is in the
>>> query (Q1) while in the second it is explicit (Q2). With cache enabled,
>>> Q1 is no longer checking the constraint while Q2 is still checking the
>>> constraint for all rows. Moreover, computing .epoch() should take a
>>> well-defined amount of time for queries without cache; with cache, the
>>> computed result should be stored.
>>>
>>> To sum up, when cache is enabled, with the same amount of rows (nearly
>>> 300):
>>> - Q2 takes less time than Q1 even though it has to compute the time
>>> constraint explicitly
>>> - Q1 takes more time than Q2, even though it only has to retrieve from
>>> cache one field more than Q2, out of a dozen fields
>>>
>>> Given that, retrieving more data from memcache can be slower than
>>> re-computing it for each request.
>>>
>>> Paolo
>>>
>>> 2013/3/25 paolo....@gmail.com <paolo....@gmail.com>
>>>>
>>>> Paolo
>>>>
>>>> 2013/3/25 Niphlod <nip...@gmail.com>
>>>>>
>>>>> I didn't get the point. ....
>>>>> Q1 is slower (obviously) because of the epoch() function and the
>>>>> additional filtering.
>>>>> Q1 with cache shows more or less the same duration as Q2 with cache.
>>>>> The key point is "even when the cache is enabled the first query is
>>>>> taking longer" ... longer than what?
>>>>> Q1 with cache is speedier than:
>>>>> - Q1 without cache, and that's ok
>>>>> - Q2 without cache, and that's ok
>>>>> but it's slower than
>>>>> - Q2 with cache
>>>>> however, 3 ms (assuming those timings are in seconds) with all the
>>>>> moving parts of a memcache instance seems unfair to blame entirely on
>>>>> web2py
>>>>>
>>>>> On Monday, March 25, 2013 9:48:14 AM UTC+1, Paolo valleri wrote:
>>>>>>
>>>>>> Dear all, I want to share some results with you. I implemented a
>>>>>> query in two different ways.
>>>>>> Given the following common code:
>>>>>>
>>>>>>     start = db.record.with_alias('start_point')
>>>>>>     end = db.record.with_alias('end_point')
>>>>>>     elapsed_time = end.gathered_on.epoch() - start.gathered_on.epoch()
>>>>>>
>>>>>> The first query is (the constraint is in the query):
>>>>>>
>>>>>>     rows = db( query & (elapsed_time < 86400) ).select(
>>>>>>         start.ALL,
>>>>>>         end.ALL,
>>>>>>         start.gathered_on.epoch(),
>>>>>>         end.gathered_on.epoch(),
>>>>>>         elapsed_time,
>>>>>>         orderby=start.gathered_on.epoch(),
>>>>>>         left=start.on( (start.mac == end.mac) & (start.gathered_on <
>>>>>>                        end.gathered_on)),
>>>>>>         cache=(cache.memcache, 3600),
>>>>>>         cacheable=True
>>>>>>     )
>>>>>>
>>>>>> The second one is (the constraint is explicitly tested later):
>>>>>>
>>>>>>     rows = db( query ).select(
>>>>>>         start.ALL,
>>>>>>         end.ALL,
>>>>>>         start.gathered_on.epoch(),
>>>>>>         end.gathered_on.epoch(),
>>>>>>         elapsed_time,
>>>>>>         orderby=start.gathered_on.epoch(),
>>>>>>         left=start.on( (start.mac == end.mac) & (start.gathered_on <
>>>>>>                        end.gathered_on)),
>>>>>>         cache=(cache.memcache, 3600),
>>>>>>         cacheable=True
>>>>>>     )
>>>>>>     rows2 = [r for r in rows if (r.end_point.gathered_on -
>>>>>>              r.start_point.gathered_on < datetime.timedelta(days=1))]
>>>>>>
>>>>>> From the timing results I got that the second query is always faster,
>>>>>> with or without cache:
>>>>>>
>>>>>>     Q_1: 0.273243904114
>>>>>>     Q_1 with cache: 0.0182011127472
>>>>>>     Q_2: 0.250607967377
>>>>>>     Q_2 with cache: 0.0158171653748
>>>>>>
>>>>>> Besides the fact that they are just a few milliseconds apart and that
>>>>>> all the rows satisfy the constraint, what is not clear to me is why,
>>>>>> even when the cache is enabled, the first query takes longer. The
>>>>>> question that came to my mind is about computed columns: are they
>>>>>> cached?
>>>>>>
>>>>>> Paolo
>>>>>
>>>>> --
>>>>> ---
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "web2py-users" group.
>>>>> To unsubscribe from this topic, visit
>>>>> https://groups.google.com/d/topic/web2py/9RfFmNQc-tE/unsubscribe?hl=en.
>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>> web2py+un...@googlegroups.com.
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
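As a footnote to the thread: the SQL-side constraint `elapsed_time < 86400` and the python-side `timedelta(days=1)` filter are equivalent checks, which can be verified in isolation. The `within_a_day` helper and the sample timestamps below are made up for illustration; only the comparison logic mirrors the thread's code:

```python
import datetime

DAY_SECONDS = 86400  # the constant used in Q1's SQL-side constraint

def within_a_day(start, end):
    # python-side version of the constraint, as in the rows2 filter above
    return (end - start) < datetime.timedelta(days=1)

t0 = datetime.datetime(2013, 3, 25, 9, 0, 0)
ok = within_a_day(t0, t0 + datetime.timedelta(hours=23))
too_long = within_a_day(t0, t0 + datetime.timedelta(hours=25))
```

Since the two checks agree row by row, any timing difference between Q1 and Q2 comes from where the check runs (db vs. python) and from what gets stored in and unpickled from the cache, not from the check's semantics.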