It appears that the most efficient way to delete on App Engine is to:
 - build a query object, like we are doing now
 - call run() with keys_only=True, which returns an iterator 
(https://developers.google.com/appengine/docs/python/datastore/queryclass#Query_run)
 - pass that iterator to the datastore delete method 
(https://developers.google.com/appengine/docs/python/datastore/functions#delete)

this avoids the cost of loading the rows into memory, decreases the 
likelihood of a timeout, and costs 1 datastore small operation per row. 
the downside is that it prevents us from getting a count of rows deleted.
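
a minimal sketch of the three steps above, just to make the idea concrete. 
the name bulk_delete is hypothetical, and the query and the delete function 
are passed in as parameters (my assumption, so the snippet does not depend 
on the SDK being importable); on GAE you would pass a db.Query and 
google.appengine.ext.db.delete.

```python
def bulk_delete(query, delete_fn):
    """Delete every row matching `query` without loading the rows.

    `query` is assumed to expose run(keys_only=True) returning an
    iterator of keys, as the GAE db.Query class does. `delete_fn` is
    assumed to accept an iterable of keys, as the datastore delete
    function does. Nothing is returned, so no row count is available.
    """
    # Step 2: ask for keys only, so no entities are loaded into memory.
    keys = query.run(keys_only=True)
    # Step 3: hand the key iterator straight to the delete function.
    delete_fn(keys)
```

usage on GAE would look something like bulk_delete(some_query, db.delete), 
where some_query is whatever query object we already build today.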

the way we do it now:
 - run count() on the query.  this has a cost (time and money) of iterating 
over all the rows that match the query on GAE (1 datastore small operation 
per row)
 - run fetch(limit=1000) and call delete() repeatedly until no rows are 
left.  this has the cost of running a full query (at least 1 datastore read 
operation per row), loading the result set into memory, and then deleting 
the results.

in my case i'm timing out on the count() call, so i don't even start the 
delete.  from an efficiency standpoint i'd rather have more rows deleted 
for less cost than get a count, but this may not be acceptable for all. 
at a minimum i think we should switch to keys_only=True for the fetch, 
skip the leading count() call, and just sum the number of keys returned by 
each fetch.  we may also consider catching the datastore timeout error and 
trying to handle a partial delete more gracefully (or continue to let the 
user catch the error).
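
a rough sketch of that minimum change, assuming fetch() accepts 
keys_only=True and returns a list of keys. the function name, the 
(deleted, finished) return value and the timeout_errors parameter are all 
hypothetical; on GAE the tuple might hold the datastore timeout exception, 
but which class to catch is left to the caller here.

```python
def delete_with_count(query, delete_fn, batch_size=1000, timeout_errors=()):
    """Delete rows matching `query` in keys-only batches and count them.

    Returns (deleted, finished). `finished` is False when one of the
    exception classes in `timeout_errors` interrupted the loop, which
    lets the caller report a partial delete or retry later.
    """
    deleted = 0
    try:
        while True:
            # Fetch keys only: no leading count() and no entities in memory.
            keys = query.fetch(batch_size, keys_only=True)
            if not keys:
                return deleted, True
            delete_fn(keys)
            # Count rows only after the delete call has returned.
            deleted += len(keys)
    except timeout_errors:
        # Partial delete: report what was removed before the timeout.
        return deleted, False
```

with the default empty timeout_errors nothing is caught, so the error still 
reaches the user as it does now.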

what is the "right" approach for web2py?  if the approach with count is 
correct, could i propose a gae bulk_delete method that does not return 
count but uses my first method?

thanks for the input!

cfh

On Saturday, October 20, 2012 7:58:56 AM UTC-7, Massimo Di Pierro wrote:
>
> Delete should return the number of deleted records. What is your proposal?
>
> On Wednesday, 17 October 2012 17:30:22 UTC-5, howesc wrote:
>>
>> Hi all,
>>
>> I'm trying to clean up old expired sessions.....but i waited a long time 
>> to get to this and now my GAE delete is just timing out.  Reading the GAE 
>> docs, there appears to be some improvements that we can make to the query 
>> delete method on GAE that will make it faster and cheaper.  what we lose 
>> then is the count of the number of rows deleted.
>>
>> my question is, does having a db(db.table.something==True).delete() that 
>> does not return a count break the web2py API contract, or break anyone's 
>> applications?
>>
>> thanks,
>>
>> christian
>>
>
