It appears that the most efficient way to delete on App Engine is to:
 - build a query object, like we are doing now
 - call run() with keys_only=True (https://developers.google.com/appengine/docs/python/datastore/queryclass#Query_run), which returns an iterator
 - pass that iterator to the datastore delete method (https://developers.google.com/appengine/docs/python/datastore/functions#delete)

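A rough sketch of those three steps, in case it helps the discussion. The helper name bulk_delete and the FakeQuery stand-in are hypothetical, not existing web2py or GAE code. The query and the delete function are passed in as arguments so the sketch runs without the App Engine SDK; it assumes the query behaves like google.appengine.ext.db.Query and that the delete function accepts an iterable of keys, as the linked docs describe.

```python
def bulk_delete(query, delete_fn):
    """Delete everything matched by `query` without loading the entities.

    Assumptions: `query` behaves like google.appengine.ext.db.Query and
    `delete_fn` like google.appengine.ext.db.delete. Both are injected so
    this sketch does not need the App Engine SDK to be importable.
    """
    # keys_only=True asks the datastore for keys rather than full rows.
    keys = query.run(keys_only=True)
    # Hand the key iterator straight to delete. Nothing is returned, so the
    # caller does not learn how many rows were removed.
    delete_fn(keys)


class FakeQuery(object):
    """Hypothetical stand-in for a datastore query, for local testing only."""

    def __init__(self, keys):
        self._keys = keys

    def run(self, keys_only=False):
        # The sketch only makes sense for keys-only runs.
        assert keys_only, "expected a keys-only run"
        return iter(self._keys)


# Local check: record which keys the fake delete function receives.
deleted = []
bulk_delete(FakeQuery(["k1", "k2", "k3"]), deleted.extend)
```

On App Engine the call would presumably be bulk_delete(some_query, db.delete), with db imported from google.appengine.ext.
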
Deleting by key this way avoids the cost of loading the rows into memory, decreases the likelihood of a timeout, and costs 1 datastore small operation per row. But it does prevent us from getting a count of rows deleted.

The way we do it now:
 - run count() on the query. This has a cost (time and money) of iterating over all the rows that match the query on GAE (1 datastore small operation per row)
 - run fetch(limit=1000) and call delete() successively until no more rows are left. This has the cost of running a full query (at least 1 datastore read operation per row), loading the result set into memory, and then deleting the results.

In my case I'm timing out on the count() call, so I don't even start the delete.

From an efficiency standpoint I'd rather have more rows deleted for less cost than get a count... but this may not be acceptable for all. At a minimum I think we should switch to keys_only=True for the fetch, skip the leading count() call, and just sum the number of keys returned by each fetch call. We may also consider catching the datastore timeout error and trying to handle a partial delete more gracefully (or continue to let the user catch the error).

What is the "right" approach for web2py? If the approach with count is correct, could I propose a GAE bulk_delete method that does not return a count but uses my first method?

Thanks for the input!

cfh

On Saturday, October 20, 2012 7:58:56 AM UTC-7, Massimo Di Pierro wrote:
>
> Delete should return the number of deleted records. What is your proposal?
>
> On Wednesday, 17 October 2012 17:30:22 UTC-5, howesc wrote:
>>
>> Hi all,
>>
>> I'm trying to clean up old expired sessions.....but i waited a long time
>> to get to this and now my GAE delete is just timing out. Reading the GAE
>> docs, there appears to be some improvements that we can make to the query
>> delete method on GAE that will make it faster and cheaper. what we lose
>> then is the count of the number of rows deleted.
>>
>> my question is, does having a db(db.table.something==True).delete() that
>> does not return a count break the web2py API contract, or break anyone's
>> applications?
>>
>> thanks,
>>
>> christian
>>
>