Hi there, I spoke with zzzeek_ on IRC yesterday re: some code I'd written for an introspective cascading delete function. We were previously using the ORM to do this for us but, due to the way it works, it was taking several minutes to delete large amounts of second-generation orphans. The code I've written recursively gets all the tables involved, analyses the relationships and issues as few delete statements as possible (I hope). I've attached the code, or there's a link to it here: http://paste.pocoo.org/show/82878/
With parent->child relationships my tests are showing this working much, much faster than the ORM (by a factor of at least 100x). I haven't had a chance to set up more complicated tables to fully test the recursive aspect of it, but it has worked with what I've given it so far. I'm really enjoying working with SQLAlchemy, you guys have done a really good job. If you think there's room for something like this in SA then it's all yours. :-)

Cheers,
--
--------------------------
Bob Farrell
pH, an Experian Company
www.phgroup.com
Office Line: 020 7598 0310
Fax: 020 7598 0311
--------------------------
from sqlalchemy import class_mapper, select
from sqlalchemy.sql import delete


def delete_cascade(orm_obj):
    """Perform a cascading delete on any ORM object and its children."""
    # Since we take an ORM _object_, we need to discover its table:
    obj_table = class_mapper(type(orm_obj)).mapped_table

    def get_child_tables(parent_table, children=None):
        """Recursively find all child tables."""
        # Thread the accumulator through explicitly (rather than using a
        # mutable default argument) so repeated calls to delete_cascade do
        # not share and keep growing the same list:
        if children is None:
            children = []
        new_children = []
        # Use SQLAlchemy's table_iterator reversed to give us the tables in
        # the correct order to ensure that we can delete without breaking
        # any constraints (i.e. we will not delete a parent before its
        # child):
        for table in obj_table.metadata.table_iterator(reverse=True):
            for fk in table.foreign_keys:
                if fk.references(parent_table) and \
                        (table, fk, parent_table) not in children:
                    new_children.append((table, fk, parent_table))
                    break
        # If no new children are found we have reached the end of the
        # recursion, so we fall back down the stack:
        if not new_children:
            return []
        for child in new_children:
            # Here is the recursive call, which extends children in place:
            get_child_tables(child[0], children)
        children.extend(new_children)
        return children

    _children = get_child_tables(obj_table)
    children = []
    # This loop filters out any tables that have more than one foreign key
    # where one of the foreign keys references the root node, so we have no
    # duplicates. The result is a list of tables that reference either the
    # root node or their parent:
    for child in _children:
        if child[0] not in [x[0] for x in children]:
            children.append(child)
        elif child[1].references(obj_table):
            for i, _child in enumerate(children):
                if _child[0] == child[0]:
                    children[i] = child
                    break
    for child in children:
        table, fk, parent_table = child
        if not fk.references(obj_table):
            # This is a rare-case optimisation that sees if the table
            # references the root node indirectly by having a foreign key
            # whose counterpart is a direct reference to the root node:
            parent_fk = fk.column.foreign_key
            while parent_fk is not None:
                if parent_fk.references(obj_table):
                    obj_column = parent_fk.column.key
                    break
                parent_fk = parent_fk.column.foreign_key
            if parent_fk is not None:
                # The referenced column holds the root node's key directly,
                # so we can delete without a subselect:
                yield delete(
                    table, fk.parent == getattr(orm_obj, obj_column)
                )
                continue
            # Otherwise build a select for grandchildren or later to
            # establish which records need to be removed by seeing which of
            # their parent's records are ancestors of the root node:
            sel = select([fk.parent])
            parent_fk = fk.column.foreign_key
            while parent_fk is not None:
                sel.append_whereclause(parent_fk.parent == parent_fk.column)
                tmp = parent_fk.column.foreign_key
                if tmp is None:
                    break
                parent_fk = tmp
            obj_column = parent_fk.column.key
            sel.append_whereclause(
                parent_fk.column == getattr(orm_obj, obj_column)
            )
            yield delete(table, fk.parent.in_(sel))
        else:
            # Otherwise simply yield a delete statement to delete the
            # first-generation child of the root node:
            obj_column = fk.column.key
            yield delete(
                table, fk.parent == getattr(orm_obj, obj_column)
            )
    # Build the delete statement for the root node itself by introspectively
    # discovering the primary keys of the root node's table and deleting a
    # single record from this table (i.e. the root node):
    pk_vals = [getattr(orm_obj, key) for key in obj_table.primary_key.keys()]
    pk_cols = [col for col in obj_table.c if col.primary_key]
    cond = pk_cols[0] == pk_vals[0]
    for val, col in zip(pk_vals[1:], pk_cols[1:]):
        cond &= col == val
    yield delete(obj_table, cond)
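The ordering concern the code above handles (children must be deleted before the parents they reference, or foreign key constraints break) can be illustrated with a small pure-Python sketch, independent of SQLAlchemy. The table names and the FKS mapping here are hypothetical, not taken from the attached code, and the schema is assumed to be acyclic:

```python
# Hypothetical schema: each table maps to the set of tables it references
# via foreign keys.
FKS = {
    "invoices": {"customers"},
    "invoice_lines": {"invoices"},
    "customers": set(),
}

def delete_order(root, fks):
    """Return tables in an order safe for deletion: every table appears
    before any table it references, so no FK constraint is violated."""
    # Find all tables that reference `root`, directly or transitively.
    children = set()
    changed = True
    while changed:
        changed = False
        for table, refs in fks.items():
            if table not in children and (root in refs or refs & children):
                children.add(table)
                changed = True
    # Emit a table only once nothing still pending references it,
    # so children always come out before their parents.
    ordered = []
    remaining = set(children)
    while remaining:
        for table in sorted(remaining):
            if not any(table in fks[other]
                       for other in remaining if other != table):
                ordered.append(table)
                remaining.remove(table)
                break
    return ordered + [root]

print(delete_order("customers", FKS))
# → ['invoice_lines', 'invoices', 'customers']
```

Deleting in this order means invoice_lines rows go first, then the invoices they reference, then the customer itself, mirroring the reverse table iteration the attached function relies on.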