Hi there,

I spoke with zzzeek_ on IRC yesterday re: some code I'd written for an
introspective cascading delete function. We were previously using the ORM to do
this for us but, because its cascade loads every child object into the session
before deleting it, it was taking several minutes to delete large numbers of
second-generation orphans. The code I've written recursively gets all the
tables involved, analyses the relationships and issues as few delete statements
as possible (I hope). I've attached the code, or there's a link to it here:
http://paste.pocoo.org/show/82878/

With parent->child relationships, my tests show this running much, much faster
than the ORM (by a factor of at least 100x). I haven't had a chance to set up
more complicated tables to fully test the recursive aspect of it, but it has
worked with everything I've given it so far.
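
For anyone wanting to try it, a minimal two-generation schema of the kind this
targets (hypothetical table names, not our production schema) looks something
like:

from sqlalchemy import MetaData, Table, Column, Integer, String, ForeignKey

metadata = MetaData()

# Root table: the object passed to delete_cascade lives here.
users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(50)))

# First-generation children reference the root directly...
addresses = Table('addresses', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey('users.id')))

# ...while second-generation orphans only reach the root via their parent:
address_notes = Table('address_notes', metadata,
    Column('id', Integer, primary_key=True),
    Column('address_id', Integer, ForeignKey('addresses.id')))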

I'm really enjoying working with SQLAlchemy, you guys have done a really good
job. If you think there's room for something like this in SA then it's all
yours. :-)

Cheers,
-- 
--------------------------
Bob Farrell
pH, an Experian Company
www.phgroup.com
Office Line: 020 7598 0310
Fax: 020 7598 0311
--------------------------


from sqlalchemy import select
from sqlalchemy.orm import class_mapper
from sqlalchemy.sql import delete

def delete_cascade(orm_obj):
    """Perform a cascading delete on any ORM object and its children."""
    # Since we take an ORM _object_, we need to discover its table:
    obj_table = class_mapper(type(orm_obj)).mapped_table

    def get_child_tables(parent_table, children=None):
        """Recursively find all child tables as (table, fk, parent) tuples."""
        # Use a fresh accumulator for each top-level call; a mutable default
        # argument would persist between calls and break later invocations:
        if children is None:
            children = []
        new_children = []
        # Use SQLAlchemy's table_iterator reversed to give us the tables in
        # the correct order, ensuring that we can delete without breaking any
        # constraints (i.e. we will not delete a parent before its child):
        for table in obj_table.metadata.table_iterator(reverse=True):
            for fk in table.foreign_keys:
                if fk.references(parent_table) and \
                        (table, fk, parent_table) not in children:
                    new_children.append((table, fk, parent_table))
                    break
        # If no new children are found we have reached the bottom of the
        # recursion, so we fall back down the stack:
        if not new_children:
            return []
        for child in new_children:
            # Here is the recursive call; deeper descendants are appended
            # first so they are deleted first (duplicates are filtered below):
            children.extend(get_child_tables(child[0], children))
        children.extend(new_children)
        return children
    _children = get_child_tables(obj_table)
    children = []
    # This loop filters out the duplicates: for any table that appears more
    # than once because it has several foreign keys, we keep the entry whose
    # foreign key references the root node. The result is a list of tables
    # that reference either the root node or their parent:
    for child in _children:
        if child[0] not in [x[0] for x in children]:
            children.append(child)
        elif child[1].references(obj_table):
            for i, _child in enumerate(children):
                if _child[0] == child[0]:
                    children[i] = child
                    break
    # This is a rare-case optimisation that sees if any of the tables
    # reference the root node indirectly, via a foreign-key chain whose far
    # end is a direct reference to the root node:
    for child in children:
        table, fk, parent_table = child
        if not fk.references(obj_table):
            # Follow the foreign-key chain upwards, looking for a direct
            # reference to the root node:
            parent_fk = fk.column.foreign_key
            while parent_fk is not None:
                if parent_fk.references(obj_table):
                    obj_column = parent_fk.column.key
                    break
                parent_fk = parent_fk.column.foreign_key
            if parent_fk is None:
                # No direct reference exists, so build a select for
                # grandchildren or later to establish which records need to
                # be removed, by seeing which of their parent's records are
                # descendants of the root node:
                sel = select([fk.parent])
                parent_fk = fk.column.foreign_key
                while parent_fk is not None:
                    sel.append_whereclause(
                        parent_fk.parent == parent_fk.column)
                    tmp = parent_fk.column.foreign_key
                    if tmp is not None:
                        parent_fk = tmp
                    else:
                        break
                obj_column = parent_fk.column.key
                sel.append_whereclause(
                    parent_fk.column == getattr(orm_obj, obj_column))
                yield delete(
                    fk.parent.table,
                    fk.parent.in_(sel))
                continue
        else:
            # Otherwise the foreign key references the root node directly, so
            # we simply yield a delete statement for this first-generation
            # child of the root node:
            obj_column = fk.column.key
        yield delete(
            table,
            fk.parent == getattr(orm_obj, obj_column))
    # Finally, build the delete statement for the root node itself by
    # introspectively discovering the primary key columns of the root node's
    # table and deleting a single record from it (i.e. the root node):
    pk_cols = [c for c in obj_table.c if c.primary_key]
    cond = pk_cols[0] == getattr(orm_obj, pk_cols[0].key)
    for col in pk_cols[1:]:
        cond &= (col == getattr(orm_obj, col.key))
    yield delete(
        obj_table,
        cond)
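
Since delete_cascade is a generator, the caller executes the statements it
yields. A minimal usage sketch, assuming `user` is a mapped instance and
`engine` is an Engine bound to the same metadata:

# Run all the yielded deletes in a single transaction so a failure part-way
# through doesn't leave orphans behind:
conn = engine.connect()
trans = conn.begin()
try:
    for stmt in delete_cascade(user):
        conn.execute(stmt)
    trans.commit()
except:
    trans.rollback()
    raise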

