I can certainly bore you with specifics! We have growing index of URLs, on discovery a url is standardized into a "canonical" and stored into two tables:
class UrlRaw(DeclaredTable): __tablename__ = 'url_raw' id = Column(Integer, nullable=False, primary_key=True) url_raw = Column(Unicode, nullable=False) url_raw_md5 = Column(Unicode, nullable=False) url_standardized_id = Column(Integer, nullable=False, ForeignKey("url_standardized.id")) class UrlStandardized(DeclaredTable): __tablename__ = 'url_standardized' id = Column(Integer, nullable=False, primary_key=True) url = Column(Unicode, nullable=False) url_md5 = Column(Unicode, nullable=False) The package used to standardize URLs had a bug and caused a small percent of URLs to standardize incorrectly. After fixing the package, I audited 100MM+ `UrlRaw` objects. class AuditUrl(DeclaredTable): __tablename__ = 'url_standardized_audit_1' url_standardized_id = Column(Integer, nullable=False, ForeignKey("url_standardized.id"), primary_key=True) audit_result = Column(Boolean, nullable=False, default=None) # none=unprocessed, True=Fine, False=MigrationNeeded new_url = Column(Unicode, nullable=False) new_url_md5 = Column(Unicode, nullable=False) Around 150k were standardized incorrectly - 50k had brand new standardizations (yipee!) but 100k had standardizations already existing in `UrlStandardized`. So now I've got to go through all the audit results on a migration needed and do the following: * Determine if the UrlStandardized record is the earliest version of the new standardization (Target) or a latter one (Deprecated) * Potentially update some fields on the Target; mark Deprecateds for deletion * Transfer relationships of Deprecateds onto the Target The tough/annoying bit is migrating the relationships. The `UrlStandardized` object has about 30 relationships on it -- but some of them have a unique/primary key on the table. class UrlStandardized2Tag(DeclaredTable): __tablename__ = 'url_standardized_2_tag' __primarykey__ = ['url_standardized_id', 'tag_id'] url_standardized_id = Column(Integer, nullable=False, ForeignKey("url_standardized.id"), primary_key=True) tag_id = Column(Integer, nullable=False, ForeignKey("tag.id"), primary_key=True) In those situations, I need to iterate over all the items to migrate them -- either calculating which items need to transfer, or wrapping thing into a savepoint and allowing an integrity error. that's where it gets a bit annoying. Everything needs to be processed in ascending order too, because changes can cascade. This sort of stuff could be generalizable, but it requires a bit of work to get right. I've done enough custom migrations like this that I **should** have invested in a general tool already, but haven't. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.