I can certainly bore you with specifics!

We have growing index of URLs, on discovery a url is standardized into a 
"canonical" and stored into two tables:

    class UrlRaw(DeclaredTable):
         __tablename__ = 'url_raw'
         id = Column(Integer, nullable=False, primary_key=True)
         url_raw = Column(Unicode, nullable=False)
         url_raw_md5 = Column(Unicode, nullable=False)
         url_standardized_id = Column(Integer, nullable=False, 
ForeignKey("url_standardized.id"))

    class UrlStandardized(DeclaredTable):
         __tablename__ = 'url_standardized'
         id = Column(Integer, nullable=False, primary_key=True)
         url = Column(Unicode, nullable=False)
         url_md5 = Column(Unicode, nullable=False)

The package used to standardize URLs had a bug and caused a small percent 
of URLs to standardize incorrectly.  After fixing the package, I audited 
100MM+ `UrlRaw` objects.

     class AuditUrl(DeclaredTable):
         __tablename__ = 'url_standardized_audit_1'
         url_standardized_id = Column(Integer, nullable=False, 
ForeignKey("url_standardized.id"), primary_key=True)
         audit_result = Column(Boolean, nullable=False, default=None)  # 
none=unprocessed, True=Fine, False=MigrationNeeded
         new_url = Column(Unicode, nullable=False)
         new_url_md5 = Column(Unicode, nullable=False)

Around 150k were standardized incorrectly - 50k had brand new 
standardizations (yipee!) but 100k had standardizations already existing in 
`UrlStandardized`.

So now I've got to go through all the audit results on a migration needed 
and do the following:

* Determine if the UrlStandardized record is the earliest version of the 
new standardization (Target) or a latter one (Deprecated)
* Potentially update some fields on the Target; mark Deprecateds for 
deletion
* Transfer relationships of Deprecateds onto the Target

The tough/annoying bit is migrating the relationships.  The 
`UrlStandardized` object has about 30 relationships on it -- but some of 
them have a unique/primary key on the table.  

    class UrlStandardized2Tag(DeclaredTable):
         __tablename__ = 'url_standardized_2_tag'
        __primarykey__ = ['url_standardized_id', 'tag_id']
         url_standardized_id = Column(Integer, nullable=False, 
ForeignKey("url_standardized.id"), primary_key=True)
         tag_id = Column(Integer, nullable=False, ForeignKey("tag.id"), 
primary_key=True)

In those situations, I need to iterate over all the items to migrate them 
-- either calculating which items need to transfer, or wrapping thing into 
a savepoint and allowing an integrity error.  that's where it gets a bit 
annoying.  

Everything needs to be processed in ascending order too, because changes 
can cascade.  

This sort of stuff could be generalizable, but it requires a bit of work to 
get right.  I've done enough custom migrations like this that I **should** 
have invested in a general tool already, but haven't.  


-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to