Hi,

I was already investigating the possibility of splitting the RelStorage packing process into smaller chunks.

Due to the expected load on the Oracle cluster during a pack, we'll have to run the pack at night, and we want to be absolutely certain that the database is ready for normal site operations again the next day. With a 40+GB database (it hasn't been packed for its entire run, more than 2 years now), we are not confident that packing will finish in one night.

To at least get a handle on how much work the packing is going to be, and to have a nice stopping point, I looked at splitting the pre-pack and pack operations into two separate steps. To my delight, I saw that the 1.5.0 beta already implements running only the pre-pack phase (the --dry-run option). From there I created the attached patch, which renames the dry-run option to a 'prepack only' option and adds another option to skip the pre-pack and just use whatever is present in the pack tables.

I haven't actually run this code yet, but the change isn't big. I didn't find any relevant tests to update. Anyone want to venture some feedback?

Helge Tesdal and I also looked into the pack operation itself, and how it uses a duty cycle to give other transactions a chance to commit during a pack. We think there might be a better pattern to handle the locking.

Currently, with the default values, the pack operation will hold the commit lock for 5 seconds, pack, then release the lock for 5 more seconds, repeating until done. With various options you can alter these timings, but the basic principle is the same. For Oracle, where the commit lock has a time-out, this means that packing can fail because the commit lock times out. For all backends, Oracle or otherwise, commits elsewhere on a site cluster will have to wait long periods of time before they can proceed, leading to severe delays on a heavily trafficked website.

With the variable time-out for requesting a commit lock on Oracle, however, there is a different option.
I do not know whether MySQL and Postgres can support this too; I haven't looked into their lock acquisition options, but the following relies on lock acquisition timeouts. Consider the following packing algorithm:

* Use a short timeout (say 1 second) to request the commit lock.
* If it doesn't time out:
  * run one batch update cycle (up to 100 transactions processed).
  * optionally clean out associated blobs
  * unlock
  * loop back up
* If it does time out:
  * commit lock is busy, so back off by sleeping a bit
  * loop back up

By timing out the lock request quickly, you give commits from non-packing Zope transactions right of way. Packing truly becomes a non-intrusive background operation.

Is this a viable scenario?

-- 
Martijn Pieters
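
P.S. To make the proposed loop concrete, here is a minimal Python sketch. It is not RelStorage code: `pack_in_background`, `pack_batch`, and all parameter names are made up for illustration, and a `threading.Lock` stands in for the database commit lock. Blob cleanup is assumed to happen inside `pack_batch` if it is wanted at all.

```python
import threading
import time


def pack_in_background(commit_lock, pack_batch, lock_timeout=1.0,
                       backoff=0.5, sleep=time.sleep):
    """Run pack_batch() repeatedly, giving other lock users right of way.

    commit_lock -- stand-in for the commit lock (assumption): any object
                   with acquire(timeout=seconds) -> bool and release().
    pack_batch  -- hypothetical callable that packs one batch (for example
                   up to 100 transactions) and returns True while more
                   work remains.
    Returns a tuple (batches_run, times_backed_off).
    """
    batches = 0
    backoffs = 0
    more_work = True
    while more_work:
        # Ask for the lock with a short timeout instead of waiting long.
        if commit_lock.acquire(timeout=lock_timeout):
            try:
                more_work = pack_batch()
                batches += 1
            finally:
                # Always unlock, even if the batch raised an error.
                commit_lock.release()
        else:
            # The lock is busy: back off for a bit, then try again.
            backoffs += 1
            sleep(backoff)
    return batches, backoffs


if __name__ == "__main__":
    pending = [3]  # pretend three batches of work are waiting

    def pack_batch():
        pending[0] -= 1
        return pending[0] > 0

    print(pack_in_background(threading.Lock(), pack_batch))
```

The lock is only held for the duration of one batch, so a regular commit never waits longer than a single batch takes. Whether the real backends can express the short acquisition timeout is exactly the open question above.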
twophasepack.patch