Re: Partial rollouts

Atli Thorbjornsson Thu, 15 Jun 2017 12:06:55 -0700


On Thursday, June 15, 2017 at 5:49:17 PM UTC, mike bayer wrote:
>
>
>
> On 06/15/2017 11:22 AM, Atli Thorbjornsson wrote: 
> > Hi guys, our SaaS application relies heavily on Alembic for both schema 
> > and data migration. (Thanks!) 
> > 
> > We have a new requirement from our product team to support 
> > "partialrollouts and rollback". 
> > Up till this point upgrades to our service have been an "all or nothing" 
> > affair. Either all clients upgrade or none (and no rollbacks :)) 
> > 
> > We think the largest problem with this new approach will be training a 
> > new developer mindset but I'm wondering whether there any specific 
> > gotchas or tricks with Alembic that would be helpful. 
> > 
> > Conceptually segregating new code/data to specific clients of our 
> > platform is straightforward. 
> > Our system is multi-tenant in that all clients are already segregated 
> > into "orgs" that have no connection between themselves. 
> > (Just to be clear, we're running on a single db, the multi-tenancy is 
> > enforced by logic in code) 
> > So a partial rollout would mean deciding on what orgs receive the new 
> > code/data. 
> > 
> > The conclusion we reached is that, moving forward, all deployment must 
> > now be phased. An example: 
> > 
> > Say we have a table: conversation with fields (id, name, 
> last_message_id) 
> > Say we want to change it to (id, name, last_message_timestamp) 
> > 
> > Phase1: Via alembic add new column conversation.last_message_timestamp. 
> > Corresponding deployed code would now have to write to both 
> > last_message_id and last_message_timestamp while new behaviour would 
> > only display for "rolled out orgs". 
> > 
> > Phase2: Once partial rollout is deemed successful complete the 
> > rollout via alembic by removing conversation.last_message. Corresponding 
> > code now only writes to last_message_timestamp. 
> > 
> > Does that make sense, does anyone have any war stories they care to 
> share? 
>
> So, interesting here that your problemspace has this among multi-tenancy 
> and individual applications, as I have experience with this issue in 
> terms of a different application geometry. 
>
> There is a vast war story around this one, which is ongoing, and it is 
> that of Openstack.   With Openstack, we don't call it "partial rollout", 
> we call it "online upgrades".   Openstack applications generally have 
> horizontally clustered copies of services all talking to the same 
> columns/tables in a single database (where the database itself may be a 
> cluster like Galera, but still a single logical database).  The online 
> upgrade process follows what you're doing, in that the database is first 
> "expanded" to include new tables and columns, the online services are 
> one at a time upgraded to the new version, which now must refer to both 
> tables/columns explicitly in some way, and once all services are running 
> on the new system, the previous migration can be "contracted" to no 
> longer include the old structures. 
>
> To amend what you wrote, with Openstack, because different versions of 
> the service are talking to the *same* rows (which may be different than 
> your multi-tenancy), services have to also *read* from both 
> tables/columns as well as *write* to both, because an old version of the 
> service will continue to only write data into the old column. 
>
> Openstack is also organized into many different sub-products, with names 
> like Nova, Neutron, and Keystone.  Each product has its own development 
> team and its own database schema.   So within the realm of "expand" / 
> "contract" which is the general theme, each team is doing the "talk to 
> both schemas" idea totally differently. 
>
> First within the realm of Alembic, the Neutron project has the most 
> mature approach to this that I helped them develop, which is that they 
> use individual migration branches organized into subdirectories like this: 
>
> liberty/expand/ 
> liberty/contract/ 
> mitaka/expand/ 
> mitaka/contract/ 
> newton/expand/ 
> newton/contract/ 
>
> Where liberty, mitaka, newton are release names.  Each release has a 
> specific series of migration files that are only "expand" and a series 
> that are only "contract".  The branching and cross-dependency system 
> documented at http://alembic.zzzcomputing.com/en/latest/branches.html is 
> at the core of this approach and Neutron was one of the driving factors 
> for me to get the whole branching capability done.   Neutron also has 
> custom front-ends to Alembic and such which handle some of the other 
> complexities in running the right branches and creating new migrations. 
>   I would recommend that you look into the Alembic branching feature 
> along with multiple directories, multiple bases, and cross-dependencies 
> to work out your "partials". 
>
> Once you have expand/contract going, you have to get your applications 
> to work the right way, and IMO this is the really hard part.   The three 
> general ways seem to be: 
>
> 1. database migrations generally mean some totally new subsystem is 
> replacing some older subsystem.  In this case, the application can 
> abstract the "talk to the old" / "talk to the new" concept in terms of 
> accessing data from both subsystems.  This seems to be how Nova has 
> worked out most of their issues, and the separation occurs at the level 
> of old / new web service methods - they still have to read from both 
> services and copy data back and forth, but it's done at a higher level 
> than that of SQL statements. 
>
> 2. the application's SQL access layer needs to have fine grained "select 
> from old, new" types of logic pretty much everywhere a migration takes 
> place.   From my way of thinking, this is pretty much unavoidable once 
> you decide you want to migrate a database schema in a specific way that 
> there's no other way around this.   This indicates that while you can 
> certainly migrate your schema and upgrade applications to use new 
> concepts, you will *never* have a version of the application that isn't 
> having to mediate between a legacy version of a schema and a current 
> one, unless you put out a new release subsequent to your "online 
> upgrade" release that has no new schema migrations. 
>
> 3. Another way to handle this, which is what the Keystone project has 
> decided to do, is to move the whole "read/write from old/new 
> columns/tables" layer of functionality into database-level triggers. 
> In this approach, the migration that adds a new column for "expand" also 
> sets up a trigger to ensure the new column is synchronized with the old 
> one at all times.  This completely solves the problem of the application 
> having to worry about the old/new schema at all and you can cleanly 
> upgrade applications to the new one, where the database is essentially 
> providing both versions of the schema simultaneously.  The "contract" 
> migrations then remove the triggers. 
>
> I find the trigger approach to be the best way to do this - it 
> completely removes all complexity from the application and leverages 
> something the database already knows how to do.  However, I don't myself 
> have experience in actually running this, and issues of performance, 
> transaction integrity, or locking would be what might make this less 
> desirable. 
>
> When I tried to discuss this technique with the community, the Openstack 
> community as a whole recoiled in horror at the thought of using a 
> database trigger for anything.   I offered to help build up abstraction 
> layers to make this simple but there was a chorus of total terror and 
> panic from anyone that had ever had the word "DBA" in their job titles. 
> The reason offered were IMO not very satisfactory, mostly among the 
> realm of "we tried that and it was too complicated" - where "we tried 
> that" means, "we tried that *before we had SQLAlchemy to help automate 
> it* ;).   I can dig up the thread if you're interested. 
>
>  
Thank you so much for sharing Mike!


We had not thought of keeping old/new in sync with triggers. Sounds almost 
too good to be true not to have to muddle in application logic! 
You're correct in assuming our situation is a whole lot simpler than what 
you were dealing with so we'll definitely try this out.
I think will probably suffice for us in most cases.

By the way that knee-jerk reaction to triggers you described is pretty 
funny, people can get pretty hung up on their axioms :)
I'd love to read the thread if you have it handy!

 

> Nevertheless, Keystone went off and did it this way for their relatively 
> simple schema but they didn't ask me to help them on it. 
>
>
>
> So yes, this is a thing and I can refer you to lots of Openstack stuff 
> about it.  Your situation is probably much simpler than that of 
> Openstack though. 
>
>
>
>
> > 
> > Thanks, 
> > Atli Thorbjornsson. 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> > Groups "sqlalchemy-alembic" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> > an email to sqlalchemy-alembic+unsubscr...@googlegroups.com 
> <javascript:> 
> > <mailto:sqlalchemy-alembic+unsubscr...@googlegroups.com <javascript:>>. 
> > For more options, visit https://groups.google.com/d/optout. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy-alembic" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy-alembic+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Partial rollouts

Reply via email to