Re: [sqlalchemy] Problems handling an Amazon RDS Multi-AZ failover with SqlSoup and SqlAlchemy 0.7.8

JonathanQ Thu, 15 Nov 2012 09:28:52 -0800

Thanks Michael - I think I am starting to see where we have gone wrong with 
our session handling.  Looks like calling remove() on the session at the 
end of a unit of work is probably the best approach.

Here is a bit more detail on what these applications are doing - and 
perhaps you can comment on whether calling a remove() on the session at the 
end of the scope of work is the best action to take.

Application 1 runs 24/7 and consumes from a messaging queue.  It pulls 10 
messages off the queue and processes each one individually.  Upon 
completion of processing for all 10 messages - we will ACK the ones that 
succeeded and let the ones that failed be re-driven to another consumer 
(after a few attempts we put them aside in a DLQ).

The unit of work here could be each message, or the batch of 10.

Application 2 runs 24/7 and loops over a list of enums and does queries 
based on them.  For example: [blue, green, red, yellow].  We will loop over 
each and do a query for all records with color: blue, process them and then 
move on to the next.  If processing for a single color fails, it shouldn't 
stop the processing of the next one - and assumption is that the next time 
around the failing ones will work (or will work soon).  We sleep between 
outer loops for ~15 seconds.

The unit of work would be each color, or a single loop through all colors.

Currently in both applications we are not doing anything with the session 
after we finish processing the unit of work, so we are reusing the same 
session for all requests (for many days).  That is why when something goes 
wrong and rollback fails for whatever reason - our application gets into a 
bad state and can't process anything until it is restarted.  

Neither application is multi-threaded.  

Is there any performance hit in doing a session.remove() very often?  I am 
assuming not - after reading your reply I did some searches on 
session.remove() and that seems to be the regular behaivor for a web 
request.  We can safely break up the unit of work into either very small 
chunks to minimize impact of a db issue, or batch them together safely 
without much impact on data integrity if there is a performance impact. 
 Either way - re-using the same session for long periods of time (days) 
looks like its the root cause of our issues.

Thanks again for your help!

Jonathan

On Wednesday, November 14, 2012 5:17:07 PM UTC-8, Michael Bayer wrote:
>
>
> On Nov 14, 2012, at 2:38 PM, JonathanQ wrote: 
>
> > We have some code running against MySQL running in Amazon RDS.  During 
> an outage or the maintenance window - Amazon issues an automatic failover 
> of our RDS instance.  However our code doesn't failover with it. 
> > 
> > We are having issues with 2 of our applications (2 different issues): 
> > 
> > First one: Unable to rollback when we get the error "2006, 'MySQL server 
> has gone away'" 
> > 
> > The issue we are having is that a DB operation will fail when the 
> failover occurs.  So in the event of an error we issue a rollback on the 
> SqlSoup object (which delegates to the underlying session).   
> > 
> >   sqlsoup_engine.rollback() 
> > 
> > However this rollback fails with the error:   
> > 
> >     connection.rollback() 
> >   OperationalError: (OperationalError) (2006, 'MySQL server has gone 
> away') None None 
> > 
> > Subsequently our application continues on an then starts failing with: 
> >   StatementError("Can't reconnect until invalid transaction is rolled 
> back (original cause: InvalidRequestError: Can't reconnect until invalid 
> transaction is rolled back)",) 
>
> OK well when you say "in the event of an error", is that an error that 
> precedes the rollback, and is it the same 2006 error ?  or is the 2006 
> error new with the rollback? 
>
> Basically, what should happen is, any DB operation, like execute, or 
> commit, rollback, can potentially throw a 2006.  SQLAlchemy will detect 
> this as a "disconnect" situation, and discard all open connections that 
> aren't checked out.  The next operation will use a brand new connection. 
>
> So without knowing if you're getting this same error multiple times or 
> not, you'd want to either rollback() a second time, or just throw away the 
> Session after the rollback() fails.   With SQLSoup you'd want to get at it 
> by sqlsoup_engine.session.remove(), which will replace the current thread's 
> Session with a new one. 
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/sqlalchemy/-/EQPTTqrtih0J.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Problems handling an Amazon RDS Multi-AZ failover with SqlSoup and SqlAlchemy 0.7.8

Reply via email to