I'm expiriencing some strange bugs which seem to be caused by connections 
used by Sqlalchemy, which i can't pin down exactly.. i was hoping someone 
has a clue whats going on here.

We're working on a Pyramid (version 1.5b1) webapplication and use 
Sqlalchemy (version 0.9.6) for all our database connectivity. Sometimes we 
get errors related to the db connection or session, most of the time this 
would be a "cursor already closed" or "This Connection is closed" error, 
but we get other related exceptions too:

(OperationalError) connection pointer is NULL
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session, and no contextual session 
is established; lazy load operation of attribute '...' cannot proceed
A conflicting state is already present in the identity map for key (<class 
'...'>, (1001L,))
This Connection is closed (original cause: ResourceClosedError: This 
Connection is closed)
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session; lazy load operation of 
attribute '...' cannot proceed
Parent instance <...> is not bound to a Session, and no contextual session 
is established; lazy load operation of attribute '...' cannot proceed
'NoneType' object has no attribute 'twophase'
(OperationalError) connection pointer is NULL
This session is in 'prepared' state; no further


There is no silver bullet to reproduce them, only by refreshing many times 
they are bound to happen one at some point. So i made a script using 
multi-mechanize to spam different urls concurrently and see where and when 
it happens.

It appears the url triggered doesn't really matter, the errors happen when 
there are concurrent requests that span a longer time (and other requests 
get served in between). This seems to indicate there is some kind of 
threading problem; that either the session or connection is shared among 
different threads.

After googling for these issues I found a lot of topics, most of them tell 
to use scoped sessions, but the thing is we do use them already:

db_session = 
scoped_session(sessionmaker(extension=ZopeTransactionExtension(), 
autocommit=False, autoflush=False))
db_meta = MetaData()

- We have a BaseModel for all our orm objects:
BaseModel = declarative_base(cls=BaseModelObj, metaclass=BaseMeta, 
metadata=db_meta)

- We use the pyramid_tm tween to handle transactions during the request

- We hook db_session.remove() to the pyramid NewResponse event (which is 
fired after everything has run). I also tried putting it in a seperate 
tween running after pyramid_tm or even not doing it at all, none of these 
seem to have effect, so the response event seemed like the most clean place 
to put it.

- We create the engine in our main entrypoint of our pyramid project and 
use a NullPool and leave connection pooling to pgbouncer. We also configure 
the session and the bind for our BaseModel here:
engine = engine_from_config(config.registry.settings, 'sqlalchemy.', 
poolclass=NullPool)
db_session.configure(bind=engine, query_cls=FilterQuery)
BaseModel.metadata.bind = engine
config.add_subscriber(cleanup_db_session, NewResponse)
return config.make_wsgi_app()

- In our app we access all db operation using:
from project.db import db_session
...
db_session.query(MyModel).filter(...)
db_session.execute(...)

- We use psycopg2==2.5.2 to handle the connection to postgres with 
pgbouncer in between

- I made sure no references to db_session or connections are saved anywhere 
(which could result in other threads reusing them)

I also tried the spamming test using different webservers, using waitress 
and cogen i got the errors very easily, using wsgiref we unsurprisingly 
have no errors (which is singlethreaded). Using uwsgi and gunicorn (4 
workers, gevent) i didn't get any errors.

Given the differences in the webserver used, I thought it either has to do 
with some webservers handling requests in threads and some using new 
processes (maybe a forking problem)?
But to complicate matters even more, when time went on and i did some new 
tests, the problem had gone away in waitress but now happened with gunicorn 
(when using gevent)! I have no clue on how to go debugging this...

Finally, to test what happens to the connection, i attached an attribute to 
the connection at the start of the cursor execute and tried to read the 
attribute out at the end of the execute:
@event.listens_for(Engine, "before_cursor_execute")
def _before_cursor_execute(conn, cursor, stmt, params, context, execmany):
  conn.pdtb_start_timer = time.time()

@event.listens_for(Engine, "after_cursor_execute")
def _after_cursor_execute(conn, cursor, stmt, params, context, execmany):
  print conn.pdtb_start_timer

Surprisingly this sometimes raised an exception: 'Connection' object has no 
attribute 'pdtb_start_timer'

Which struck me as very strange.. I found one discussion about something 
similar: https://groups.google.com/d/msg/sqlalchemy/GQZSjHAGkWM/rDflJvuyWnEJ
And tried adding strategy='threadlocal' to the engine, which from what i 
understand should force 1 connection for the tread. But it didn't have any 
effect on the errors im seeing.. (besides some unittests failing because i 
need two different sessions/connections for some tests and this forces 1 
connection to be associated)

Does anyone have any idea what might go on here or have some more pointers 
on how to attack this problem?

Thanks in advance!

Matthijs Blaas

PS: sorry for the crosspost 
(http://stackoverflow.com/questions/25768428/sqlalchemy-connection-errors), 
i hope i might catch some experts here that otherwise wouldn't see the 
stackoverflow post

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to