that particular routine runs exactly once, when the engine is used for the
very first time.   I only see one traceback here, and it does not seem to
indicate the lock is being acquired in a re-entrant fashion, so to look for
deadlocks we would need to see tracebacks for all threads.
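
if it helps, one low-effort way to capture tracebacks for all threads from a
live process is the stdlib faulthandler module (a minimal sketch, assuming
CPython on Unix; the choice of SIGUSR1 is arbitrary):

import faulthandler
import signal

# arrange for SIGUSR1 to dump every thread's stack to stderr;
# trigger it from a shell with:  kill -USR1 <pid>
faulthandler.register(signal.SIGUSR1, all_threads=True)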

if there is some unusual effect of process creation here, ensure this routine
runs in the parent process exactly once.   if you have a create_engine() whose
engine is then shared with separate processes, do this init *before* you spawn
any of those processes:

engine = create_engine(...)
engine.connect().close()   # <--- engine is initialized

spawn_new_processes()   # now spawn subprocesses

next, within each child process, before it does any of its work, call
.dispose(close=False) on the engine, so that within the child process any
connections inherited from the parent are forgotten and all new connections
start fresh:

def child_process():
    # forget connections inherited from the parent without closing them,
    # since the parent process may still be using them
    engine.dispose(close=False)


https://docs.sqlalchemy.org/en/20/core/pooling.html#using-connection-pools-with-multiprocessing-or-os-fork
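
putting the two steps together, here is a minimal end-to-end sketch, assuming
fork-based process creation via multiprocessing (the default on Linux) and
SQLAlchemy 1.4.33 or later, where the close=False parameter of dispose() is
available; the URL and worker count are placeholders:

import multiprocessing

from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://host:3306/db")  # placeholder URL
engine.connect().close()  # force one-time dialect initialization in the parent

def child_process():
    # forget connections inherited from the parent without closing them
    engine.dispose(close=False)
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))

if __name__ == "__main__":
    workers = [multiprocessing.Process(target=child_process) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()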

next, w/ airflow: I have not used airflow, but the code you point to there
looks like the old "pre-ping" recipe, which is no longer needed, as SQLAlchemy
now includes the pre-ping option natively.   Report an issue to Apache Airflow
that they should support SQLAlchemy 1.4 and above and remove this old event
handler.
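
the built-in option is a single flag on create_engine() (a minimal sketch; the
URL is a placeholder):

from sqlalchemy import create_engine

# pool_pre_ping tests each connection with a lightweight ping as it is
# checked out from the pool, transparently recycling stale connections
engine = create_engine(
    "mysql+pymysql://host:3306/db",
    pool_pre_ping=True,
)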

Finally, when the lock contention does actually occur, you'd also want to look
at what the actual database and/or proxy servers are up to; if those servers
are blocking new connections from proceeding, that would also be a potential
cause of what you are seeing.
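
one quick way to see this from the MySQL side is to list the server's sessions
(a sketch; assumes a user with the PROCESS privilege and a placeholder URL):

from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://host:3306/db")
with engine.connect() as conn:
    # show every server-side session and what it is currently doing
    for row in conn.execute(text("SHOW PROCESSLIST")):
        print(row)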

On Sat, Jan 28, 2023, at 1:02 AM, Ping Zhang wrote:
> Hi all,
> 
> Our production system is experiencing a very strange behavior. At peak load,
> we see some of our python processes stuck on:
> 
>     <built-in method __enter__ of _thread.lock object at remote 0x7f8e021156c0>
> 
> in "with self._exec_once_mutex" in sqlalchemy/event/attr.py (here is the full
> stack trace <https://gist.github.com/pingzh/8c98afa5421f03dc82af7de8910d4a31>
> that I got via gdb with the python extension).
> 
> Here are some high level information about production setup.
> 
> Python:          3.7.9
> SQLAlchemy:      1.3.6
> PyMySQL:         1.0.2
> Connection str:  mysql+pymysql://HOST:PORT/DB?connect_timeout=30&charset=utf8
> Docker container where the python processes run: Ubuntu 18.04.6
> VM host where docker runs: amzn2
> 
> When creating the engine, we use NullPool as the pool_class:
>     engine_args['poolclass'] = NullPool
>     engine = create_engine(SQL_ALCHEMY_CONN, **engine_args)
>     setup_event_handlers(engine)  # https://github.com/apache/airflow/blob/1.10.4/airflow/utils/sqlalchemy.py#L42-L136
>     Session = scoped_session(
>         sessionmaker(autocommit=False,
>                      autoflush=False,
>                      bind=engine,
>                      expire_on_commit=False))
> 
> # here is the setup_event_handlers code:
> # <https://github.com/apache/airflow/blob/1.10.4/airflow/utils/sqlalchemy.py#L42-L136>
> 
> There are 5 processes running in the container; please see this link
> <https://gist.github.com/pingzh/8c98afa5421f03dc82af7de8910d4a31?permalink_comment_id=4451451#gistcomment-4451451>
> for the python process model running in the docker container. The stuck
> python process is created by its parent process (which is also python) via
> `subprocess.Popen`; right after `subprocess.Popen`, a thread is created to
> read the log of the subprocess (please see here for more info
> <https://gist.github.com/pingzh/8c98afa5421f03dc82af7de8910d4a31?permalink_comment_id=4451451#gistcomment-4451451>).
> Reading the source code
> <https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_3_6/lib/sqlalchemy/event/attr.py#L277>
> at that stack trace, it looks like it tries to grab a lock, but I am not sure
> why it would fail to grab the lock given that only one process uses the
> session.
> 
> At peak load, there are around ~200 docker containers running on a single
> amzn2 host. From the metrics on the database side, we do see some connections
> were dropped at the peak. I am not sure whether this would cause that
> threading.lock issue, but they correlate. Or is it possible that the
> event_handlers
> <https://github.com/apache/airflow/blob/1.10.4/airflow/utils/sqlalchemy.py#L42-L136>
> of the engine and the session query in the stack trace try to grab this lock
> at the same time?
> 
> Potential related issues that I have checked:
>  * Race condition in dialect initialization
> <https://github.com/sqlalchemy/sqlalchemy/issues/6337>
>  * A possible race condition between dialect initialization and query
> execution <https://github.com/sqlalchemy/sqlalchemy/issues/6337>
> but I could not figure out why this happens in our system. We also tried to
> reproduce it in our stress test env; however, we could not.
> 
> Could you please share some insights? Any suggestions would be really 
> appreciated.
> 
> Thanks,
> 
> Ping
> 
