Thanks, Mike, for your kind response. Understood that to look for deadlocks we would need to see tracebacks for all threads.
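To make sure I apply the fork-safety advice correctly, here is a minimal sketch of what I plan to try: initialize the engine once in the parent, then reset it in each child before it does any work. This is only a sketch; `"sqlite://"` stands in for our real `mysql+pymysql` URL so it is self-contained, and `child_work` is a placeholder name for the real subprocess entry point:

```python
import multiprocessing

from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

# "sqlite://" stands in for our real mysql+pymysql URL, just to keep
# this sketch self-contained; NullPool matches our production setting.
engine = create_engine("sqlite://", poolclass=NullPool)

# Run the one-time dialect/event initialization in the parent, before
# any subprocess exists, so the _exec_once_mutex path is never first
# entered concurrently across processes.
engine.connect().close()


def child_work():
    # child_work is a placeholder for the real subprocess entry point.
    # On SQLAlchemy 1.3 (our version), dispose() takes no arguments;
    # newer releases offer dispose(close=False) to forget connections
    # inherited from the parent without closing them on its behalf.
    engine.dispose()
    with engine.connect() as conn:
        assert conn.execute(text("SELECT 1")).scalar() == 1


if __name__ == "__main__":
    procs = [multiprocessing.Process(target=child_work) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    assert all(p.exitcode == 0 for p in procs)
```

One thing I notice while writing this: since our stuck worker is created via `subprocess.Popen` (fork + exec of a fresh interpreter) rather than `multiprocessing`, the child does not inherit the parent's engine object at all and builds its own, so the dispose step would only matter for workers created by plain forking.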
Please see this link <https://gist.github.com/pingzh/8c98afa5421f03dc82af7de8910d4a31?permalink_comment_id=4453633#gistcomment-4453633>: there are 48 threads, many of them `blas_thread_server`, which I am not sure where they come from. For the engine management, it looks like we should be fine since we are using NullPool <https://docs.sqlalchemy.org/en/13/core/pooling.html#using-connection-pools-with-multiprocessing-or-os-fork>; I will double-check that as well. Regarding Airflow, it has started to support SQLAlchemy 1.4 and above.

Thanks,
Ping

On Sat, Jan 28, 2023 at 8:23 AM Mike Bayer <mike_not_on_goo...@zzzcomputing.com> wrote:

> that particular routine runs exactly once, when the engine is used the very
> first time. I only see one traceback here, and it does not seem to indicate
> the lock is being called in a re-entrant fashion, so to look for deadlocks
> we would need to see tracebacks for all threads.
>
> if there is some unusual effect of creating processes here, ensure this
> routine runs in the parent process exactly once. if you have a
> create_engine() that is then being spun out into separate processes, do
> this init *before* you make any of those processes:
>
> engine = create_engine(...)
> engine.connect().close()  # <--- engine is initialized
>
> spawn_new_processes()  # now spawn subprocesses
>
> next, within each child process, set up the engine so that all new
> connections start fresh before the child process does its work, by
> calling .dispose(close=False); within the child process, any connections
> inherited from the parent are then forgotten:
>
> def child_process():
>     engine.dispose(close=False)
>
> https://docs.sqlalchemy.org/en/20/core/pooling.html#using-connection-pools-with-multiprocessing-or-os-fork
>
> next, w/ airflow: I have not used airflow, but the code you point to there
> looks like the old "pre-ping" recipe, which is no longer needed, as
> SQLAlchemy now includes the pre-ping option. Report an issue to apache
> airflow that they should support SQLAlchemy 1.4 and above and remove this
> old event handler.
>
> Finally, when the lock does actually occur, you'd also want to look at what
> the actual database and/or proxy servers are up to; if these servers are
> blocking new connections from proceeding, that would also be a potential
> cause of what you are seeing.
>
> On Sat, Jan 28, 2023, at 1:02 AM, Ping Zhang wrote:
>
> Hi all,
>
> Our production system is experiencing a very strange behavior. At peak
> load, we see that some of our python processes were stuck on *<built-in
> method __enter__ of _thread.lock object at remote 0x7f8e021156c0>* in
> *with self._exec_once_mutex* in sqlalchemy/event/attr.py (here is the full
> stack trace
> <https://gist.github.com/pingzh/8c98afa5421f03dc82af7de8910d4a31> that I
> got via gdb with the python extension).
>
> Here is some high-level information about the production setup.
> Python: 3.7.9
> SQLAlchemy: 1.3.6
> PyMySQL: 1.0.2
> Connection string: mysql+pymysql://HOST:PORT/DB?connect_timeout=30&charset=utf8
> Docker container where the python processes are running: Ubuntu 18.04.6
> VM host version where docker runs: amzn2
>
> When creating the engine, we use NullPool as the pool class:
>
> engine_args['poolclass'] = NullPool
> engine = create_engine(SQL_ALCHEMY_CONN, **engine_args)
> setup_event_handlers(engine)
> Session = scoped_session(
>     sessionmaker(autocommit=False,
>                  autoflush=False,
>                  bind=engine,
>                  expire_on_commit=False))
>
> # here is the setup_event_handlers code
> <https://github.com/apache/airflow/blob/1.10.4/airflow/utils/sqlalchemy.py#L42-L136>.
>
> There are 5 processes running in the container. Please see this link
> <https://gist.github.com/pingzh/8c98afa5421f03dc82af7de8910d4a31?permalink_comment_id=4451451#gistcomment-4451451>
> for the python process model running in the docker container. The stuck
> python process is created by its parent process (which is also python) via
> `subprocess.Popen`; right after `subprocess.Popen`, a thread is created to
> read the log of the subprocess (please see here for more info
> <https://gist.github.com/pingzh/8c98afa5421f03dc82af7de8910d4a31?permalink_comment_id=4451451#gistcomment-4451451>).
> Reading the source code
> <https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_3_6/lib/sqlalchemy/event/attr.py#L277>
> of that stack trace, it looks like it tries to grab a lock, but I am not
> sure why it would fail to grab the lock, given that only one process uses
> the session.
>
> At peak load, there are around ~200 docker containers running on a
> single amzn2 host. From the metrics on the database side, we do see some
> connections dropped at the peak. I am not sure whether this would
> cause that threading.lock issue, but they correlate.
> *Or is it possible that the event handlers
> <https://github.com/apache/airflow/blob/1.10.4/airflow/utils/sqlalchemy.py#L42-L136>
> of the engine and the session query in the stack trace try to grab this
> lock at the same time?*
>
> Potential related issues that I have checked:
>
> - Race condition in dialect initialization
>   <https://github.com/sqlalchemy/sqlalchemy/issues/6337>
> - A possible race condition between dialect initialization and query
>   execution <https://github.com/sqlalchemy/sqlalchemy/issues/6337>
>
> but I could not figure out why it happens in our system. We also tried to
> reproduce it in our stress-test env; however, we could not reproduce it.
>
> Could you please share some insights? Any suggestions would be really
> appreciated.
>
> Thanks,
>
> Ping
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/sqlalchemy/2979ec01-e3be-4c5d-b4aa-50a6f4fceee4n%40googlegroups.com
> <https://groups.google.com/d/msgid/sqlalchemy/2979ec01-e3be-4c5d-b4aa-50a6f4fceee4n%40googlegroups.com?utm_medium=email&utm_source=footer>.