Re: [sqlalchemy] Patterns for multitenancy

Mike Bayer Tue, 20 Nov 2018 07:44:46 -0800

On Mon, Nov 19, 2018 at 5:27 PM Eric Smith <e...@esmithy.net> wrote:
>
> Thanks -- the "before_compile" event seems interesting -- that could solve 
> automatically adding a tenant on the SELECT side of things.


note that I've rewritten the PreFilteredQuery and GloblalFilter
recipes entirely to use before_compile:

https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/FilteredQuery

However, FilteredQuery doesn't handle all SQL, just ORM queries, and
even then it has some caveats.      There's ways to ensure a row-based
tenant is present in all kinds of queries but they're complicated and
there will always be more edge cases where something needs special
attention.     In SQLAlchemy, we have a feature called "single-table
inheritance" which has many similarities, in that a certain column is
a discriminator for rows, and this discriminator has to be present at
all times.    It's complicated and there have been many issues
(particularly joins) where it needed additional adjustment.


> What I wish for is a usage pattern like:
>
> When a web request comes in, determine the appropriate tenant
> Set the tenant somewhere associated with the context of the request
> Use the database as if it were single-tenant, relying on event 
> handlers/subclassed ORM objects/something to fill in the tenant for INSERT 
> and SELECT statements.

That's the pattern all of these techniques are trying to go for.

>
> Does that seem realistic?

For row-based tenancy you can do it but you would need to test every
new feature very carefully to make sure you aren't losing the tenant
for a certain row nor are you creating queries that are wrong.  Below
is a LEFT OUTER JOIN that is used to try to find rows in "a" that
don't have a row in "b".  This query should be rewritten to use a NOT
EXISTS subquery.  However, this kind of query is common:

SELECT a.id LEFT OUTER JOIN b ON a.id=b.a_id WHERE b.id IS NULL

if we add logic that naively adds tenant IDs to everything:

SELECT a.id LEFT OUTER JOIN b ON a.id=b.a_id WHERE b.id IS NULL WHERE
a.tenant_id=5 AND b.tenant_id=5

now the query won't work.  You'll get no rows for "a".    the better query is:

SELECT a.id FROM a WHERE NOT EXISTS(SELECT 1 FROM b WHERE a.id=b.a_id
WHERE b.tenant_id=5) AND a.tenant_id=5

Overall, the scheme that's presented in the schema_translate_map
feature is the one that definitely works in all cases.   But if you
have millions of "tenants", that's not really multitenancy IMHO,
that's user accounts.

>
> For schema-per-tenant, I can image how that would be done with 
> schema_translate_map or search_path, since that covers both INSERT and 
> SELECT. For row-per-tenant, is there something roughly equivalent to 
> before_compile to be able to modify INSERT statements to include a tenant? 
> There's the engine before_execute event -- maybe that would work -- but 
> someone in this group has said those are evil.

before_execute() and before_cursor_execute are your two events, and
since you'd want to intercept insert() structures before they are
compiled, you'd be using before_execute().   It's doable but you have
to test a lot.   But it's also not necessary if you're fine sticking
to ORM only.

>
> The "MultiAlchemy" example I linked to earlier creates a Session subclass 
> overriding the "add" method to automatically include the tenant, but it seems 
> like that would miss objects that get added implicitly through relationships.

If you want to do it at the ORM level only, for persistence there are
plenty of events you can use:

before_insert: 
https://docs.sqlalchemy.org/en/latest/orm/events.html?highlight=before_insert#sqlalchemy.orm.events.MapperEvents.before_insert
after_attach (or before_attach):
https://docs.sqlalchemy.org/en/latest/orm/events.html?highlight=after_attach#sqlalchemy.orm.events.SessionEvents.after_attach
init: 
https://docs.sqlalchemy.org/en/latest/orm/events.html?highlight=init#sqlalchemy.orm.events.InstanceEvents.init

Above, the "init" method is the earliest, and it is really just
another way to do the __init__() constructor on your object.   that's
the event that polymorphic persistence uses and is likely the simplest
to use, because you have your tenant id set up front.   But if the
tenant id is only associated with the Session then use
before_attach/after_attach.


>
> Finally, a specific question about before_compile. If the query I want to 
> modify is for first(), changing the query fails with:
>
> sqlalchemy.exc.InvalidRequestError: Query.filter() being called on a Query 
> which already has LIMIT or OFFSET applied. To modify the row-limited results 
> of a  Query, call from_self() first.  Otherwise, call filter() before limit() 
> or offset() are applied.

right you'll see in the wiki examples you have to call a special
method query.enable_assertions(False) which is explicitly for writing
these recipes.  don't use from_self(), that gives you a much more
complicated query and is not appropriate here.



>
> If I use from_self(), I get:
>
> RecursionError: maximum recursion depth exceeded while calling a Python object
>
>
> I guess because from_self causes the event handler to be fired again. I could 
> use some kind of flag to avoid that, but needing to do so makes me wonder if 
> I'm doing something wrong? Example:
>
>
> from sqlalchemy import create_engine, Column, Integer, String, event, Boolean
> from sqlalchemy.ext.declarative import declarative_base
> from sqlalchemy.orm import sessionmaker, Query
>
> engine = create_engine('sqlite:///:memory:', echo=True)
> Base = declarative_base()
> Session = sessionmaker(bind=engine)
>
>
> class User(Base):
>     __tablename__ = 'users'
>
>     id = Column(Integer, primary_key=True)
>     name = Column(String)
>     fullname = Column(String)
>     password = Column(String)
>     deleted = Column(Boolean, default=False)
>
>
> @event.listens_for(Query, "before_compile", retval=True)
> def no_deleted(query):
>     for desc in query.column_descriptions:
>         if desc['type'] is User:
>             entity = desc['entity']
>             query = query.from_self().filter(entity.deleted == False)
>     return query
>
>
> Base.metadata.create_all(engine)
> session = Session()
> ed_user = User(name='ed', fullname='Ed Jones', password='edspassword')
> session.add(ed_user)
> session.commit()
>
> any_not_deleted_user = session.query(User).first()
>
>
> Thanks again for the guidance,
>
> Eric
>
>
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Patterns for multitenancy

Reply via email to