Re: [sqlalchemy] How to validate consistent state across multiple models in a transaction?

Mike Bayer Sat, 27 Oct 2018 07:54:46 -0700

On Fri, Oct 26, 2018 at 7:49 PM <phil.nac...@freenome.com> wrote:
>
> I see, thanks for the clarification about noload.
>
> Setting aside my specific issue, if using the session in 
> before_insert/before_update events is not a good idea, what is generally the 
> best way to perform validation across multiple models? Is it to just read the 
> relationships from the model that's being validated, relying on the automatic 
> loading of those relationships?


well  "validation" can mean a lot of things, usually "validation" is
in reference to user input, so it typically occurs early before any
database logic happens, that is, if someone says "foo.bar = bat", you
would intercept that event and validate if you really want "bat" to be
set up.

The next level that "validation" is common is right before a bunch of
data is written to the database.   SQLAlchemy's before_flush() hook is
very useful for that.  In this hook, people can iterate through all
the objects that are about to have some state pushed through to the
database and check that things are set up as expected.    It is not
common that one would want to query the database within this hook,
mostly because all the things we are validating have not been pushed
yet.   This has relevance to your example where you are querying the
database for state within "before_update" and "before_insert" which I
don't quite understand since that is explicitly before changes have
been pushed.     For your specific example, I think turning on
echo=True and watching how SQL is emitted in relationship to your
validation occurring will illustrate what is actually happening.

Still another kind of "validation" is kind of what it seems you want
to do, which is that after everything is pushed to the database but
before the transaction is committed, you poke around in the model and
make sure things are what they are supposed to be.   Hooks where this
kind of thing can happen are in after_flush, after_flush_postexec,
before_commit.    Particularly in the before_commit hook it is OK to
emit SELECT statements to get back state from the database as well,
though this is not a common practice because folks don't want to add
SELECT statements to their flushes.

Now you example shows that you are bypassing the flush mechanics
entirely in favor of using the bulk_save_objects method.  The reason
this method exists is strictly when an application wants to push rows
into the database with as little per-row overhead as possible, so in
this case, most of the event hooks are bypassed, and I in fact had to
check the code to see that the before_insert and before_update hooks
are even used in here.   Suffice to say that it doesn't really make
sense to use the bulk_* methods but then emit a SELECT for every row,
it's kind of like if an airplane had to land for traffic lights every
three blocks.    To use validation of some kind with the bulk_*
methods, the only appropriate technique is that the data being sent
into this methods is ensured to be fully correct before being passed
in.

My advice would be to build everything out using the most idiomatic
patterns possible, which means using the regular unit of work pattern,
using relationshisp normally (eg. dont set object.foo_id = 7, just set
object.foo = somefoo(7)), make everything as correct and minimal as
possible - *then* observe if performance is suffering.    Then to
evaluate performance, I strongly recommend working through each of the
techniques at 
https://docs.sqlalchemy.org/en/latest/faq/performance.html#how-can-i-profile-a-sqlalchemy-powered-application.
  You can then make modifications such as moving some relationship
loads to be foreign-key-attribute only, things like that.





>
> On Friday, October 26, 2018 at 3:20:01 PM UTC-7, Mike Bayer wrote:
>>
>> also, your test seems to be making use of the "before_insert" and
>> "before_update" events, inside which it's generally not a good idea to
>> use the Session since you're in the middle of a flush, and also that
>> event hook is before the SQL is emitted in any case, and addtionally
>> the state of objects in memory has not been expired yet, so you're
>> going to see all stale data.    you'd want to turn on echo=True at
>> least and see what's happening when.
>> On Fri, Oct 26, 2018 at 2:20 PM <phil....@freenome.com> wrote:
>> >
>> > I have the following three models (with other columns omitted):
>> >
>> >
>> > class ContainerSlot(db.Model):
>> >     __tablename__ = 'container_slots'
>> >
>> >     id = db.Column(db.Integer, primary_key=True)
>> >     tube_id = db.Column(db.Integer, db.ForeignKey('tubes.id'), 
>> > unique=True, index=True)
>> >     container_id = db.Column(db.Integer, db.ForeignKey('containers.id'), 
>> > nullable=False, index=True)
>> >
>> >     tube = db.relationship('Tube', uselist=False, 
>> > back_populates='container_slot', lazy='noload')
>> >     container = db.relationship('Container', uselist=False, 
>> > innerjoin=True, back_populates='container_slots', lazy='joined')
>> >
>> >
>> >
>> > class Container(db.Model):
>> >     __tablename__ = 'containers'
>> >
>> >     id = db.Column(db.Integer, primary_key=True)
>> >     flag = db.Column(db.Boolean)
>> >     container_slots = db.relationship('ContainerSlot', lazy='noload', 
>> > back_populates='container', order_by='ContainerSlot.id')
>> >
>> >
>> > class Tube(db.Model):
>> >     __tablename__ = 'tubes'
>> >
>> >     id = db.Column(db.Integer, primary_key=True)
>> >     flag = db.Column(db.Boolean)
>> >     container_slot = db.relationship('ContainerSlot', uselist=False, 
>> > back_populates='tube', lazy='joined')
>> >
>> >
>> > A container has many container slots, and a container slot has a tube. A 
>> > container slot can be "empty" (it doesn't have a tube), so its tube_id is 
>> > nullable.
>> >
>> > I'm trying to validate that the flag is consistent between a container and 
>> > its tubes. That is, if a tube is in a container (i.e there is a container 
>> > slot with its ID as tube_id), then the tube's flag must match the value of 
>> > its container's flag.
>> >
>> > Ideally, the only way to update these fields without failing the 
>> > validation is through endpoints which perform updates across tables in a 
>> > single transaction, ensuring that the state is consistent.
>> >
>> > For example, I have an endpoint which sets the flag to false for a list of 
>> > tube IDs, and removes them from their container
>> >
>> > tube_ids = args['tube_ids']
>> >
>> > load_options = [db.noload('*'), 
>> > db.joinedload(Tube.container_slot).noload('*')]
>> > tubes = 
>> > Tube.query.options(load_options).filter(Tube.id.in_(tube_ids)).all()
>> > container_slots = [t.container_slot for t in tubes if t.container_slot]
>> > for tube in tubes:
>> >     tube.flag = False
>> >
>> > for cs in container_slots:
>> >     cs.tube_id = None
>> >
>> > db.session.bulk_save_objects(tubes)
>> > db.session.bulk_save_objects(container_slots)
>> > db.session.commit()
>> >
>> > Here is a validation which is failing
>> >
>> > @event.listens_for(Tube, 'before_insert')
>> > @event.listens_for(Tube, 'before_update')
>> > def validate_tube(mapper, connection, tube):
>> >     if tube.flag is None:
>> >         return
>> >
>> >     row = (db.session.query(Container.flag, Container.name)
>> >                      .join(Container.container_slots)
>> >                      .join(ContainerSlot.tube)
>> >                      .filter(Tube.id == tube.id)
>> >                      .one_or_none())
>> >
>> >     (container_flag, name) = row if row else (None, None)
>> >
>> >     if tube.flag is True and container_flag is False:
>> >         raise BadRequest('...')
>> >
>> >     if tube.flag is False and container_flag is True:
>> >         raise BadRequest('...')
>> >
>> > I would expect that row is None here, because the tube was removed from 
>> > its container slot, so the tube doesn't have a container
>> > However, row returns the container that the tube was previously in, and 
>> > since the container's flag is true, the validation fails.
>> >
>> > Some things I've already tried which didn't work are:
>> >
>> > Setting the container slot's tube field to None, instead of setting 
>> > tube_id to None
>> > Setting the tube's container_slot field to None, instead of setting the 
>> > container slot's tube or tube_id to None
>> > Using db.session.add instead of db.session.bulk_save_objects
>> >
>> >
>> > I'm using SQLAlchemy 1.1.18 and Flask-SQLAlchemy 2.0.6
>> >
>> > Thank you!
>> >
>> > This e-mail is private and confidential and is for the addressee only. If 
>> > misdirected, please notify us by telephone, confirming that it has been 
>> > deleted from your system and any hard copies destroyed. You are strictly 
>> > prohibited from using, printing, distributing or disseminating it or any 
>> > information contained in it save to the intended recipient.
>> >
>> > --
>> > SQLAlchemy -
>> > The Python SQL Toolkit and Object Relational Mapper
>> >
>> > http://www.sqlalchemy.org/
>> >
>> > To post example code, please provide an MCVE: Minimal, Complete, and 
>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>> > description.
>> > ---
>> > You received this message because you are subscribed to the Google Groups 
>> > "sqlalchemy" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an 
>> > email to sqlalchemy+...@googlegroups.com.
>> > To post to this group, send email to sqlal...@googlegroups.com.
>> > Visit this group at https://groups.google.com/group/sqlalchemy.
>> > For more options, visit https://groups.google.com/d/optout.
>
>
> This e-mail is private and confidential and is for the addressee only. If 
> misdirected, please notify us by telephone, confirming that it has been 
> deleted from your system and any hard copies destroyed. You are strictly 
> prohibited from using, printing, distributing or disseminating it or any 
> information contained in it save to the intended recipient.
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] How to validate consistent state across multiple models in a transaction?

Reply via email to