Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

Антонио Антуан Thu, 23 Nov 2017 08:21:12 -0800

See, that `Query.get` using `get_from_identity` instead of `instances` with
key, constructed into `_get_impl`.
Could you point to another places where same problem (described into my
first message) can be appeared?


чт, 23 нояб. 2017 г. в 16:44, Антонио Антуан <a.ch....@gmail.com>:

>
> > A Query can have lots of entities in it, and if you're doing sharding a
> > single result set can refer to any number of shard identifiers within
> > not just a single result set but within a single row; they might have
> > come from dozens of different databases at once
>
> In my case it is not possible: all entities in query can be gotten only
> from one particular shard. We have totally the same database structure for
> each shard. The difference is just data stored into database. No `shard_id`
> or any other key as part of primary key for any table. If I want to make
> query for particular database I always want to retrieve data ONLY from that
> database. And even more than that: ONLY one database during one session
> transaction (or, in other words, one http-request to our app).
>
>
> > This could only be suited with a very open plugin point that is carefully
> > architected, tested, and documented and I don't have the resources to
> > envision this for a short-term use case.
>
> I've seen a lot of questions (for example in stackoverflow) "how to manage
> several binds with sqlalchemy" and all answers are "Use separated
> sessions". I really not understand, what is the problem to implement
> "multibound" session. I make it in my project and it's really beautiful,
> clear and... Don't have ideal vocabulary to explain how our team like it :)
> Currently I see only one problem: loading instances. Of course, after
> fixing other problems may appear...
>
> I can (and want) make it as part of SQLAlchemy library. Fully-tested part,
> of course. If you say that it is bad idea, ok then. I can make it as a
> plugin, but there is a problem: functions in `loading` module are
> monolithic and it needs some refactor for the plugin. May I suggest
> refactor as pull request? And if so: could it be merged not only for major
> release but for, at least, 1.0.* (yes, in our project we use 1.0.19 :) )?
> I really don't want (of course!) to copy entire `loading` module for
> additional logic of `identitykey` construction. But currently I do not see
> any other way to implement it for my project :(
>
> > when you query two different databases, you are using
> > two independent transactions in any case;
>
> So, what is the difference, if there are two transactions in any case? :)
>
> > I don't understand why you can't use independent sessions
>
> The first problem is `query` property of `Base` instances. If we use
> several sessions, we need to use the same amount of `Base` classes and,
> consequently, the same amount of models, don't we?
> Another problem for us is already existed code. We can use sessions
> registry, but it take a lot of month to override entire project. Another
> way:  append into `Query.__iter__` such a code:
> >> self.with_session(sessions.get(self._shard_id))
> >> return super(Query, self).__iter__()
> But it has no effect for UPDATE and INSERT queries. Also, I'm not sure
> that there is no problems in that way...
>
>
> I have one more thought. Don't you think that it is some kind of bug: I
> make query for one bind and got entity from another. Yes, that behavior is
> not foreseen by library. But from other point of view, library docs have
> examples how to use several binds within one session. So, problem may
> happens not only in my case.
>
> Anyway, can my suggestion (
> https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced) has any
> effect for classic one-bound `Session`? If it can't, so, what's the
> problem? :)
>
>
> Excuse me for wasting your time.
> And excuse me if my suggestions are idiotic :)
>
> Appreciate your help.
>
> чт, 23 нояб. 2017 г. в 0:20, Mike Bayer <mike...@zzzcomputing.com>:
>
>> On Wed, Nov 22, 2017 at 4:56 AM, Антонио Антуан <a.ch....@gmail.com>
>> wrote:
>> > Glad to see that you remember my messages :)
>> >
>> > I've dived into `loading` module and I see that currently it is really
>> > complicated to store additional data into pkey for each instance.
>> >
>> > Unfortunately, suggested solutions not good for my project.
>> > Also, I think that `shard` meaning in my case is not the same as usual.
>> >
>> > I want to describe structure of out project, maybe it can help.
>> >
>> > Here is definition of our databases structure:
>> > http://joxi.ru/nAyJVvGiXMv0Dr.
>> > We got master db and several geo databases. Catalogs like `users`,
>> `groups`,
>> > `offers` and other are replicating to geo databases, so that data is
>> always
>> > the same.
>> > But also we have tables like `clicks` and `leads`. Each app instance
>> > contains the data about them in database, related to its geo:
>> > europe-instance into europe-db, usa-instance into usa-database and so
>> on.
>> > Periodically master-app pulls clicks and leads to master-database.
>> Synced
>> > objects always have different ids into master- and get-db, so it is ok.
>> >
>> > But one time project owner came and said: "I need SAAS".
>> > We see, that in current structure it's very hard (and really ugly) to
>> > implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and
>> other
>> > models will be multiplied with tenants amount.
>> >
>> > I discovered that I can override `get_bind` with another logic and it
>> was
>> > great: we can remove several `Base` classes, several Sessions and
>> several
>> > `Orders`.
>> >
>> > Mechanism looks like this:
>> > - we use one instance on each geo for all tenants.
>> > - we create separated databases for each tenant: this will be multiplied
>> > with tenants amount: http://joxi.ru/nAyJVvGiXMv0Dr.
>> > - we detect `tenant_id` using `request.host` (we use flask): each domain
>> > binds with particular tenant;
>> > - we store `tenant_id` into global storage.
>> > - we use stored `tenant_id` into `Session.get_bind`:
>> > https://gist.github.com/aCLr/9f578a2eeb225e7d65099ffb49aa8f3a
>> > - into flask `teardown_request` we clear `tenant_id` storage and call
>> > `Session.remove()`
>> > - if we need to read from another get, just write `query =
>> > query.set_shard_id(GEO)` `tenant_id`
>> >
>> > For celery we use this:
>> > https://gist.github.com/aCLr/d8c5ac38956947da092375b2f89d7b50
>> > Clear to.
>> >
>> > All this leads us only to pros, without cons: any developer has no need
>> to
>> > think about database chosing, just write code like there is only one
>> > database. If you need to read from another geo-database, just call
>> > `query.set_shard(GEO)`, tenant will be appended automatically to it.
>> >
>> > Problems begin when we tried to test non-flask and non-celery scripts,
>> like
>> > cron tasks: we may want to query several tenant-databases during one
>> > SQLA-transaction, somethins like in my first example:
>> > https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f
>> > (`assert_got_correct_objects_with_remove` and
>> > `assert_got_cached_objects_without_remove`). The result you know.
>> >
>> >
>> > During writing this message, I found out, that we need only one
>> additional
>> > data for primary key: `connection.bind.url`. I see, that SQLA already
>> have
>> > it inside `_instance_processor`, so it exists inside `_instance`. I
>> think,
>> > that `identity_key` should be constructed (in my case) with this code:
>> > https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced. Clear,
>> don't
>> > you think so?
>>
>> that's where something needs to happen but SQLAlchemy can't do this in
>> such a way that is hardcoded to exactly your particular use case.   A
>> Query can have lots of entities in it, and if you're doing sharding a
>> single result set can refer to any number of shard identifiers within
>> not just a single result set but within a single row; they might have
>> come from dozens of different databases at once.  This could only be
>> suited with a very open plugin point that is carefully architected,
>> tested, and documented and I don't have the resources to envision this
>> for a short-term use case.
>>
>> > Problems begin when we tried to test non-flask and non-celery scripts,
>> like
>> > cron tasks: we may want to query several tenant-databases during one
>> > SQLA-transaction,
>>
>> I don't understand why you can't use independent sessions for this,
>> because when you query two different databases, you are using two
>> independent transactions in any case;  they are only coordinated if
>> one is using two-phase transactions which I doubt is the case here
>> (while SQLAlchemy put a lot of work into making that possible, I don't
>> think anyone has ever used that feature).
>>
>> Instead of:
>>
>> Session.query(Foo).set_shard(id)
>>
>> you say:
>>
>> sessions.get(shard_id).query(Foo)
>>
>>
>>
>>
>>
>> >
>> >
>> >
>> > вт, 21 нояб. 2017 г. в 19:15, Mike Bayer <mike...@zzzcomputing.com>:
>> >>
>> >> I've looked to see how hard it would be to allow "supplemental"
>> >> attributes to form part of the mapper's primary key tuple, and it
>> >> would be pretty hard.   The "easy" part is getting the mapper to set
>> >> itself up with some extra attributes that can deliver some kind of
>> >> supplemental value to the identity key.  the harder part is then in
>> >> loading.py where we get new rows from the DB and need this
>> >> value...which means some whole new kind of system would need to
>> >> deliver this for any arbitrary part of the result set given a mapping
>> >> and the selectable we're looking at (keep in mind a Query can have
>> >> lots of the same mapping in a single row with aliases).   This would
>> >> be very complicated to implement and test.   I am not seeing any quick
>> >> way to suit this use case, which has also not ever been requested
>> >> before.
>> >>
>> >>
>> >>
>> >> On Tue, Nov 21, 2017 at 10:12 AM, Mike Bayer <mike...@zzzcomputing.com
>> >
>> >> wrote:
>> >> > On Tue, Nov 21, 2017 at 7:39 AM, Антонио Антуан <a.ch....@gmail.com>
>> >> > wrote:
>> >> >> Hi guys.
>> >> >>
>> >> >> I got this code example:
>> >> >> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f.
>> >> >>
>> >> >> Here I make custom `Session` and custom `Query`. As you see,
>> `Session`
>> >> >> has
>> >> >> several binds.
>> >> >>
>> >> >> Also, you can see that there are two functions:
>> >> >> `assert_got_correct_objects_with_remove` and
>> >> >> `assert_got_cached_objects_without_remove`.
>> >> >>
>> >> >> The first checks that we got correct results if `Session.remove`
>> >> >> called.
>> >> >> The second checks, that we got incorrect results if `Session.remove`
>> >> >> not
>> >> >> called.
>> >> >>
>> >> >> I understand, that behavior is correct: we don't remove session -
>> so,
>> >> >> we got
>> >> >> same result from "cache-like"
>> >> >> `sqlalchemy.orm.loading._instance_processor.session_identity_map`.
>> >> >>
>> >> >> I want to avoid that mechanism and don't want to use
>> >> >> `session_identity_map`
>> >> >> for different binds. In ideal, bind should be used as part of key
>> for
>> >> >> `session_identity_map`, but I guess, that it is not possible.
>> >> >> Another way, acceptable for me: disable this mechanism. But I do not
>> >> >> found
>> >> >> ways to achieve this.
>> >> >> And the third option: construct instances manually. Looks like I
>> should
>> >> >> copy
>> >> >> code from `loading` module and add that method to `CustomSession`:
>> >> >
>> >> >
>> >> > there's really no reason at all to use a "ShardedSession" if you have
>> >> > overlapping primary key spaces from each of your binds.   I'm not
>> sure
>> >> > if I mentioned this at the beginning of the emails regarding this
>> >> > project but I hope that I mentioned just using separate Session
>> >> > objects is vastly simpler for non-intricate sharding cases, such as
>> >> > where you always know which shard you care about and you don't care
>> >> > about any of the others for a certain operation.     The point of
>> >> > ShardedSession is so that objects pulled from multiple databases can
>> >> > be intermingled in the same query and in the same transaction - which
>> >> > by definition means they have unique primary keys.   If that's not
>> >> > what you're doing here I don't see what advantage all this complexity
>> >> > is getting you.
>> >> >
>> >> > If you're still convinced you need to be using a monolithic
>> >> > ShardedSession then there needs to be some kind of translation of
>> data
>> >> > such that the mapper sees unique primary keys across the shards, or
>> >> > unique classes.
>> >> >
>> >> > I've tried to think of ways to do this without too much difficulty
>> but
>> >> > none of them are really worth the complexity and hackiness it would
>> >> > require.   The absolutely quickest and most well-supported, no hacks
>> >> > required way would be to properly create a composite primary key on
>> >> > your classes, where the second column is your shard id:
>> >> >
>> >> > class A(Base):
>> >> >     __tablename__ = 'a'
>> >> >
>> >> >     id = Column(Integer, primary_key=True)
>> >> >     shard_id = Column(Integer, primary_key=True)
>> >> >
>> >> > I tried to see if the "shard_id" column can be some kind of
>> expression
>> >> > that is not a Column on the Table but the mapper() is not set up to
>> >> > support this unless you mapped the whole class to a select()
>> >> > construct, which would make for too-complicated SQL, and you'd still
>> >> > need to intercept this select() using events to put the right shard
>> id
>> >> > in.  Another is to create a custom column that renders in a special
>> >> > way, but again you need to create events to intercept it in every
>> case
>> >> > to put the right shard id in, and/or remove it from things like
>> >> > insert() statements.
>> >> >
>> >> > by far your two best solutions are: 1. use separate Session objects
>> >> > per shard  2. make sure your data actually has shard-specific primary
>> >> > keys
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >>
>> >> >> def instances(self, cursor, __context=None):
>> >> >>     context = __context
>> >> >>     if context is None:
>> >> >>         context = QueryContext(self)
>> >> >>     return self._custom_instances(self, cursor, context)
>> >> >>
>> >> >>
>> >> >>
>> >> >> def custom_instances(query, cursor, context):
>> >> >>      """copied from `loading.instances` code with disabled
>> >> >> `session_identity_map`"""
>> >> >>
>> >> >>
>> >> >>
>> >> >> The third way is the most ugly and I want to avoid it.
>> >> >>
>> >> >> Could you help me with my hard choice and, maybe, suggest any other
>> >> >> ways and
>> >> >> options? :)
>> >> >>
>> >> >> Thank you.
>> >> >>
>> >> >> --
>> >> >> SQLAlchemy -
>> >> >> The Python SQL Toolkit and Object Relational Mapper
>> >> >>
>> >> >> http://www.sqlalchemy.org/
>> >> >>
>> >> >> To post example code, please provide an MCVE: Minimal, Complete, and
>> >> >> Verifiable Example. See http://stackoverflow.com/help/mcve for a
>> full
>> >> >> description.
>> >> >> ---
>> >> >> You received this message because you are subscribed to the Google
>> >> >> Groups
>> >> >> "sqlalchemy" group.
>> >> >> To unsubscribe from this group and stop receiving emails from it,
>> send
>> >> >> an
>> >> >> email to sqlalchemy+unsubscr...@googlegroups.com.
>> >> >> To post to this group, send email to sqlalchemy@googlegroups.com.
>> >> >> Visit this group at https://groups.google.com/group/sqlalchemy.
>> >> >> For more options, visit https://groups.google.com/d/optout.
>> >>
>> >> --
>> >> SQLAlchemy -
>> >> The Python SQL Toolkit and Object Relational Mapper
>> >>
>> >> http://www.sqlalchemy.org/
>> >>
>> >> To post example code, please provide an MCVE: Minimal, Complete, and
>> >> Verifiable Example.  See  http://stackoverflow.com/help/mcve for a
>> full
>> >> description.
>> >> ---
>> >> You received this message because you are subscribed to the Google
>> Groups
>> >> "sqlalchemy" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send
>> an
>> >> email to sqlalchemy+unsubscr...@googlegroups.com.
>> >> To post to this group, send email to sqlalchemy@googlegroups.com.
>> >> Visit this group at https://groups.google.com/group/sqlalchemy.
>> >> For more options, visit https://groups.google.com/d/optout.
>> >
>> > --
>> >
>> > Антон
>> >
>> > --
>> > SQLAlchemy -
>> > The Python SQL Toolkit and Object Relational Mapper
>> >
>> > http://www.sqlalchemy.org/
>> >
>> > To post example code, please provide an MCVE: Minimal, Complete, and
>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>> > description.
>> > ---
>> > You received this message because you are subscribed to the Google
>> Groups
>> > "sqlalchemy" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> an
>> > email to sqlalchemy+unsubscr...@googlegroups.com.
>> > To post to this group, send email to sqlalchemy@googlegroups.com.
>> > Visit this group at https://groups.google.com/group/sqlalchemy.
>> > For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> SQLAlchemy -
>> The Python SQL Toolkit and Object Relational Mapper
>>
>> http://www.sqlalchemy.org/
>>
>> To post example code, please provide an MCVE: Minimal, Complete, and
>> Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full
>> description.
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "sqlalchemy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to sqlalchemy+unsubscr...@googlegroups.com.
>> To post to this group, send email to sqlalchemy@googlegroups.com.
>> Visit this group at https://groups.google.com/group/sqlalchemy.
>> For more options, visit https://groups.google.com/d/optout.
>>
> --
>
> Антон
>
-- 

Антон

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

Reply via email to