I've added an issue to track this feature targeted for the moment at 1.3 (note that's for many months from now), however 1.3 has a lot in it right now:
https://bitbucket.org/zzzeek/sqlalchemy/issues/4137/sharding-token-extension-in-identity-key On Wed, Nov 22, 2017 at 4:20 PM, Mike Bayer <mike...@zzzcomputing.com> wrote: > On Wed, Nov 22, 2017 at 4:56 AM, Антонио Антуан <a.ch....@gmail.com> wrote: >> Glad to see that you remember my messages :) >> >> I've dived into `loading` module and I see that currently it is really >> complicated to store additional data into pkey for each instance. >> >> Unfortunately, suggested solutions not good for my project. >> Also, I think that `shard` meaning in my case is not the same as usual. >> >> I want to describe structure of out project, maybe it can help. >> >> Here is definition of our databases structure: >> http://joxi.ru/nAyJVvGiXMv0Dr. >> We got master db and several geo databases. Catalogs like `users`, `groups`, >> `offers` and other are replicating to geo databases, so that data is always >> the same. >> But also we have tables like `clicks` and `leads`. Each app instance >> contains the data about them in database, related to its geo: >> europe-instance into europe-db, usa-instance into usa-database and so on. >> Periodically master-app pulls clicks and leads to master-database. Synced >> objects always have different ids into master- and get-db, so it is ok. >> >> But one time project owner came and said: "I need SAAS". >> We see, that in current structure it's very hard (and really ugly) to >> implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and other >> models will be multiplied with tenants amount. >> >> I discovered that I can override `get_bind` with another logic and it was >> great: we can remove several `Base` classes, several Sessions and several >> `Orders`. >> >> Mechanism looks like this: >> - we use one instance on each geo for all tenants. >> - we create separated databases for each tenant: this will be multiplied >> with tenants amount: http://joxi.ru/nAyJVvGiXMv0Dr. >> - we detect `tenant_id` using `request.host` (we use flask): each domain >> binds with particular tenant; >> - we store `tenant_id` into global storage. >> - we use stored `tenant_id` into `Session.get_bind`: >> https://gist.github.com/aCLr/9f578a2eeb225e7d65099ffb49aa8f3a >> - into flask `teardown_request` we clear `tenant_id` storage and call >> `Session.remove()` >> - if we need to read from another get, just write `query = >> query.set_shard_id(GEO)` `tenant_id` >> >> For celery we use this: >> https://gist.github.com/aCLr/d8c5ac38956947da092375b2f89d7b50 >> Clear to. >> >> All this leads us only to pros, without cons: any developer has no need to >> think about database chosing, just write code like there is only one >> database. If you need to read from another geo-database, just call >> `query.set_shard(GEO)`, tenant will be appended automatically to it. >> >> Problems begin when we tried to test non-flask and non-celery scripts, like >> cron tasks: we may want to query several tenant-databases during one >> SQLA-transaction, somethins like in my first example: >> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f >> (`assert_got_correct_objects_with_remove` and >> `assert_got_cached_objects_without_remove`). The result you know. >> >> >> During writing this message, I found out, that we need only one additional >> data for primary key: `connection.bind.url`. I see, that SQLA already have >> it inside `_instance_processor`, so it exists inside `_instance`. I think, >> that `identity_key` should be constructed (in my case) with this code: >> https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced. Clear, don't >> you think so? > > that's where something needs to happen but SQLAlchemy can't do this in > such a way that is hardcoded to exactly your particular use case. A > Query can have lots of entities in it, and if you're doing sharding a > single result set can refer to any number of shard identifiers within > not just a single result set but within a single row; they might have > come from dozens of different databases at once. This could only be > suited with a very open plugin point that is carefully architected, > tested, and documented and I don't have the resources to envision this > for a short-term use case. > >> Problems begin when we tried to test non-flask and non-celery scripts, like >> cron tasks: we may want to query several tenant-databases during one >> SQLA-transaction, > > I don't understand why you can't use independent sessions for this, > because when you query two different databases, you are using two > independent transactions in any case; they are only coordinated if > one is using two-phase transactions which I doubt is the case here > (while SQLAlchemy put a lot of work into making that possible, I don't > think anyone has ever used that feature). > > Instead of: > > Session.query(Foo).set_shard(id) > > you say: > > sessions.get(shard_id).query(Foo) > > > > > >> >> >> >> вт, 21 нояб. 2017 г. в 19:15, Mike Bayer <mike...@zzzcomputing.com>: >>> >>> I've looked to see how hard it would be to allow "supplemental" >>> attributes to form part of the mapper's primary key tuple, and it >>> would be pretty hard. The "easy" part is getting the mapper to set >>> itself up with some extra attributes that can deliver some kind of >>> supplemental value to the identity key. the harder part is then in >>> loading.py where we get new rows from the DB and need this >>> value...which means some whole new kind of system would need to >>> deliver this for any arbitrary part of the result set given a mapping >>> and the selectable we're looking at (keep in mind a Query can have >>> lots of the same mapping in a single row with aliases). This would >>> be very complicated to implement and test. I am not seeing any quick >>> way to suit this use case, which has also not ever been requested >>> before. >>> >>> >>> >>> On Tue, Nov 21, 2017 at 10:12 AM, Mike Bayer <mike...@zzzcomputing.com> >>> wrote: >>> > On Tue, Nov 21, 2017 at 7:39 AM, Антонио Антуан <a.ch....@gmail.com> >>> > wrote: >>> >> Hi guys. >>> >> >>> >> I got this code example: >>> >> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f. >>> >> >>> >> Here I make custom `Session` and custom `Query`. As you see, `Session` >>> >> has >>> >> several binds. >>> >> >>> >> Also, you can see that there are two functions: >>> >> `assert_got_correct_objects_with_remove` and >>> >> `assert_got_cached_objects_without_remove`. >>> >> >>> >> The first checks that we got correct results if `Session.remove` >>> >> called. >>> >> The second checks, that we got incorrect results if `Session.remove` >>> >> not >>> >> called. >>> >> >>> >> I understand, that behavior is correct: we don't remove session - so, >>> >> we got >>> >> same result from "cache-like" >>> >> `sqlalchemy.orm.loading._instance_processor.session_identity_map`. >>> >> >>> >> I want to avoid that mechanism and don't want to use >>> >> `session_identity_map` >>> >> for different binds. In ideal, bind should be used as part of key for >>> >> `session_identity_map`, but I guess, that it is not possible. >>> >> Another way, acceptable for me: disable this mechanism. But I do not >>> >> found >>> >> ways to achieve this. >>> >> And the third option: construct instances manually. Looks like I should >>> >> copy >>> >> code from `loading` module and add that method to `CustomSession`: >>> > >>> > >>> > there's really no reason at all to use a "ShardedSession" if you have >>> > overlapping primary key spaces from each of your binds. I'm not sure >>> > if I mentioned this at the beginning of the emails regarding this >>> > project but I hope that I mentioned just using separate Session >>> > objects is vastly simpler for non-intricate sharding cases, such as >>> > where you always know which shard you care about and you don't care >>> > about any of the others for a certain operation. The point of >>> > ShardedSession is so that objects pulled from multiple databases can >>> > be intermingled in the same query and in the same transaction - which >>> > by definition means they have unique primary keys. If that's not >>> > what you're doing here I don't see what advantage all this complexity >>> > is getting you. >>> > >>> > If you're still convinced you need to be using a monolithic >>> > ShardedSession then there needs to be some kind of translation of data >>> > such that the mapper sees unique primary keys across the shards, or >>> > unique classes. >>> > >>> > I've tried to think of ways to do this without too much difficulty but >>> > none of them are really worth the complexity and hackiness it would >>> > require. The absolutely quickest and most well-supported, no hacks >>> > required way would be to properly create a composite primary key on >>> > your classes, where the second column is your shard id: >>> > >>> > class A(Base): >>> > __tablename__ = 'a' >>> > >>> > id = Column(Integer, primary_key=True) >>> > shard_id = Column(Integer, primary_key=True) >>> > >>> > I tried to see if the "shard_id" column can be some kind of expression >>> > that is not a Column on the Table but the mapper() is not set up to >>> > support this unless you mapped the whole class to a select() >>> > construct, which would make for too-complicated SQL, and you'd still >>> > need to intercept this select() using events to put the right shard id >>> > in. Another is to create a custom column that renders in a special >>> > way, but again you need to create events to intercept it in every case >>> > to put the right shard id in, and/or remove it from things like >>> > insert() statements. >>> > >>> > by far your two best solutions are: 1. use separate Session objects >>> > per shard 2. make sure your data actually has shard-specific primary >>> > keys >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> >> >>> >> def instances(self, cursor, __context=None): >>> >> context = __context >>> >> if context is None: >>> >> context = QueryContext(self) >>> >> return self._custom_instances(self, cursor, context) >>> >> >>> >> >>> >> >>> >> def custom_instances(query, cursor, context): >>> >> """copied from `loading.instances` code with disabled >>> >> `session_identity_map`""" >>> >> >>> >> >>> >> >>> >> The third way is the most ugly and I want to avoid it. >>> >> >>> >> Could you help me with my hard choice and, maybe, suggest any other >>> >> ways and >>> >> options? :) >>> >> >>> >> Thank you. >>> >> >>> >> -- >>> >> SQLAlchemy - >>> >> The Python SQL Toolkit and Object Relational Mapper >>> >> >>> >> http://www.sqlalchemy.org/ >>> >> >>> >> To post example code, please provide an MCVE: Minimal, Complete, and >>> >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >>> >> description. >>> >> --- >>> >> You received this message because you are subscribed to the Google >>> >> Groups >>> >> "sqlalchemy" group. >>> >> To unsubscribe from this group and stop receiving emails from it, send >>> >> an >>> >> email to sqlalchemy+unsubscr...@googlegroups.com. >>> >> To post to this group, send email to sqlalchemy@googlegroups.com. >>> >> Visit this group at https://groups.google.com/group/sqlalchemy. >>> >> For more options, visit https://groups.google.com/d/optout. >>> >>> -- >>> SQLAlchemy - >>> The Python SQL Toolkit and Object Relational Mapper >>> >>> http://www.sqlalchemy.org/ >>> >>> To post example code, please provide an MCVE: Minimal, Complete, and >>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >>> description. >>> --- >>> You received this message because you are subscribed to the Google Groups >>> "sqlalchemy" group. >>> To unsubscribe from this group and stop receiving emails from it, send an >>> email to sqlalchemy+unsubscr...@googlegroups.com. >>> To post to this group, send email to sqlalchemy@googlegroups.com. >>> Visit this group at https://groups.google.com/group/sqlalchemy. >>> For more options, visit https://groups.google.com/d/optout. >> >> -- >> >> Антон >> >> -- >> SQLAlchemy - >> The Python SQL Toolkit and Object Relational Mapper >> >> http://www.sqlalchemy.org/ >> >> To post example code, please provide an MCVE: Minimal, Complete, and >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >> description. >> --- >> You received this message because you are subscribed to the Google Groups >> "sqlalchemy" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to sqlalchemy+unsubscr...@googlegroups.com. >> To post to this group, send email to sqlalchemy@googlegroups.com. >> Visit this group at https://groups.google.com/group/sqlalchemy. >> For more options, visit https://groups.google.com/d/optout. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.