See, that `Query.get` using `get_from_identity` instead of `instances` with key, constructed into `_get_impl`. Could you point to another places where same problem (described into my first message) can be appeared?
чт, 23 нояб. 2017 г. в 16:44, Антонио Антуан <a.ch....@gmail.com>: > > > A Query can have lots of entities in it, and if you're doing sharding a > > single result set can refer to any number of shard identifiers within > > not just a single result set but within a single row; they might have > > come from dozens of different databases at once > > In my case it is not possible: all entities in query can be gotten only > from one particular shard. We have totally the same database structure for > each shard. The difference is just data stored into database. No `shard_id` > or any other key as part of primary key for any table. If I want to make > query for particular database I always want to retrieve data ONLY from that > database. And even more than that: ONLY one database during one session > transaction (or, in other words, one http-request to our app). > > > > This could only be suited with a very open plugin point that is carefully > > architected, tested, and documented and I don't have the resources to > > envision this for a short-term use case. > > I've seen a lot of questions (for example in stackoverflow) "how to manage > several binds with sqlalchemy" and all answers are "Use separated > sessions". I really not understand, what is the problem to implement > "multibound" session. I make it in my project and it's really beautiful, > clear and... Don't have ideal vocabulary to explain how our team like it :) > Currently I see only one problem: loading instances. Of course, after > fixing other problems may appear... > > I can (and want) make it as part of SQLAlchemy library. Fully-tested part, > of course. If you say that it is bad idea, ok then. I can make it as a > plugin, but there is a problem: functions in `loading` module are > monolithic and it needs some refactor for the plugin. May I suggest > refactor as pull request? And if so: could it be merged not only for major > release but for, at least, 1.0.* (yes, in our project we use 1.0.19 :) )? > I really don't want (of course!) to copy entire `loading` module for > additional logic of `identitykey` construction. But currently I do not see > any other way to implement it for my project :( > > > when you query two different databases, you are using > > two independent transactions in any case; > > So, what is the difference, if there are two transactions in any case? :) > > > I don't understand why you can't use independent sessions > > The first problem is `query` property of `Base` instances. If we use > several sessions, we need to use the same amount of `Base` classes and, > consequently, the same amount of models, don't we? > Another problem for us is already existed code. We can use sessions > registry, but it take a lot of month to override entire project. Another > way: append into `Query.__iter__` such a code: > >> self.with_session(sessions.get(self._shard_id)) > >> return super(Query, self).__iter__() > But it has no effect for UPDATE and INSERT queries. Also, I'm not sure > that there is no problems in that way... > > > I have one more thought. Don't you think that it is some kind of bug: I > make query for one bind and got entity from another. Yes, that behavior is > not foreseen by library. But from other point of view, library docs have > examples how to use several binds within one session. So, problem may > happens not only in my case. > > Anyway, can my suggestion ( > https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced) has any > effect for classic one-bound `Session`? If it can't, so, what's the > problem? :) > > > Excuse me for wasting your time. > And excuse me if my suggestions are idiotic :) > > Appreciate your help. > > чт, 23 нояб. 2017 г. в 0:20, Mike Bayer <mike...@zzzcomputing.com>: > >> On Wed, Nov 22, 2017 at 4:56 AM, Антонио Антуан <a.ch....@gmail.com> >> wrote: >> > Glad to see that you remember my messages :) >> > >> > I've dived into `loading` module and I see that currently it is really >> > complicated to store additional data into pkey for each instance. >> > >> > Unfortunately, suggested solutions not good for my project. >> > Also, I think that `shard` meaning in my case is not the same as usual. >> > >> > I want to describe structure of out project, maybe it can help. >> > >> > Here is definition of our databases structure: >> > http://joxi.ru/nAyJVvGiXMv0Dr. >> > We got master db and several geo databases. Catalogs like `users`, >> `groups`, >> > `offers` and other are replicating to geo databases, so that data is >> always >> > the same. >> > But also we have tables like `clicks` and `leads`. Each app instance >> > contains the data about them in database, related to its geo: >> > europe-instance into europe-db, usa-instance into usa-database and so >> on. >> > Periodically master-app pulls clicks and leads to master-database. >> Synced >> > objects always have different ids into master- and get-db, so it is ok. >> > >> > But one time project owner came and said: "I need SAAS". >> > We see, that in current structure it's very hard (and really ugly) to >> > implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and >> other >> > models will be multiplied with tenants amount. >> > >> > I discovered that I can override `get_bind` with another logic and it >> was >> > great: we can remove several `Base` classes, several Sessions and >> several >> > `Orders`. >> > >> > Mechanism looks like this: >> > - we use one instance on each geo for all tenants. >> > - we create separated databases for each tenant: this will be multiplied >> > with tenants amount: http://joxi.ru/nAyJVvGiXMv0Dr. >> > - we detect `tenant_id` using `request.host` (we use flask): each domain >> > binds with particular tenant; >> > - we store `tenant_id` into global storage. >> > - we use stored `tenant_id` into `Session.get_bind`: >> > https://gist.github.com/aCLr/9f578a2eeb225e7d65099ffb49aa8f3a >> > - into flask `teardown_request` we clear `tenant_id` storage and call >> > `Session.remove()` >> > - if we need to read from another get, just write `query = >> > query.set_shard_id(GEO)` `tenant_id` >> > >> > For celery we use this: >> > https://gist.github.com/aCLr/d8c5ac38956947da092375b2f89d7b50 >> > Clear to. >> > >> > All this leads us only to pros, without cons: any developer has no need >> to >> > think about database chosing, just write code like there is only one >> > database. If you need to read from another geo-database, just call >> > `query.set_shard(GEO)`, tenant will be appended automatically to it. >> > >> > Problems begin when we tried to test non-flask and non-celery scripts, >> like >> > cron tasks: we may want to query several tenant-databases during one >> > SQLA-transaction, somethins like in my first example: >> > https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f >> > (`assert_got_correct_objects_with_remove` and >> > `assert_got_cached_objects_without_remove`). The result you know. >> > >> > >> > During writing this message, I found out, that we need only one >> additional >> > data for primary key: `connection.bind.url`. I see, that SQLA already >> have >> > it inside `_instance_processor`, so it exists inside `_instance`. I >> think, >> > that `identity_key` should be constructed (in my case) with this code: >> > https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced. Clear, >> don't >> > you think so? >> >> that's where something needs to happen but SQLAlchemy can't do this in >> such a way that is hardcoded to exactly your particular use case. A >> Query can have lots of entities in it, and if you're doing sharding a >> single result set can refer to any number of shard identifiers within >> not just a single result set but within a single row; they might have >> come from dozens of different databases at once. This could only be >> suited with a very open plugin point that is carefully architected, >> tested, and documented and I don't have the resources to envision this >> for a short-term use case. >> >> > Problems begin when we tried to test non-flask and non-celery scripts, >> like >> > cron tasks: we may want to query several tenant-databases during one >> > SQLA-transaction, >> >> I don't understand why you can't use independent sessions for this, >> because when you query two different databases, you are using two >> independent transactions in any case; they are only coordinated if >> one is using two-phase transactions which I doubt is the case here >> (while SQLAlchemy put a lot of work into making that possible, I don't >> think anyone has ever used that feature). >> >> Instead of: >> >> Session.query(Foo).set_shard(id) >> >> you say: >> >> sessions.get(shard_id).query(Foo) >> >> >> >> >> >> > >> > >> > >> > вт, 21 нояб. 2017 г. в 19:15, Mike Bayer <mike...@zzzcomputing.com>: >> >> >> >> I've looked to see how hard it would be to allow "supplemental" >> >> attributes to form part of the mapper's primary key tuple, and it >> >> would be pretty hard. The "easy" part is getting the mapper to set >> >> itself up with some extra attributes that can deliver some kind of >> >> supplemental value to the identity key. the harder part is then in >> >> loading.py where we get new rows from the DB and need this >> >> value...which means some whole new kind of system would need to >> >> deliver this for any arbitrary part of the result set given a mapping >> >> and the selectable we're looking at (keep in mind a Query can have >> >> lots of the same mapping in a single row with aliases). This would >> >> be very complicated to implement and test. I am not seeing any quick >> >> way to suit this use case, which has also not ever been requested >> >> before. >> >> >> >> >> >> >> >> On Tue, Nov 21, 2017 at 10:12 AM, Mike Bayer <mike...@zzzcomputing.com >> > >> >> wrote: >> >> > On Tue, Nov 21, 2017 at 7:39 AM, Антонио Антуан <a.ch....@gmail.com> >> >> > wrote: >> >> >> Hi guys. >> >> >> >> >> >> I got this code example: >> >> >> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f. >> >> >> >> >> >> Here I make custom `Session` and custom `Query`. As you see, >> `Session` >> >> >> has >> >> >> several binds. >> >> >> >> >> >> Also, you can see that there are two functions: >> >> >> `assert_got_correct_objects_with_remove` and >> >> >> `assert_got_cached_objects_without_remove`. >> >> >> >> >> >> The first checks that we got correct results if `Session.remove` >> >> >> called. >> >> >> The second checks, that we got incorrect results if `Session.remove` >> >> >> not >> >> >> called. >> >> >> >> >> >> I understand, that behavior is correct: we don't remove session - >> so, >> >> >> we got >> >> >> same result from "cache-like" >> >> >> `sqlalchemy.orm.loading._instance_processor.session_identity_map`. >> >> >> >> >> >> I want to avoid that mechanism and don't want to use >> >> >> `session_identity_map` >> >> >> for different binds. In ideal, bind should be used as part of key >> for >> >> >> `session_identity_map`, but I guess, that it is not possible. >> >> >> Another way, acceptable for me: disable this mechanism. But I do not >> >> >> found >> >> >> ways to achieve this. >> >> >> And the third option: construct instances manually. Looks like I >> should >> >> >> copy >> >> >> code from `loading` module and add that method to `CustomSession`: >> >> > >> >> > >> >> > there's really no reason at all to use a "ShardedSession" if you have >> >> > overlapping primary key spaces from each of your binds. I'm not >> sure >> >> > if I mentioned this at the beginning of the emails regarding this >> >> > project but I hope that I mentioned just using separate Session >> >> > objects is vastly simpler for non-intricate sharding cases, such as >> >> > where you always know which shard you care about and you don't care >> >> > about any of the others for a certain operation. The point of >> >> > ShardedSession is so that objects pulled from multiple databases can >> >> > be intermingled in the same query and in the same transaction - which >> >> > by definition means they have unique primary keys. If that's not >> >> > what you're doing here I don't see what advantage all this complexity >> >> > is getting you. >> >> > >> >> > If you're still convinced you need to be using a monolithic >> >> > ShardedSession then there needs to be some kind of translation of >> data >> >> > such that the mapper sees unique primary keys across the shards, or >> >> > unique classes. >> >> > >> >> > I've tried to think of ways to do this without too much difficulty >> but >> >> > none of them are really worth the complexity and hackiness it would >> >> > require. The absolutely quickest and most well-supported, no hacks >> >> > required way would be to properly create a composite primary key on >> >> > your classes, where the second column is your shard id: >> >> > >> >> > class A(Base): >> >> > __tablename__ = 'a' >> >> > >> >> > id = Column(Integer, primary_key=True) >> >> > shard_id = Column(Integer, primary_key=True) >> >> > >> >> > I tried to see if the "shard_id" column can be some kind of >> expression >> >> > that is not a Column on the Table but the mapper() is not set up to >> >> > support this unless you mapped the whole class to a select() >> >> > construct, which would make for too-complicated SQL, and you'd still >> >> > need to intercept this select() using events to put the right shard >> id >> >> > in. Another is to create a custom column that renders in a special >> >> > way, but again you need to create events to intercept it in every >> case >> >> > to put the right shard id in, and/or remove it from things like >> >> > insert() statements. >> >> > >> >> > by far your two best solutions are: 1. use separate Session objects >> >> > per shard 2. make sure your data actually has shard-specific primary >> >> > keys >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> >> >> >> >> def instances(self, cursor, __context=None): >> >> >> context = __context >> >> >> if context is None: >> >> >> context = QueryContext(self) >> >> >> return self._custom_instances(self, cursor, context) >> >> >> >> >> >> >> >> >> >> >> >> def custom_instances(query, cursor, context): >> >> >> """copied from `loading.instances` code with disabled >> >> >> `session_identity_map`""" >> >> >> >> >> >> >> >> >> >> >> >> The third way is the most ugly and I want to avoid it. >> >> >> >> >> >> Could you help me with my hard choice and, maybe, suggest any other >> >> >> ways and >> >> >> options? :) >> >> >> >> >> >> Thank you. >> >> >> >> >> >> -- >> >> >> SQLAlchemy - >> >> >> The Python SQL Toolkit and Object Relational Mapper >> >> >> >> >> >> http://www.sqlalchemy.org/ >> >> >> >> >> >> To post example code, please provide an MCVE: Minimal, Complete, and >> >> >> Verifiable Example. See http://stackoverflow.com/help/mcve for a >> full >> >> >> description. >> >> >> --- >> >> >> You received this message because you are subscribed to the Google >> >> >> Groups >> >> >> "sqlalchemy" group. >> >> >> To unsubscribe from this group and stop receiving emails from it, >> send >> >> >> an >> >> >> email to sqlalchemy+unsubscr...@googlegroups.com. >> >> >> To post to this group, send email to sqlalchemy@googlegroups.com. >> >> >> Visit this group at https://groups.google.com/group/sqlalchemy. >> >> >> For more options, visit https://groups.google.com/d/optout. >> >> >> >> -- >> >> SQLAlchemy - >> >> The Python SQL Toolkit and Object Relational Mapper >> >> >> >> http://www.sqlalchemy.org/ >> >> >> >> To post example code, please provide an MCVE: Minimal, Complete, and >> >> Verifiable Example. See http://stackoverflow.com/help/mcve for a >> full >> >> description. >> >> --- >> >> You received this message because you are subscribed to the Google >> Groups >> >> "sqlalchemy" group. >> >> To unsubscribe from this group and stop receiving emails from it, send >> an >> >> email to sqlalchemy+unsubscr...@googlegroups.com. >> >> To post to this group, send email to sqlalchemy@googlegroups.com. >> >> Visit this group at https://groups.google.com/group/sqlalchemy. >> >> For more options, visit https://groups.google.com/d/optout. >> > >> > -- >> > >> > Антон >> > >> > -- >> > SQLAlchemy - >> > The Python SQL Toolkit and Object Relational Mapper >> > >> > http://www.sqlalchemy.org/ >> > >> > To post example code, please provide an MCVE: Minimal, Complete, and >> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full >> > description. >> > --- >> > You received this message because you are subscribed to the Google >> Groups >> > "sqlalchemy" group. >> > To unsubscribe from this group and stop receiving emails from it, send >> an >> > email to sqlalchemy+unsubscr...@googlegroups.com. >> > To post to this group, send email to sqlalchemy@googlegroups.com. >> > Visit this group at https://groups.google.com/group/sqlalchemy. >> > For more options, visit https://groups.google.com/d/optout. >> >> -- >> SQLAlchemy - >> The Python SQL Toolkit and Object Relational Mapper >> >> http://www.sqlalchemy.org/ >> >> To post example code, please provide an MCVE: Minimal, Complete, and >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >> description. >> --- >> You received this message because you are subscribed to the Google Groups >> "sqlalchemy" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to sqlalchemy+unsubscr...@googlegroups.com. >> To post to this group, send email to sqlalchemy@googlegroups.com. >> Visit this group at https://groups.google.com/group/sqlalchemy. >> For more options, visit https://groups.google.com/d/optout. >> > -- > > Антон > -- Антон -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.