Thanks for the valuable suggestions. You are actually correct: the scheme that they are using is sharding across tables, that's why the structure is the same.
And the way I was setting things up is very similar to the example you mentioned (http://www.sqlalchemy.org/trac/wiki/UsageRecipes/EntityName) so that I could query multiple table names using the same class. I put together a small example that using that and a simplified version of Manager and Engineer. Constructing the union the way you suggested works (the criterion goes in the subquery's WHERE clause instead of outside of the union). SELECT anon_1.manager_employee_id AS anon_1_manager_employee_id, anon_1.manager_name AS anon_1_manager_name FROM (SELECT manager.employee_id AS manager_employee_id, manager.name AS manager_name FROM manager WHERE manager.name = :name_1 UNION ALL SELECT engineer.employee_id AS engineer_employee_id, engineer.name AS engineer_name FROM engineer WHERE engineer.name = :name_2) AS anon_1 But now the problem seems to be that SQLAlchemy will only return rows matching the first class. Here's the example code. In this case there is 1 row in the manager & engineer tables that contains the name "test". After constructing the union, I was hoping the query would return two results but it will only return one. However if I run count() on the query, it correctly states that there are two rows. Am I missing something else? class Employee(object): def __init__(self, name): self.name = name def __repr__(self): return self.__class__.__name__ + " " + self.name managers_table = Table('manager', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), ) engineers_table = Table('engineer', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), ) def map_class_to_some_table(cls, table, entity_name, **kw): newcls = type(entity_name, (cls, ), {}) mapper(newcls, table, **kw) return newcls Manager = map_class_to_some_table(Employee, managers_table, 'Manager') Engineer = map_class_to_some_table(Employee, engineers_table, 'Engineer') q1 = session.query(Manager).filter(Manager.name=='test') q2 = session.query(Engineer).filter(Engineer.name=='test') u = q1.union_all(q2) print u.count() #count = 2 rows = u.all() print len(rows) #len = 1 print rows[0] # only contains the manager result On Tuesday, October 9, 2012 9:35:32 AM UTC-5, Michael Bayer wrote: > > > On Oct 9, 2012, at 9:54 AM, Benjamin Gonzalez wrote: > > > Hi, > > I've been using SQLAlchemy for the last couple of weeks. I'll definitely > call myself a beginner so I apologize if I'm missing something obvious with > this question :) > > > I'm working with a database where there is data split across tables with > the exact same structure, so there are cases where I need to query more > than one table and filter by the same criteria. > > > this sounds like you're describing horizontal partitioning. such as if a > particular row of data could equally well be in one table or the other, and > its only some kind of sharding scheme that determines which table. > > > Looking around in the documentation I read about Concrete Table > Inheritance and thought that could help me in this case. > > > yeah concrete inheritance is not really intended for this case, though I > can see what you're going for. > > > And perform a query such as: > > session.query(Employee).filter(Employee.name=='test') > > It will apply the name filter after the union is done: > > SELECT pjoin.employee_id AS pjoin_employee_id, pjoin.name AS pjoin_name, > pjoin.type AS pjoin_type, pjoin.manager_data AS pjoin_manager_data, > pjoin.engineer_info AS pjoin_engineer_info > FROM (SELECT employees.employee_id AS employee_id, employees.name AS > name, CAST(NULL AS VARCHAR(50)) AS manager_data, CAST(NULL AS VARCHAR(50)) > AS engineer_info, 'employee' AS type > FROM employees UNION ALL SELECT managers.employee_id AS employee_id, > managers.name AS name, managers.manager_data AS manager_data, CAST(NULL > AS VARCHAR(50)) AS engineer_info, 'manager' AS type > FROM managers UNION ALL SELECT engineers.employee_id AS employee_id, > engineers.name AS name, CAST(NULL AS VARCHAR(50)) AS manager_data, > engineers.engineer_info AS engineer_info, 'engineer' AS type > FROM engineers) AS pjoin > WHERE pjoin.name = :name_1 > > Is there a way to ask SQLAlchemy to apply the filter to each table in the > inner queries? They all have a name column, and in the case of a large > dataset it's helpful because it avoids loading all the rows for each table > first and then applying the WHERE clause. > > > oh OK so you're taking advantage of the UNION generated by > polymorphic_union to try filtering across all those tables. this is a > funny case because we do have an extension that serves as an example for > "horizontal partitioning", which is in sqlalchemy.ext.horizontal_shard, but > that example is architected around the idea that you're talking to multiple > database backends, each of which has just one table in it. So it doesn't > do the UNION thing which of course is preferable if it is possible. > > One example that might be worth peeking at is how this mapping scheme can > be set up without necessarily using Concrete inheritance, though really > it's probably easier that you are just using the concrete system, that > example is at: http://www.sqlalchemy.org/trac/wiki/UsageRecipes/EntityName. > It has a "classical" version and a "declarative" version, though the > declarative is a bit awkward. Stick with the concrete if that's easier. > > As far as the query, to come up with an ad-hoc, filtered UNION would have > to be invented. I'd start by working it out manually using Query, then > perhaps trying to genericize it, either that you have a simple function > that comes up with a Query with unions, or if you really need to do this > fluently, as a Query subclass. Here's sort of the idea of that: > > def query_all_shards(session, criterion): > q = None > for subclass in MyConcreteBase.__subclasses__(): > if q is None: > q = session.query(subclass).filter(criterion) > else: > union_q = session.query(subclass).filter(criterion) > q = q.union_all(union_q) > return q > > I haven't tested this, but something like that. Then if you wanted to > bake this into Query, you could build a Query subclass that does this, > giving it a method like "union_on_shards()" that would trigger a "union" > operation similar to the above. There's a Query subclass example in the > sqlalchemy/ext/horizontal_shard.py module, and also in some of the wiki > examples such as > http://www.sqlalchemy.org/trac/wiki/UsageRecipes/PreFilteredQuery . > > Overall I'd stay away from polymorphic_union as a mapper-level default > since it generates huge beasts of queries, which you often won't want. > > > > > > -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/uN72Da50-AoJ. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
