Thanks so much! Your pointers were exactly what I needed, specifically the bit which led me to discover exclude_properties. I'll leave my working code here in case it ever helps anybody else out:
from sqlalchemy import Column, Date, Enum, Integer, String, Table, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import ColumnProperty, sessionmaker
from sqlalchemy.orm.mapper import class_mapper

engine = create_engine('mysql://user:password@localhost/issue_tracker')
Session = sessionmaker(bind=engine)
Base = declarative_base()

tables = {
    'issue_type': Table('issue_type', Base.metadata,
        Column('id', Integer, primary_key=True),
        Column('type', Enum('Bug', 'Feature', 'Improvement', 'One-Time Task')),
        Column('created', Date),
        Column('num_visits', Integer)),
    'issue_type_label': Table('issue_type_label', Base.metadata,
        Column('id', Integer, primary_key=True),
        Column('type', Enum('Bug', 'Feature', 'Improvement', 'One-Time Task')),
        Column('label_id', String),
        Column('created', Date),
        Column('num_visits', Integer)),
}

def get_columns(model):
    return [x.key for x in class_mapper(model).iterate_properties
            if isinstance(x, ColumnProperty)]

class IssueType(Base):
    __table__ = tables['issue_type']

class IssueLabel(Base):
    __table__ = tables['issue_type_label']
    __mapper_args__ = {'exclude_properties': ['type']}

class IssueTypeLabel(Base):
    __table__ = tables['issue_type_label']

print "issue type:", get_columns(IssueType)
print "issue label:", get_columns(IssueLabel)
print "issue type label:", get_columns(IssueTypeLabel)

This code correctly prints the following:

issue type: ['id', 'type', 'created', 'num_visits']
issue label: ['id', 'label_id', 'created', 'num_visits']
issue type label: ['id', 'type', 'label_id', 'created', 'num_visits']

On Nov 16, 8:49 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Nov 16, 2012, at 2:59 PM, Rob Crowell wrote:
>
> > Thanks for the help so far Michael! I can explain a little more about
> > what I'm trying to do (I'm using a fictional application here, but I
> > think it pretty accurately translates into my actual application).
> > BACKGROUND
> > ----------
>
> > Let's say I'm writing an issue-tracking application. Each issue that
> > we're tracking has a type (an issue must have exactly one type), and
> > each issue may have an unlimited number of user-provided labels.
>
> > Each day, people browse our issue tracker, and each time they do they
> > generate a page view on the issue. Here's an example of one day's
> > worth of data:
>
> > IssueA: <Views: 1, Type: "Bug", Labels: ["login-page (id=1)", "show-stopper (id=2)"]>
> > IssueB: <Views: 20, Type: "One-Time Task", Labels: ["disk-full (id=3)", "show-stopper (id=2)"]>
> > IssueC: <Views: 300, Type: "Bug", Labels: ["login-page (id=1)"]>
>
> > The BigCo. I'm working for is very interested in knowing which issues
> > are read by the most people, and they need the ability to generate
> > reports sliced by arbitrary date ranges. However, we can tolerate a
> > day's delay, so we write summary tables each night. Two of these
> > summary tables are aggregated by either issue type or label, and we
> > also write a third table that can be used to drill down and see page
> > visits bucketed by both type and label:
>
> > CREATE TABLE `issue_type` (
> >   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
> >   `created` datetime NOT NULL,
> >   `type` enum('Bug','Feature','Improvement','One-Time Task') NOT NULL,
> >   `num_visits` int(10) unsigned NOT NULL,
> >   PRIMARY KEY (`id`)
> > )
>
> > CREATE TABLE `issue_label` (
> >   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
> >   `created` datetime NOT NULL,
> >   `label_id` int(10) unsigned NOT NULL,
> >   `num_visits` int(10) unsigned NOT NULL,
> >   PRIMARY KEY (`id`)
> > )
>
> > CREATE TABLE `issue_type_label` (
> >   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
> >   `created` datetime NOT NULL,
> >   `type` enum('Bug','Feature','Improvement','One-Time Task') NOT NULL,
> >   `label_id` int(10) unsigned NOT NULL,
> >   `num_visits` int(10) unsigned NOT NULL,
> >   PRIMARY KEY (`id`)
> > )
>
> > So we'd run
> > these insert statements at midnight:
>
> > INSERT INTO issue_type (created, type, num_visits) VALUES
> >   ("2012-11-15", "Bug", 301),
> >   ("2012-11-15", "One-Time Task", 20);
>
> > INSERT INTO issue_label (created, label_id, num_visits) VALUES
> >   ("2012-11-15", 1, 301),
> >   ("2012-11-15", 2, 21),
> >   ("2012-11-15", 3, 20);
>
> > INSERT INTO issue_type_label (created, type, label_id, num_visits) VALUES
> >   ("2012-11-15", "Bug", 1, 301),
> >   ("2012-11-15", "Bug", 2, 1),
> >   ("2012-11-15", "One-Time Task", 3, 20),
> >   ("2012-11-15", "One-Time Task", 2, 20);
>
> > Now when we want to generate the summary reports, we query one of the
> > first two tables (if we're generating a report aggregated by issue
> > type we hit issue_type; if we're generating a report aggregated by
> > label we hit issue_label), and when we want to run a drill-down query
> > by both type and label, we hit issue_type_label:
>
> > # broken down by type
> > SELECT type, SUM(num_visits) FROM issue_type
> >   WHERE created >= "2012-11-01" AND created <= "2012-11-16"
> >   GROUP BY type;
>
> > # broken down by label
> > SELECT label_id, SUM(num_visits) FROM issue_label
> >   WHERE created >= "2012-11-01" AND created <= "2012-11-16"
> >   GROUP BY label_id;
>
> > # broken down by both type and label
> > SELECT type, label_id, SUM(num_visits) FROM issue_type_label
> >   WHERE created >= "2012-11-01" AND created <= "2012-11-16"
> >   GROUP BY type, label_id;
>
> > This works fine and is very fast for our application. However, it's a
> > bit redundant: since each issue has exactly one issue type, we can
> > safely ignore it when generating the label summary report.
> > So really, we can get away with not having the issue_label table at all,
> > and just run this instead (we must continue to have the issue_type table,
> > however, since a single issue can have many labels or none at all):
>
> > # broken down by label
> > SELECT label_id, SUM(num_visits) FROM issue_type_label
> >   WHERE created >= "2012-11-01" AND created <= "2012-11-16"
> >   GROUP BY label_id;
>
> > THE ISSUE
> > ---------
>
> > In reality, we have many types of fields that we generate reports on
> > (not just type and label), so we have more than a dozen of these
> > aggregate tables rolled up by different sets of fields. While porting
> > our data-access layer to SQLAlchemy (we have a separate process that
> > writes these tables, which we aren't going to move to SQLAlchemy), I've
> > created a bunch of functions that use reflection on the various model
> > objects I've built to figure out which columns we need to select.
>
> > Each column that we don't want to GROUP BY or filter with a WHERE
> > clause gets wrapped by func.sum() and passed to a call to
> > session.query(). My columns are very consistently named, so I have
> > automated almost all of this using object reflection, which lets my
> > code populate which columns to select, group by, and sum on
> > automatically. I really don't want to write a custom call to query()
> > for each table type we have, since we have so many and they are
> > accessed in basically identical ways.
>
> > The issue, however, comes when trying to generate the report bucketed
> > by label. Since the issue_type_label table has both type and label_id
> > columns, my code will always GROUP BY both of these columns unless I
> > introduce various hacks to suppress selecting and grouping by these
> > columns all over the place...
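(For concreteness, the reflection-driven query generation I describe above reduces to something like the sketch below. It runs against an in-memory SQLite engine instead of our MySQL database, and the helper name summary_query plus the hard-coded 'id'/'created' exclusions are invented for illustration, not taken from the real application. Python 3 syntax.)

```python
from datetime import date

from sqlalchemy import Column, Date, Integer, String, create_engine, func
from sqlalchemy.orm import (ColumnProperty, class_mapper, declarative_base,
                            sessionmaker)

Base = declarative_base()
engine = create_engine('sqlite://')  # in-memory stand-in for the real MySQL db
Session = sessionmaker(bind=engine)

class IssueTypeLabel(Base):
    __tablename__ = 'issue_type_label'
    id = Column(Integer, primary_key=True)
    type = Column(String(32))
    label_id = Column(Integer)
    created = Column(Date)
    num_visits = Column(Integer)

def summary_query(session, model, start, end, metrics=('num_visits',)):
    # Reflect the mapped columns; everything that is not a metric, the
    # surrogate key, or the date column becomes a GROUP BY dimension.
    # Sorted so the SELECT list is deterministic.
    dims = sorted(p.key for p in class_mapper(model).iterate_properties
                  if isinstance(p, ColumnProperty)
                  and p.key not in metrics
                  and p.key not in ('id', 'created'))
    group_cols = [getattr(model, d) for d in dims]
    sums = [func.sum(getattr(model, m)) for m in metrics]
    return (session.query(*(group_cols + sums))
            .filter(model.created >= start, model.created <= end)
            .group_by(*group_cols))

Base.metadata.create_all(engine)
session = Session()
session.add_all([
    IssueTypeLabel(type='Bug', label_id=1, created=date(2012, 11, 15),
                   num_visits=301),
    IssueTypeLabel(type='Bug', label_id=2, created=date(2012, 11, 15),
                   num_visits=1),
])
session.commit()

# One row per (label_id, type) bucket, with num_visits summed.
for row in summary_query(session, IssueTypeLabel,
                         date(2012, 11, 1), date(2012, 11, 16)):
    print(row)
```

The same summary_query call then works unchanged for every rollup model, which is the whole point of doing it by reflection.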
> without having a deep understanding of your system, and with only time for a
> cursory review at this level of detail, it seems like you just need some
> system of generating a Query(), or even just a select(), programmatically.
> You've got naming schemes and consistent inputs/outputs; I would not even use
> ORM mappings for this beyond just simple, traditional maps to each table.
>
> > What I want to be able to do is to define 2 models that are both
> > backed by the same table, but have declared different sets of Columns,
> > so that the reflection code I've written doesn't have to be hacked to
> > shreds to exclude the type column when I don't want it;
>
> this is likely a mis-use of ORM mappings. An ORM mapping is generally one
> model for one Table, and that's it, with some very occasional use cases where
> you might be selecting from more than one table at once.
>
> If you want to select subsets of columns, that's what session.query() or
> select() does. You need to write code that does this:
>
> def my_model_one():
>     return query(*(getattr(MyModel, colname)
>                    for colname in get_colnames_for_model_one()))
>
> def my_model_two():
>     return query(*(getattr(MyModel, colname)
>                    for colname in get_colnames_for_model_two()))
>
> I wouldn't be trying to define ORM mappings that define particular
> subqueries, basically.
>
> > I will simply
> > define one model that doesn't create a Column for type and the
> > reflection code will ignore it when generating queries.
> > So this is what I want to write, but it generates the InvalidRequestError:
>
> > from sqlalchemy import (Column, Date, Enum, Integer, String,
> >                         create_engine, desc, func)
> > from sqlalchemy.ext.declarative import declarative_base
> > from sqlalchemy.orm import sessionmaker
> >
> > engine = create_engine('mysql://root:password@localhost/issue_tracker')
> > Session = sessionmaker(bind=engine)
> >
> > Base = declarative_base()
> >
> > class IssueType(Base):
> >     __tablename__ = 'issue_type'
> >     id = Column('id', Integer, primary_key=True)
> >     type = Column('type', Enum('Bug', 'Feature', 'Improvement', 'One-Time Task'))
> >     created = Column('created', Date)
> >     num_visits = Column('num_visits', Integer)
> >
> > class IssueLabel(Base):
> >     __tablename__ = 'issue_type_label'
> >     id = Column('id', Integer, primary_key=True)
> >     label_id = Column('label_id', String)
> >     created = Column('created', Date)
> >     num_visits = Column('num_visits', Integer)
> >
> > class IssueTypeLabel(Base):
> >     __tablename__ = 'issue_type_label'
> >     id = Column('id', Integer, primary_key=True)
> >     type = Column('type', Enum('Bug', 'Feature', 'Improvement', 'One-Time Task'))
> >     label_id = Column('label_id', String)
> >     created = Column('created', Date)
> >     num_visits = Column('num_visits', Integer)
>
> > Correct me if I'm wrong, but I believe the methods you described above
> > will all inherit the same column definitions for the table. However,
> > that's exactly what I don't want to happen! I poked around in the
> > source and noticed there is a Table argument called keep_existing,
> > which I tried using, but its behavior seemed inconsistent and/or buggy
> > with respect to what columns were reported via the reflection I was
> > doing. Maybe this isn't the use case it was intended for?
>
> the MetaData object is intended to store collections of Table objects, keyed
> to their name.
> If you wanted to have two ORM mappings, each one mapped to a subset of
> columns on one Table, you need to map to a Table object:
>
> t = Table('mytable', metadata,
>     Column('x', ...),
>     Column('y', ...),
>     Column('z', ...),
>     Column('q', ...),
>     Column('p', ...),
>     Column('r', ...)
> )
>
> class ClassOne(Base):
>     __table__ = t
>     __mapper_args__ = {'include_columns': ['x', 'y', 'z']}
>
> class ClassTwo(Base):
>     __table__ = t
>     __mapper_args__ = {'include_columns': ['q', 'p', 'r']}
>
> Check the docs for mapper() for more detail on include_columns; I'm working
> from memory on how it works (I've never needed it).
>
> http://docs.sqlalchemy.org/en/latest/orm/mapper_config.html?highlight...
>
> > It's possible that I can declare 2 different Base classes with two
> > separate calls to declarative_base(),
>
> not ...
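One closing note for future readers: the mapper() keyword is actually spelled include_properties (include_columns is a Table-reflection argument), and its complement exclude_properties is what my working code at the top uses. A minimal sketch of the two-mappings-one-Table idea with include_properties (class and column names are invented here; Python 3 syntax):

```python
from sqlalchemy import Column, Integer, String, Table
from sqlalchemy.orm import ColumnProperty, class_mapper, declarative_base

Base = declarative_base()

# One physical table shared by both mappings.
t = Table('issue_type_label', Base.metadata,
          Column('id', Integer, primary_key=True),
          Column('type', String(32)),
          Column('label_id', Integer),
          Column('num_visits', Integer))

class LabelRollup(Base):
    __table__ = t
    __mapper_args__ = {'include_properties': ['id', 'label_id', 'num_visits']}

class TypeRollup(Base):
    __table__ = t
    __mapper_args__ = {'include_properties': ['id', 'type', 'num_visits']}

def mapped_columns(model):
    # Same reflection helper as in the working code above: list only the
    # mapped column properties.
    return [p.key for p in class_mapper(model).iterate_properties
            if isinstance(p, ColumnProperty)]

print(mapped_columns(LabelRollup))  # 'type' is absent from this mapping
print(mapped_columns(TypeRollup))   # 'label_id' is absent from this mapping
```

Either direction works; include_properties whitelists the columns to map, while exclude_properties (as in my code above) blacklists the ones to drop.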