Thanks so much!  Your pointers were exactly what I needed,
specifically the bit which led me to discover exclude_properties.
I'll leave my working code here in case it ever helps anybody else
out:

from sqlalchemy import Column, Date, Enum, Integer, String, Table,
create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import ColumnProperty, sessionmaker
from sqlalchemy.orm.mapper import class_mapper

engine = create_engine('mysql://user:password@localhost/
issue_tracker')
Session = sessionmaker(bind=engine)

Base = declarative_base()

tables = {'issue_type':
              Table('issue_type', Base.metadata,
                    Column('id', Integer, primary_key=True),
                    Column('type', Enum('Bug', 'Feature',
'Improvement', 'One-Time Task')),
                    Column('created', Date),
                    Column('num_visits', Integer)),

          'issue_type_label':
              Table('issue_type_label', Base.metadata,
                    Column('id', Integer, primary_key=True),
                    Column('type', Enum('Bug', 'Feature',
'Improvement', 'One-Time Task')),
                    Column('label_id', String),
                    Column('created', Date),
                    Column('num_visits', Integer))}

def get_columns(model):
    return [x.key for x in class_mapper(model).iterate_properties if
isinstance(x, ColumnProperty)]

class IssueType(Base):
    __table__ = tables['issue_type']


class IssueLabel(Base):
    __table__ = tables['issue_type_label']
    __mapper_args__ = {'exclude_properties': ['type']}


class IssueTypeLabel(Base):
    __table__ = tables['issue_type_label']


print "issue type:", get_columns(IssueType)
print "issue label:", get_columns(IssueLabel)
print "issue type label:", get_columns(IssueTypeLabel)


This code correctly prints the following:
    issue type: ['id', 'type', 'created', 'num_visits']
    issue label: ['id', 'label_id', 'created', 'num_visits']
    issue type label: ['id', 'type', 'label_id', 'created',
'num_visits']

On Nov 16, 8:49 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Nov 16, 2012, at 2:59 PM, Rob Crowell wrote:
>
>
>
>
>
>
>
>
>
>
>
> > Thanks for the help so far Michael!  I can explain a little more about
> > what I'm trying to do (I'm using a fictional application here but I
> > think it pretty accurately translates into my actual application).
>
> > BACKGROUND
> > ----------
>
> > Let's say I'm writing an issue tracking application.  Each issue that
> > we're tracking has a type (an issue must have exactly one type), and
> > each issue may have an unlimited number of user-provided labels.
>
> > Each day, people browse our issue tracker and each time they do they
> > generate a page view on the issue.  Here's an example of one day's
> > worth of data:
>
> >    IssueA: <Views: 1, Type: "Bug", Labels: ["login-page (id=1)",
> > "show-stopper (id=2)"]>
> >    IssueB: <Views: 20, Type: "One-Time Task", Labels: ["disk-full
> > (id=3)", "show-stopper (id=2)"]>
> >    IssueC: <Views: 300, Type: "Bug", Labels: ["login-page (id=1)"]>
>
> > The BigCo. I'm working for is very interested in knowing which issues
> > are read by the most people, and they need the ability to generate
> > reports sliced by arbitrary date ranges.  However, we can tolerate a
> > day delay, so we are writing summary tables each night.  Two of these
> > summary tables are aggregated by either issue type or label, and we
> > also write a third table that can be used to drill-down and see page
> > visits bucketed by both type and label:
>
> > CREATE TABLE `issue_type` (
> >  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
> >  `created` datetime NOT NULL,
> >  `type` enum('Bug','Feature','Improvement','One-Time Task') NOT NULL,
> >  `num_watchers` int(10) unsigned NOT NULL,
> >  PRIMARY KEY (`id`)
> > )
>
> > CREATE TABLE `issue_label` (
> >  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
> >  `created` datetime NOT NULL,
> >  `label_id` int(10) unsigned NOT NULL,
> >  `num_watchers` int(10) unsigned NOT NULL,
> >  PRIMARY KEY (`id`)
> > )
>
> > CREATE TABLE `issue_type_label` (
> >  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
> >  `created` datetime NOT NULL,
> >  `type` enum('Bug','Feature','Improvement','One-Time Task') NOT NULL,
> >  `label_id` int(10) unsigned NOT NULL,
> >  `num_visits` int(10) unsigned NOT NULL,
> >  PRIMARY KEY (`id`)
> > )
>
> > So we'd run these insert statements at midnight:
>
> > INSERT INTO issue_type (created, type, num_visits) VALUES
> >    ("2012-11-15", "Bug", 301),
> >    ("2012-11-15", "One-Time Task", 20);
>
> > INSERT INTO issue_labels (created, label_id, num_visits) VALUES
> >    ("2012-11-15", 1, 301),
> >    ("2012-11-15", 2, 21),
> >    ("2012-11-15", 3, 20);
>
> > INSERT INTO issue_type_label (created, type, label_id, num_visits)
> > VALUES
> >    ("2012-11-15", "Bug", 1, 301),
> >    ("2012-11-15", "Bug", 2, 1),
> >    ("2012-11-15", "One-Time Task", 3, 20),
> >    ("2012-11-15", "One-Time Task", 2, 20);
>
> > Now when we want to generate the summary reports, we query one of the
> > first two tables (if we're generating a report aggregated by issue
> > type we hit issue_type, if we're generating a report aggregated by
> > label we hit issue_label), and when we want to run a drill-down query
> > both both type and label, we hit issue_type_label:
>
> >    # broken down by type
> >    SELECT type, SUM(num_visits) FROM issue_type WHERE created >=
> > "2012-11-01" AND created <= "2012-11-16" GROUP BY type;
>
> >    # broken down by label
> >    SELECT label_id, SUM(num_visits) FROM issue_label WHERE created >=
> > "2012-11-01" AND created <= "2012-11-16" GROUP BY label_id;
>
> >    # broken down by both type and label
> >    SELECT type, label_id, SUM(num_visits) FROM issue_type_label WHERE
> > created >= "2012-11-01" AND created <= "2012-11-16" GROUP BY type,
> > label_id;
>
> > This works fine and is very fast for our application.  However, it's a
> > bit redundant since each issue has exactly one issue type, so we can
> > safely ignore it when generating the label summary report.  So really,
> > we can get away with not having the issue_label table at all, and just
> > run this instead (we must continue to have the issue_type table
> > however, since a single issue can have many labels or none at all):
>
> >    # broken down by label
> >    SELECT label_id, SUM(num_visits) FROM issue_type_label WHERE
> > created >= "2012-11-01" AND created <= "2012-11-16" GROUP BY label_id;
>
> > THE ISSUE
> > ---------
>
> > In reality, we have many types of fields that we generate reports on
> > (not just type and label), so we have more than a dozen of these
> > aggregate tables rolled up by different sets of fields.  While porting
> > our data access layer to SQLAlchemy (we have a separate process to
> > write these tables which we aren't going to move to SQLAlchemy), I've
> > created a bunch of functions that use reflection on the various Model
> > objects I've built to figure out what columns we need to select.
>
> > Each column that we don't want to GROUP BY or filter out with a WHERE
> > clause gets wrapped by func.sum() and passed to a call to
> > session.query().  My columns are very consistently named so I have
> > automated almost all of this using object reflection, which lets my
> > code populate which columns to select, group by, and sum on
> > automatically.  I really don't want to write a custom call to query()
> > for each table type that we have since we have so many and they are
> > accessed in basically identical ways.
>
> > The issue, however, comes when trying to generate the report bucketed
> > by label.  Since the issue_type_label table has both type and label_id
> > columns, my code will always GROUP BY both of these columns unless I
> > introduce various hacks to suppress selecting and grouping by these
> > columns all over the place...
>
> without having a deep understanding of your system and only time for a 
> cursory review of this level of detail, it seems like you just need some 
> system of generating a Query(), or even just a select(), programatically.    
> You've got naming schemes and consistent inputs/outputs, I would not even use 
> ORM mappings for this beyond just simple, traditional maps to each table.
>
> > What I want to be able to do is to define 2 models that are both
> > backed by the same table, but have declared different sets of Columns
> > so that the reflection code I've written doesn't have to be hacked to
> > shreds to exclude the type column when I don't want it;
>
> this is likely a mis-use of ORM mappings.   an ORM mapping is generally one 
> model for one Table, and that's it, with some very occasional use cases where 
> you might be selecting from more than one table at once.
>
> If you want to select subsets of columns to select from, that's what 
> session.query() or select() does.
>
> You need to write code that does this:
>
> def my_model_one():
>     return query(*getattr(MyModel, colname) for colname in 
> get_colnames_for_model_one())
>
> def my_model_two():
>     return query(*getattr(MyModel, colname) for colname in 
> get_colnames_for_model_two())
>
> I wouldn't be trying to define ORM mappings that define particular 
> subqueries, basically.
>
>
>
>
>
>
>
>
>
> > I will simply
> > define one model that doesn't create a Column for type and the
> > reflection code will ignore it when generating queries.  So this is
> > what I want to write, but it generates the InvalidRequestError:
>
> > from sqlalchemy import Column, Date, Enum, Integer, String,
> > create_engine, desc, func
> > from sqlalchemy.ext.declarative import declarative_base
> > from sqlalchemy.orm import sessionmaker
>
> > engine = create_engine('mysql://root:password@localhost/
> > issue_tracker')
> > Session = sessionmaker(bind=engine)
>
> > Base = declarative_base()
>
> > class IssueType(Base):
> >    __tablename__ = 'issue_type'
> >    id = Column('id', Integer, primary_key=True)
> >    type = Column('type', Enum('Bug', 'Feature', 'Improvement', 'One-
> > Time Task'))
> >    created = Column('created', Date)
> >    num_visits = Column('num_visits', Integer)
>
> > class IssueLabel(Base):
> >    __tablename__ = 'issue_type_label'
> >    id = Column('id', Integer, primary_key=True)
> >    label_id = Column('label_id', String)
> >    created = Column('created', Date)
> >    num_visits = Column('num_visits', Integer)
>
> > class IssueTypeLabel(Base):
> >    __tablename__ = 'issue_type_label'
> >    id = Column('id', Integer, primary_key=True)
> >    type = Column('type', Enum('Bug', 'Feature', 'Improvement', 'One-
> > Time Task'))
> >    label_id = Column('label_id', String)
> >    created = Column('created', Date)
> >    num_visits = Column('num_visits', Integer)
>
> > Correct me if I'm wrong, but I believe the methods you described above
> > will all inherit the same column definitions for the table.  However,
> > that's exactly what I don't want to happen!  I poked around in the
> > source and noticed there is a table argument called keep_existing
> > which I tried using, but its behavior seemed inconsistent and/or buggy
> > with respect to what columns were reported via the reflection I was
> > doing.  Maybe this isn't the use case it was intended for?
>
> the MetaData object is intended to store collections of Table objects, keyed 
> to their name.
>
> If you wanted to have two ORM mappings, each one mapped to a subset of 
> columns on one Table, you need to map to a Table object:
>
> t = Table('mytable', metadata,
>
>         Column('x', ...),
>         Column('y', ...),
>         Column('z', ...),
>         Column('q', ...),
>         Column('p', ...),
>         Column('r', ...)
> )
>
> class ClassOne(Base):
>     __table__ = t
>
>    __mapper_args__ = {'include_columns':['x', 'y', 'z'])
>
> class ClassTwo(Base):
>     __table__ = t
>
>    __mapper_args__ = {'include_columns':['q', 'p', 'r'])
>
> Check the docs for mapper() for more detail on include_columns, I'm working 
> from memory on how it works (I've never needed it).
>
> http://docs.sqlalchemy.org/en/latest/orm/mapper_config.html?highlight...
>
>
>
> > Its possible that I can declare 2 different Base classes with two
> > separate calls to declarative_base(),
>
> not ...
>
> read more »

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to