On Thu, Feb 28, 2019 at 1:34 PM Andrew Martin <agmar...@gmail.com> wrote: > > Hi All, > > Trying something out here for an analytics app here, and I'm not sure any of > this is a good idea. I'm trying to create a data model that's flexible enough > to handle a lot of different types of analysis. From plain CSVs to time > series to survey questions with complex hierarchies. My approach is to have a > pretty plain model and query as needed to construct dataframes that will end > up in pandas and sci-kit learn. > > The basic idea is that there's a dataset that has variables associated with > it. Variables have values associated. I want to retain type information on > the values. If I'm reading a file and loading it, I want to store an int as > an int or a datetime as a datetime and not stuff everything into a string and > have to guess what things are based on variable metadata. That's where I'm > kind of stuck. I think I have the multi-table inheritance set up correctly to > store different types of value. I'm totally unsure how to query without > having to know the subclass. I'd like to be able to extract values for pandas > processing with something like this: > > vals = Values.value.filter_by(variable_id=123456).all() > > without having to know per variable which subclass to choose. I.e., I don't > want to have to do > vals = IntValue.value.filter_by(variable_id=123456).all() >
so "Values.value.filter_by(variable_id=123456).all()" is not meaningful to me because I don't know what "Values.value" is, is that a dynamic relationship hence you're getting a Query ? What SQL do you see this rendering? > > The second part of my question is only relevant if this whole design isn't > hosed, which I'm fine if you tell me that it is and I need to go back to the > drawing board. > > The second part is how to extract an entire dataset by ID in a way that would > be comfortable in pandas. If I were doing this in raw SQL, I'd use a CTE to > get the variables by dataset id and use a pivot to create the > data-frame-shaped table. I'm confounded by how to do this in alchemy, > especially with the polymorphic setup. can you show me that ? FTR we usually reduce the JOINs by putting all the "value" columns in one table but otherwise similar approach, see: https://docs.sqlalchemy.org/en/latest/_modules/examples/vertical/dictlike-polymorphic.html > > > Here are my current model definitions: > > > from sqlalchemy import Column, Index, Integer, Text, DateTime, Float, > ForeignKey > from sqlalchemy.orm import relationship > > from .meta import Base > > > class DataSet(Base): > __tablename__ = 'datasets' > id = Column(Integer, primary_key=True) > name = Column(Text, nullable=False) > description = Column(Text, nullable=True) > > > > class Variable(Base): > __tablename__ = 'variables' > id = Column(Integer, primary_key=True) > dataset_id = Column(Integer, ForeignKey('datasets.id'), Nullable=False) > name = Column(Text, nullable=False) > description = Column(Text, nullable=True) > group_var_col_id = Column(Integer, ForeignKey('variables.id'), > nullable=True) > group_var_row_id = Column(Integer, ForeignKey('variables.id'), > nullable=True) > > value_map = Column(Text, nullable=True) #change to JSONB when move from > SQLite to Postgres > > dataset = relationship('DataSet', backref='variables') > > > class Value(Base): > __tablename__ = 'values' > id = Column(Integer, primary_key=True) > variable_id = Column(Integer, ForeignKey('variables.id'), Nullable=False) > observation_id = Column(Text, nullable=False) > value_type = Column(Text, nullable=False) > > variable = relationship('Variable', backref='values') > > __mapper_args__ = {'polymorphic_on': value_type} > > > class IntValue(Value): > __tablename__ = 'int_values' > id = Column(None, ForeignKey('values.id'), primary_key=True) > value = Column(Integer, nullable=True) > > __mapper_args__ = {'polymorphic_identity': 'int'} > > > class StringValue(Value): > __tablename__ = 'string_values' > id = Column(None, ForeignKey('values.id'), primary_key=True) > value = Column(Text, nullable=True) > > __mapper_args__ = {'polymorphic_identity': 'string'} > > > class FloatValue(Value): > __tablename__ = 'float_values' > id = Column(None, ForeignKey('values.id'), primary_key=True) > value = Column(Float, nullable=True) > > __mapper_args__ = {'polymorphic_identity': 'float'} > > > class DateTimeValue(Value): > __tablename__ = 'datetime_values' > id = Column(None, ForeignKey('values.id'), primary_key=True) > value = Column(DateTime, nullable=True) > > __mapper_args__ = {'polymorphic_identity': 'datetime'} > > Thanks in advance for any suggestions you might have! > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+unsubscr...@googlegroups.com. > To post to this group, send email to sqlalchemy@googlegroups.com. > Visit this group at https://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.