I posted this question on Stack Overflow a few days ago and got some 
responses, but nothing that really solves my problem, so I'm hoping I can 
get some more input here. The initial recommendation was to keep all the 
data in two tables (one meta table and one data table), but I worry that 
this might become unwieldy given the number of rows (I estimate about 
6 billion rows in total).
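
For reference, here is roughly what I understand that suggested two-table 
design to look like (a standalone sketch; the Measurement class and the 
elevation column are my own naming, nothing that was prescribed):

from sqlalchemy import Column, DateTime, Float, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class Location(Base):
    __tablename__ = 'locations'
    id = Column(Integer, primary_key=True)
    name = Column(String(50), unique=True)
    lon = Column(Float)
    lat = Column(Float)
    elevation = Column(Float)

class Measurement(Base):
    __tablename__ = 'measurements'   # all ~6 billion rows in one table
    location_id = Column(Integer, ForeignKey('locations.id'),
                         primary_key=True)
    datetime = Column(DateTime, primary_key=True)
    value = Column(Float)

    location = relationship('Location', backref='measurements')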

I have many (~2000) locations with time series data, each with millions of 
rows, that I would like to store in a Postgres database. My current 
approach is to have one table per location time series, plus a meta table 
which stores information about each location (coordinates, elevation, 
etc.). I am using SQLAlchemy to create and query the tables. I would like 
a relationship between the meta table and each time series table so that I 
can run queries like "select all locations that have data between date A 
and date B" and "select all data for date A and export a CSV with 
coordinates". What is the best way to create many tables with the same 
structure (only the name differs) and relate them to a meta table? Or 
should I use a different database design?
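
Against the two-table sketch above, those two example queries might look 
something like this (the engine URL and the dates are just placeholders):

import datetime
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('postgresql://user:pw@localhost/location_test')
Session = sessionmaker(bind=engine)
session = Session()

date_a = datetime.datetime(2001, 1, 1)
date_b = datetime.datetime(2001, 2, 1)

# "select all locations that have data between date A and date B"
locations = (session.query(Location)
             .join(Measurement)
             .filter(Measurement.datetime.between(date_a, date_b))
             .distinct()
             .all())

# "select all data for date A and export a csv with coordinates"
rows = (session.query(Measurement.datetime, Measurement.value,
                      Location.lon, Location.lat)
        .join(Location)
        .filter(Measurement.datetime == date_a)
        .all())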

Currently I am using the following approach to generate a lot of similar 
mappings:

from sqlalchemy import create_engine
from sqlalchemy.types import Float, String, DateTime, Integer
from sqlalchemy import Column, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship, backref

Base = declarative_base()

def make_timeseries(name):
    # build a mapped class for one location's time series; only the
    # table name differs between the generated classes
    class TimeSeries(Base):

        __tablename__ = name
        # ties each row back to the meta table via the (unique) table name
        table_name = Column(String(50), ForeignKey('locations.table_name'))
        datetime = Column(DateTime, primary_key=True)
        value = Column(Float)

        location = relationship('Location', backref=backref('timeseries',
                                lazy='dynamic'))

        def __init__(self, table_name, datetime, value):
            self.table_name = table_name
            self.datetime = datetime
            self.value = value

        def __repr__(self):
            return "{}: {}".format(self.datetime, self.value)

    return TimeSeries

class Location(Base):

    __tablename__ = 'locations'
    id = Column(Integer, primary_key=True)
    table_name = Column(String(50), unique=True)
    lon = Column(Float)
    lat = Column(Float)

if __name__ == '__main__':
    connection_string = 'postgresql://user:pw@localhost/location_test'
    engine = create_engine(connection_string)
    Session = sessionmaker(bind=engine)
    session = Session()

    TS1 = make_timeseries('ts1')
    # TS2 = make_timeseries('ts2')   # this breaks because of the foreign key
    Base.metadata.create_all(engine)

    # the meta row has to exist before the time series rows that point at it
    session.add(Location(table_name='ts1', lon=0.0, lat=0.0))
    session.add(TS1("ts1", "2001-01-01", 999))
    session.add(TS1("ts1", "2001-01-02", -555))
    session.commit()

    qs = session.query(Location).first()
    print(qs.timeseries.all())


This approach has some problems, most notably that if I create more than 
one TimeSeries class the setup breaks: as far as I can tell the second 
relationship() tries to create another 'timeseries' backref on Location 
and collides with the first one. Previously I've used work-arounds such as 
not defining a foreign key at all (sketched below), but it all feels like 
a big hack, and I suspect there must be a better way of doing this. How 
should I organise and access my data?
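
For completeness, this is the sort of work-around I mean, reusing Base, 
Column, etc. from the listing above (the _nofk names are just for 
illustration):

def make_timeseries_no_fk(name):
    # same factory as above, but without the ForeignKey/relationship,
    # so the meta lookup has to be a separate query
    class TimeSeries(Base):
        __tablename__ = name
        datetime = Column(DateTime, primary_key=True)
        value = Column(Float)
    TimeSeries.__name__ = 'TimeSeries_' + name   # keep class names distinct
    return TimeSeries

TS1_nofk = make_timeseries_no_fk('ts1_nofk')
TS2_nofk = make_timeseries_no_fk('ts2_nofk')   # a second table no longer breaks

# coordinates now have to be fetched by hand from the meta table
loc = session.query(Location).filter_by(table_name='ts1').one()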

Any advice appreciated, 

Peter
