[sqlalchemy] Generic File Loader issue with Booleans

Andrew Martin Tue, 27 Oct 2020 20:57:23 -0700

This is probably a weirdly specific question, but I have a workflow that 
involves loading lots of CSVs into Postgres. Some of them are very large, 
so I want to cut overhead and not use CSV Dictreader. Wanted to use named 
tuples instead.


Here's a header and a line from a file:

date_dim_id,date_actual,epoch,day_suffix,day_name,day_of_week,day_of_month,
day_of_quarter,day_of_year,week_of_month,week_of_year,week_of_year_iso,
month_actual,month_name,month_name_abbreviated,quarter_actual,quarter_name,
year_actual,first_day_of_week,last_day_of_week,first_day_of_month,
last_day_of_month,first_day_of_quarter,last_day_of_quarter,first_day_of_year
,last_day_of_year,mmyyyy,mmddyyyy,weekend_indr


Here's a line:

20150101,1/1/15,1420070400,1st,Thursday ,4,1,1,1,1,1,2015-W01-4,1,January 
 
,Jan,1,First,2015,12/29/14,1/4/15,1/1/15,1/31/15,1/1/15,3/31/15,1/1/15,12/31/15,12015,1012015,0


The idea was a generic file loader that takes a filename and a class and 
then loads it. Here's the class:


class DimDate(Base):
    __tablename__ = "dim_date"

    date_dim_id = Column(Integer, primary_key=True)
    date_actual = Column(Date, nullable=False, index=True)
    epoch = Column(BigInteger, nullable=False)
    day_suffix = Column(String(4), nullable=False)
    day_name = Column(String(9), nullable=False)
    day_of_week = Column(Integer, nullable=False)
    day_of_month = Column(Integer, nullable=False)
    day_of_quarter = Column(Integer, nullable=False)
    day_of_year = Column(Integer, nullable=False)
    week_of_month = Column(Integer, nullable=False)
    week_of_year = Column(Integer, nullable=False)
    week_of_year_iso = Column(CHAR(10), nullable=False)
    month_actual = Column(Integer, nullable=False)
    month_name = Column(String(9), nullable=False)
    month_name_abbreviated = Column(CHAR(3), nullable=False)
    quarter_actual = Column(Integer, nullable=False)
    quarter_name = Column(String(9), nullable=False)
    year_actual = Column(Integer, nullable=False)
    first_day_of_week = Column(Date, nullable=False)
    last_day_of_week = Column(Date, nullable=False)
    first_day_of_month = Column(Date, nullable=False)
    last_day_of_month = Column(Date, nullable=False)
    first_day_of_quarter = Column(Date, nullable=False)
    last_day_of_quarter = Column(Date, nullable=False)
    first_day_of_year = Column(Date, nullable=False)
    last_day_of_year = Column(Date, nullable=False)
    mmyyyy = Column(CHAR(6), nullable=False)
    mmddyyyy = Column(CHAR(10), nullable=False)
    weekend_indr = Column(Boolean, nullable=False)


What I ran into is a couple of things. 1. Python True/False throws errors 
when writing to a Boolean. 2. You seem to have to int() a 1 or 0 to avoid a 
dbapi error. 

Is this expected? And does it have more to do with Postgres than it does 
SQLA?

Here's the code I came up with for generic file loader that actually works. 
I'm sure it's garbage, and I'd like to refactor it, but it works.

def load_file(file_name, cls):
    """assumes that csv file has headers and that headers match the
    names of the columns for a given sqla class. Also assumes that the
    sqla class is inheriting from meta.Base"""
    inst = inspect(cls)
    # attr_names = sorted([c_attr.key for c_attr in 
inst.mapper.column_attrs])
    attr_vals = sorted([(c.name, c.type) for c in inst.c])


    s = get_sqla_session()


    with open(file_name, newline="") as infile:
        reader = csv.reader(infile)
        Data = namedtuple("Data", next(reader))
        tuple_fields = Data._fields
        for data in map(Data._make, [(c.strip() for c in row) for row in 
reader]):
            new_obj = cls()
            for attr in attr_vals:
                for field in tuple_fields:
                    if attr[0] == field:
                        if str(attr[1]) == "BOOLEAN":
                            setattr(new_obj, field, int(getattr(data, field
)))
                        else:
                            # print(data, attr[1])
                            setattr(new_obj, field, getattr(data, field))


            new_obj.id = IDService.create_snowflake_id()
            new_obj.is_deleted = 0


            s.add(new_obj)
    s.commit()


This isn't really a big deal, and maybe not even appropriate, but I had an 
absolute shit 12 hours trying to figure out why SQLA booleans are such a 
hassle.

If anyone has thoughts, I'm happy to hear them.

Thank you!

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/43b9c281-978a-4def-a279-f518201bd884o%40googlegroups.com.

[sqlalchemy] Generic File Loader issue with Booleans

Reply via email to