I'm working on a project to parse through a large text file (1GB) of 
records.  Once parsed, each record gets sent to the DB.  Due to the size of 
the file, I've been working on a streaming/functional approach that will 
keep my memory usage constant.  
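The parsing side is generator-based, so only one record is ever in memory at a time. Roughly like this (a sketch only; the real code is MEXChunker.yield_merchants(), and the delimiter here is made up):

    def yield_records(path):
        # Illustrative stand-in for MEXChunker.yield_merchants(): read the
        # file lazily and yield one record's lines at a time, so memory
        # stays flat regardless of file size.
        with open(path) as f:
            record = []
            for line in f:
                if line.startswith('END'):  # hypothetical record delimiter
                    yield record
                    record = []
                else:
                    record.append(line)
            if record:
                yield record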

I've been able to simply take the DB out of the equation: parsing through 
all of the records on its own keeps memory usage constant.  But as soon as 
I bring SQLAlchemy (SA) into the picture, memory usage climbs continuously 
for the lifetime of the program.

I originally started using the ORM, and thought the Session would be the 
culprit, but have now drilled down deep enough into the problem that it 
appears to be an issue even when using simple connections.
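For reference, the ORM version looked roughly like this (a sketch, not my exact code; ProgramCharge is the mapped class), committing and expunging in windows to keep the Session's identity map from growing:

    for count, statement in enumerate(MEXChunker(mex_file).yield_merchants()):
        for pc_data in statement.program_charges:
            month, year, amount, merchant, officer = pc_data
            session.add(ProgramCharge(reporting_month=month,
                                      reporting_year=year,
                                      charge_amount=amount,
                                      merchant_number=merchant,
                                      officer_code=officer))
        if count % 1000 == 0:
            session.commit()       # flush the current window to the DB
            session.expunge_all()  # drop references so the identity map stays small
    session.commit()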



*using psycopg2 directly:*
    connection = db.engine.connect().connection
    with connection.cursor() as cursor:
        for count, statement in enumerate(MEXChunker(mex_file).yield_merchants()):
            for pc_data in statement.program_charges:
                insert_sql = ("INSERT INTO stage.tsys_program_charges "
                              "(reporting_month, reporting_year, charge_amount, "
                              "merchant_number, officer_code) "
                              "VALUES (%s, %s, %s, %s, %s)")
                cursor.execute(insert_sql, pc_data)

The above, when run, shows memory ("RES" in `top`) quickly climb and then 
hold at around 183K.  Python's resource module reports a "max rss" of 
182268 at the end of the script.  Those memory numbers are just about the 
same if I simply run the parsing loop and keep the DB out of it.
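The "max rss" figure comes from Python's standard resource module, along these lines (ru_maxrss is reported in kilobytes on Linux):

    import resource

    # Peak resident set size of the current process, in KB on Linux.
    print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)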



*using SA:*
    with db.engine.begin() as connection:
        for count, statement in enumerate(MEXChunker(mex_file).yield_merchants()):
            for pc_data in statement.program_charges:
                insert_sql = ("INSERT INTO stage.tsys_program_charges "
                              "(reporting_month, reporting_year, charge_amount, "
                              "merchant_number, officer_code) "
                              "VALUES (%s, %s, %s, %s, %s)")
                connection.execute(insert_sql, pc_data)

The above, when run, shows memory usage climbing through the life of the 
script.  "max rss" tops out at 323984.
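One variation I can still try is building the statement once with sqlalchemy.text() outside the loop, in case re-interpreting the raw string on each call contributes (a sketch, unverified):

    from sqlalchemy import text

    insert_stmt = text(
        "INSERT INTO stage.tsys_program_charges "
        "(reporting_month, reporting_year, charge_amount, "
        "merchant_number, officer_code) "
        "VALUES (:month, :year, :amount, :merchant, :officer)"
    )

    with db.engine.begin() as connection:
        for count, statement in enumerate(MEXChunker(mex_file).yield_merchants()):
            for pc_data in statement.program_charges:
                month, year, amount, merchant, officer = pc_data
                connection.execute(insert_stmt,
                                   month=month, year=year, amount=amount,
                                   merchant=merchant, officer=officer)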

I'd like to ultimately be able to use the ORM for this project, but if 
even simple inserts through SA don't give me constant memory, I can't 
really move forward with that plan.
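If the plain-connection case can be sorted out, the ORM approach I have in mind is batched bulk_insert_mappings(), which skips most of the unit-of-work overhead (again a sketch; ProgramCharge is the hypothetical mapped class from above):

    from sqlalchemy.orm import sessionmaker

    Session = sessionmaker(bind=db.engine)
    session = Session()

    COLUMNS = ('reporting_month', 'reporting_year', 'charge_amount',
               'merchant_number', 'officer_code')
    batch = []
    for statement in MEXChunker(mex_file).yield_merchants():
        for pc_data in statement.program_charges:
            batch.append(dict(zip(COLUMNS, pc_data)))
        if len(batch) >= 1000:
            session.bulk_insert_mappings(ProgramCharge, batch)
            session.commit()
            del batch[:]  # reuse the list so nothing accumulates
    if batch:
        session.bulk_insert_mappings(ProgramCharge, batch)
        session.commit()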

Thanks in advance for any help you can provide.



*system info*
Python 2.7.6
SA 1.0.10 & SA 1.0.12
Ubuntu Linux 14.04
