I'd like to optimize querying and converting a list of Oracle tables into 
pandas dataframes.


The eventual goal is to convert to Parquet, write to disk, then upload to 
S3, but for now I just want to focus on the pandas / sqlalchemy / 
parallelism part. My code sort of works, but it's very slow and seems to 
hang after completing 10 tables.


Any advice for speeding things up or alternative suggestions?


import sqlalchemyfrom sqlalchemy.orm import sessionmaker, scoped_sessionfrom 
multiprocessing.dummy import Pool as ThreadPool from multiprocessing import 
Poolimport pyarrow as paimport pyarrow.parquet as pq
def process_chunk(chunk, table_name, index):
    table = pa.Table.from_pandas(chunk)
    local_file_name = "./" + table_name + "-" + str(index) + ".parquet"
    pq.write_table(table, local_file_name)
def process_table(table):
    db_session = DBSession()
    # helper function that creates the SQL query (select col1, col2, col3, ..., 
colX from table)
    query = setup_query(table)
    i=0
    # is this the right way to use the db_session?
    for chunk in pd.read_sql(query, db_session.bind, chunksize=300000):
        process_chunk(chunk, table, i)
        i+=1

oracle_connect_str = #string_here#
oracle_engine = sqlalchemy.create_engine(
    oracle_connect_str,
    arraysize=10000)
# set up session object to be used by threadsDBSession = scoped_session(
    sessionmaker(
        autoflush=True,
        autocommit=False,
        bind=oracle_engine
    ))

pool = ThreadPool(4)

table_list = ['tbl1','tbl2','tbl3','tbl4','tbl5',...,'tbl20']
# use pool.map instead of creating boiler-plate threading class
pool.map(process_table, table_list)
# are these in the right spots?
pool.close()
pool.join()



Thanks!


-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/0e7bf9a8-0335-4a07-af90-6b217daeb2ea%40googlegroups.com.

Reply via email to