On Jul 6, 2012, at 6:20 AM, Vathsala wrote:

> Hi,
> 
> I'm trying to load about 4 million-odd records from an MSSQL Server table 
> into memory. The table takes about 1.5 GB on disk, but I get a memory error 
> when I load about 850000 records. My RAM is about 32 GB, so I don't 
> understand why it is complaining.
> 
> I first tried loading all 4 million records, which gave me the MemoryError. 
> Next I filtered the query to load about 850000 records, but it still 
> crashes. As far as I understand, the entire result set is loaded into 
> memory before the 850000 records are filtered? But even if it did load it 
> all, I should have enough RAM left for it to process comfortably. 
> Maybe I'm missing something.
> 
> Any ideas or suggestions would be welcome.

4 million rows / 1.5 GB is a tremendous amount of data to pull into memory.
Both structural overhead (i.e. things that travel along with a piece of data) 
and referential overhead (i.e. the space taken up by pointers to a piece of 
data) are added to this data at the per-column as well as the per-row level, 
within the Python interpreter, within the DBAPI, and within SQLAlchemy's row 
proxy; if you're using the ORM and loading full objects, the mapped objects 
themselves and their associated bookkeeping data will double or triple that 
overhead again.  Assuming you're using pyodbc, I'm not sure whether pyodbc 
buffers rows - most DBAPIs do, however.  The Core ResultProxy does not buffer 
rows, but if you're working through the ORM, the Query does load the entire 
result unless you use yield_per() (see the sketch below).  I'm also not sure 
whether the Python interpreter itself really allows a single process's memory 
to grow that large, though I haven't looked into it.
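Here's a minimal sketch of the yield_per() approach; the Record class, the
connection URL, and the per-row handling are placeholders I'm making up here,
not anything from your actual setup:

    from sqlalchemy import create_engine, Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Record(Base):                    # hypothetical mapped table
        __tablename__ = "records"
        id = Column(Integer, primary_key=True)
        data = Column(String(100))

    engine = create_engine("mssql+pyodbc://user:pass@mydsn")   # placeholder URL
    session = sessionmaker(bind=engine)()

    # yield_per(1000) fetches rows and builds objects in batches of 1000
    # instead of materializing the entire result set up front (note it has
    # caveats when combined with eagerly loaded collections).
    for record in session.query(Record).yield_per(1000):
        pass   # per-row processing goes here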

Typically when you're dealing with memory issues, you watch the output of "top" 
as the app runs and check on things like len(gc.get_objects()).
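As a quick illustration of that kind of check (just a throwaway snippet, 
nothing specific to your app):

    import gc

    # Watch how many objects the interpreter is tracking while the load runs;
    # compare against the resident memory reported by "top".
    print(len(gc.get_objects()))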

But in reality, to iterate through 4 million rows I'd usually load them in 
chunks, which sidesteps whatever buffering behavior isn't controllable in 
pyodbc, the ODBC drivers, and so on.  With SQL Server I use this recipe: 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery .
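The recipe at that link is built on window functions; purely as an illustration 
of the general chunking idea (not the recipe itself), something along these 
lines walks the table in primary-key ranges, reusing the hypothetical 
Record/session from the earlier sketch:

    chunk = 10000
    last_id = 0
    while True:
        rows = (session.query(Record)
                .filter(Record.id > last_id)
                .order_by(Record.id)
                .limit(chunk)
                .all())
        if not rows:
            break
        for row in rows:
            pass                   # per-row processing goes here
        last_id = rows[-1].id
        session.expunge_all()      # let the previous chunk be garbage collected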

