OK, I've put a session.current.clear() after each iteration over the file and it
seems to resolve the problem!
I forgot that newly persistent objects stay in the session once they're flushed ...
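
In case it's useful to anyone else, the loop now looks roughly like this (the
class and file names are just placeholders, and `session` is the SessionContext
already set up in the script):

import csv

# sketch of the import loop -- Measurement / data.csv are placeholders
reader = csv.reader(open('data.csv'))
for row in reader:
    obj = Measurement()
    obj.value = row[0]
    session.current.save(obj)
    session.current.flush()
    # flushed objects stay referenced by the session until it's cleared,
    # so drop them here to keep memory flat
    session.current.clear()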

Many thanks,
Julien

On Thu, 24 May 2007 06:02:33 -0400
Michael Bayer <[EMAIL PROTECTED]> wrote:

> 
> 
> On May 24, 2007, at 5:52 AM, Julien Cigar wrote:
> 
> >
> > Hello,
> >
> > I have to import, clean, and merge data from different sources
> > (Excel sheets, Access, text files, etc.) into a new database
> > (PostgreSQL 8.1).
> > Some of those Excel sheets, which have been converted to CSV files,
> > are very, very big (about 500 000 lines).
> > For this new project I decided to use SQLAlchemy to import that
> > data (I had only used it for the web interface in the past), and it's
> > really a big time saver.
> > However, after ~15000 iterations my script consumes about 60% of
> > the memory and things become very, very slow ...
> > Some parts of the script are expected to be slow, because for each
> > line in file A I have to search for the corresponding line in file B (but
> > file B is not so big in this case).
> >
> > I don't know if this problem is related to SQLAlchemy or not ...  
> > but I didn't have this problem before.
> >
> > Here is the script : http://rafb.net/p/RvsH3133.html (it's quite  
> > huge but the details aren't important)
> >
> > Does somebody have an idea what could cause this high memory
> > consumption?
> 
> here is the formula:
> 
> ORM (i.e., using mappers/sessions) + > 1000 of anything - *extremely*  
> careful usage of relationships/cascade/sessions == major memory usage
> 
> the ORM is largely oriented towards keeping itself "correct" as well
> as towards being able to load data on demand... therefore if you have
> a large recordset that is linked by a relation, it's very easy for that
> whole recordset to get loaded into memory.  additionally, SA has to
> keep track of a lot of state information for each loaded entity.
> 
> to analyze this program you should be doing things like setting the
> 'sqlalchemy.engine' logger to logging.DEBUG so that you can see not
> just the queries being issued but also the number of rows being loaded
> in, and you should also be monitoring the size of your session at
> various points, using something like:
> 
> print "session size:", len(session.identity_map), len(session.new),  
> len(session.dirty)
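> 
> for the engine logging piece, the standard python logging setup is all
> it takes ('sqlalchemy.engine' is the logger name SA logs under):
> 
> import logging
> 
> logging.basicConfig()
> # DEBUG on the engine logger shows each statement, its parameters,
> # and the rows coming back
> logging.getLogger('sqlalchemy.engine').setLevel(logging.DEBUG)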
> 
> as well as looking at objects in memory.  you can use a debugger like  
> PDB for this, though i am lazy and i usually just play with "gc",  
> doing things like:
> 
> import gc
> print len(gc.get_objects())
> 
> for obj in gc.get_objects():
>     if getattr(obj, '__module__', None) is not None and \
>             obj.__module__.startswith('mypackage.name'):
>         print repr(obj)
> 
> you should try to ensure that the total number of objects present in  
> the session at any given time is in the hundreds, making liberal use  
> of clear()/expunge() etc. to keep the size down.
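> 
> e.g. something along these lines (the names are made up, and `session` /
> `reader` are whatever your script already has; the point is just to flush
> and then throw instances out of the session periodically):
> 
> batch = []
> for row in reader:
>     obj = SomeClass()            # placeholder mapped class
>     session.save(obj)
>     batch.append(obj)
>     if len(batch) >= 500:
>         session.flush()
>         for o in batch:
>             session.expunge(o)   # or just session.clear()
>         batch = []
> 
> # flush/clear whatever is left over
> session.flush()
> session.clear()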
> 
> 
> 

-- 
Julien Cigar <[EMAIL PROTECTED]>
