OK, I've put a session.current.clear() after each iteration in the file and it seems to resolve the problem! I forgot that once flushed, newly persistent objects stay in the session...
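In case it's useful to someone else, the loop now looks roughly like this (a simplified sketch: the real class, table, and file names differ, and my session is actually the thread-local session.current mentioned above):

    import csv

    # 'Record' stands in for one of my real mapped classes
    reader = csv.reader(open('big_file.csv'))
    for row in reader:
        record = Record(*row)
        session.save(record)    # mark the new object as pending
        session.flush()         # INSERT it into the database
        session.clear()         # forget it, so the session stays small

    # quick sanity check, as suggested below:
    print "session size:", len(session.identity_map), \
        len(session.new), len(session.dirty)

With the clear() in place the session size stays constant instead of growing with every row.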
Many thanks,
Julien

On Thu, 24 May 2007 06:02:33 -0400
Michael Bayer <[EMAIL PROTECTED]> wrote:

>
> On May 24, 2007, at 5:52 AM, Julien Cigar wrote:
>
> > Hello,
> >
> > I have to import, clean, and merge data from different sources
> > (Excel sheets, Access, text files, etc.) into a new database
> > (PostgreSQL 8.1).
> > Some of those Excel sheets, which have been converted to CSV files,
> > are very big (about 500,000 lines).
> > For this new project I decided to use SQLAlchemy to import the data
> > (in the past I had only used it for the web interface), and it's a
> > big time saver.
> > However, after ~15,000 iterations my script consumes about 60% of
> > memory and things become very, very slow...
> > Some parts of the script are expected to be slow, because for each
> > line in file A I have to search for the corresponding line in file B
> > (but file B is not that big in this case).
> >
> > I don't know whether this problem is related to SQLAlchemy or
> > not... but I didn't have this problem before.
> >
> > Here is the script: http://rafb.net/p/RvsH3133.html (it's quite
> > long, but the details aren't important)
> >
> > Does somebody have an idea what could cause this high memory
> > consumption?
>
> Here is the formula:
>
> ORM (i.e., using mappers/sessions) + >1000 of anything - *extremely*
> careful usage of relationships/cascades/sessions == major memory usage
>
> The ORM is largely oriented towards keeping itself "correct", as well
> as towards being able to load data on demand... therefore, if you have
> a large recordset that is linked by a relation, it's very easy for that
> whole recordset to get loaded into memory. Additionally, SA has to
> keep track of a lot of state information for each loaded entity.
>
> To analyze this program, you should be doing things like setting the
> 'sqlalchemy.engine' logger to logging.DEBUG so that you can see not
> just the queries being issued but also the number of rows being loaded
> in, and you should also be monitoring the size of your session at
> various points, using something like:
>
>     print "session size:", len(session.identity_map), \
>         len(session.new), len(session.dirty)
>
> as well as looking at objects in memory. You can use a debugger like
> pdb for this, though I am lazy and I usually just play with "gc",
> doing things like:
>
>     import gc
>
>     # total number of objects the garbage collector is tracking
>     print len(gc.get_objects())
>
>     # list only the objects defined in your own package
>     for obj in gc.get_objects():
>         if getattr(obj, '__module__', None) is not None and \
>                 obj.__module__.startswith('mypackage.name'):
>             print repr(obj)
>
> You should try to ensure that the total number of objects present in
> the session at any given time is in the hundreds, making liberal use
> of clear()/expunge() etc. to keep the size down.

--
Julien Cigar <[EMAIL PROTECTED]>
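For reference, the engine logging Michael mentions is plain Python logging; something like this turns it on (a minimal sketch):

    import logging

    logging.basicConfig()
    # DEBUG on 'sqlalchemy.engine' logs the SQL statements *and* the result rows;
    # INFO would log the statements only
    logging.getLogger('sqlalchemy.engine').setLevel(logging.DEBUG)

The same effect is available at engine creation time with create_engine(url, echo='debug').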