On Friday 26 January 2007 00:40, Alan Gauld wrote: > "Shadab Sayani" <[EMAIL PROTECTED]> wrote > > > I got your point.But before inserting data I need to store it > > into a file in a format supported by postgresql.Wont this > > operation incur a performance hit as it includes writing > > to a file which is on disk? > > Unless your data is already in a format the database > understands you will have to reformat it before loading it. > There are basically two options: > 1) read the unformatted data piece by piece, reformat > it and load it to the database item by item. > 2) read the unformatted data and write it to an > intermediate file in a format supported by the > database, then load the formatted data in bulk. > > The second approach is nearly always faster than > the first for large data sets. This is due to many things > including transactional overheads in the first approach, > caching issues, availability of bulk optimisations in > the database itself, etc. > > Writing to a flat file is much faster than writing to a > database. Reformatting data is a complex business. > Python is good at complex processing and writing > to flat files. SQL is good at writing to databases but poor > at complex processing. So use Python for its > strengths and SQL for its strengths and you get > optimal results. > > HTH, > > Alan G
This subject has come up several times in the psycopg mailing list. The author of psycopg (Federico Di Gregorio) has suggested the best to handle this is as follows: "When I'd have some more time I'll write a class to do that without the need to create a temporary file. Now that I think about it probably the best way would be to use two threads writing/reading from a pipe, so that the postgresql one doesn't have to wait for the other and data is always ready." In response too: This is a copout and uses temporary files, but does the trick: Usage is pretty simple: orac = ora.cursor() orac.execute('select * from gene.geneinfo') orac.arraysize=100000 oraf = OracleFile2(orac) pgc = pg.cursor() pgc.copy_from(oraf,'loader.gene_info') ------------------------------------------------ class CursorFile(object): def __init__(self,cursor): self._cursor = cursor self._file = tempfile.TemporaryFile() self.write_file() def write_file(self): print 'fetching' dat = self._cursor.fetchmany() while len(dat)>0: print 'writing' for row in dat: self._file.writelines("\t".join(map(str,row)) + "\n") print "fetching" dat = self._cursor.fetchmany() print 'done writing' self._file.flush() self._file.seek(0) def read(self,size=-1): dat = self._file.read(size) return dat def readline(self,size=-1): return self._file.readline(size) I hope this helps -- John Fabiani _______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor