Hi, I am building a database on my macbook air computer using python 2.7 and sqlalchemy 0.73.The database has main items which are linked through relationship to multiple other tables with relevant data for the item. I am in the process of data entry - and adding data to each of the tables takes about 0.000176 of a sec - so - for all the simple fields it takes about 0.015471sec per item. The biggest problem is adding a list of citations for each item - since the citations are self referential - hence I need to append them to both the citing and cited items. Adding each citation takes on the average 0.1615 of a sec ! that is 1000x LONGER than each of the other item fields! While adding a citation involves calling up another item from the database - still - this looks excruciatingly slow and is an absolute show stopper for me (data entry will take months!). So - I need to find out what I am doing wrong (and I am sure that this is the case).
My item is defined as: ======================== class PatentInfo(Base): __tablename__ = "pat_info_main" pNum = Column(Integer, primary_key=True) pStatus = Column(String) pTitle = Column(UnicodeText) pLang = Column(String) pProcessInfo = relationship(ProcessDocInfo, primaryjoin = pNum==ProcessDocInfo.pNum, backref="pat_info" ) pParties = relationship(Party, primaryjoin = pNum==Party.pNum, backref="pat_info", ) pEuClass = relationship(EuClass, primaryjoin = pNum==EuClass.pNum, backref="pat_info", ) pUSClass = relationship(USClass, primaryjoin = pNum==USClass.pNum, backref="pat_info" ) pCitingPats = relationship(CitedPatInfo, primaryjoin= pNum==CitedPatInfo.destID, backref = "cited_info" ) pCitedUSPats = relationship(CitedPatInfo, primaryjoin= pNum==CitedPatInfo.srcID, backref = "citing_info" ) pCitedNonUSPats = relationship(CitedIntPatInfo, backref="pat_info_main", primaryjoin = pNum==CitedIntPatInfo.pNum ) pCitedLits = relationship(CitedLitInfo, backref="pat_info_main", primaryjoin = pNum==CitedLitInfo.pNum ) =================== Access to add the fields to the item itself is done through: ==================== self.query_pat = db.session.query(PatentInfo).\ options( joinedload(PatentInfo.pCitingPats) ) p = self.query_pat.get(p_num) if not(p): pat = self.PatentDatToInfo() self.session.add(pat) =================== p = self.query_pat.get(p_num) if not(p): pat = self.PatentDatToInfo() self.session.add(pat) ====== and in the PatentDatToInfo() I add the information to the different columns as : ====== cite = CitedPatInfo({'phase':key, 'info':c}) p.pCitedUSPats.append(cite) if not(str(c.num)) in self.cite_dict: self.cite_dict[str(c.num)] = [] self.cite_dict[str(c.num)].append(cite) ============ Now, when I come to attach the citation to the cited items : ============ def ProcessCitations(self): cited_pats = self.citations.keys() total_cited_pats = len(cited_pats) print 'Total cited pats: %d' % total_cited_pats total_cites = 0 q_pat = self.mainDB.session.query(PatentInfo).\ options(joinedload(PatentInfo.pCitingPats)).\ order_by(PatentInfo.pNum) cit_nums = [int(k) for k in cited_pats] cit_nums.sort() cit_cnt = tick_cnt = 0 for n in cit_nums: total_cites = total_cites + len(self.citations[str(n)]) p1 = q_pat.get(n) if not(p1): p1 = PatentInfo(n) self.session.add(p1) for cite in self.citations[str(n)]: p1.pCitingPats.append(cite) cit_cnt = cit_cnt + 1 if not (cit_cnt % 1000): self.session.commit() ============= The access to the item: p1 = q_pat.get(n) takes around 0.3-0.5 of a second - and that is an absolute killer. Why is it so slow and how do I accelerate it? Thanks, Rivka -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To post to this group, send email to sqlalchemy@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.