[sqlalchemy] Extremely slow access to data base

rivka Sat, 17 Dec 2011 12:00:00 -0800

Hi,

I am building a database on my macbook air computer using python 2.7
and sqlalchemy 0.73.The database has main items which are linked
through relationship to multiple other tables with relevant data for
the item. I am in the process of data entry - and adding data to each
of the tables takes about 0.000176 of a sec - so - for all the simple
fields it takes about 0.015471sec per item. The biggest problem is
adding a list of citations for each item - since the citations are
self referential - hence I need to append them to both the citing and
cited items.
Adding each citation takes on the average 0.1615 of a sec ! that is
1000x LONGER than each of the other item fields! While adding a
citation involves calling up another item from the database - still -
this looks excruciatingly slow and is an absolute show stopper for me
(data entry will take months!). So - I need to find out what I am
doing wrong (and I am sure that this is the case).


My item is defined as:
========================
class PatentInfo(Base):

    __tablename__ = "pat_info_main"
    pNum = Column(Integer, primary_key=True)
    pStatus = Column(String)
    pTitle = Column(UnicodeText)
    pLang = Column(String)

    pProcessInfo = relationship(ProcessDocInfo,
                                primaryjoin =
pNum==ProcessDocInfo.pNum,
                                backref="pat_info"
                                )
    pParties = relationship(Party,
                                primaryjoin = pNum==Party.pNum,
                                backref="pat_info",
                                )
    pEuClass = relationship(EuClass,
                                primaryjoin = pNum==EuClass.pNum,
                                backref="pat_info",
                                )
    pUSClass = relationship(USClass,
                                primaryjoin = pNum==USClass.pNum,
                                backref="pat_info"
                                )
    pCitingPats = relationship(CitedPatInfo,
                         primaryjoin= pNum==CitedPatInfo.destID,
                         backref = "cited_info"
                         )
    pCitedUSPats = relationship(CitedPatInfo,
                         primaryjoin= pNum==CitedPatInfo.srcID,
                         backref = "citing_info"
                         )
    pCitedNonUSPats = relationship(CitedIntPatInfo,
backref="pat_info_main",
                                primaryjoin =
pNum==CitedIntPatInfo.pNum
                                )
    pCitedLits = relationship(CitedLitInfo, backref="pat_info_main",
                                primaryjoin = pNum==CitedLitInfo.pNum
                                )
===================
Access to add the fields to the item itself is done through:
====================
                self.query_pat = db.session.query(PatentInfo).\
                        options( joinedload(PatentInfo.pCitingPats) )

                p = self.query_pat.get(p_num)
                if not(p):
                    pat = self.PatentDatToInfo()
                    self.session.add(pat)

===================

                p = self.query_pat.get(p_num)
                if not(p):
                    pat = self.PatentDatToInfo()
                    self.session.add(pat)

======
and in the PatentDatToInfo() I add the information to the different
columns as :
======
                        cite = CitedPatInfo({'phase':key, 'info':c})
                        p.pCitedUSPats.append(cite)
                        if not(str(c.num)) in self.cite_dict:
                            self.cite_dict[str(c.num)] = []
                        self.cite_dict[str(c.num)].append(cite)
============
Now, when I come to attach the citation to the cited items :
============
    def ProcessCitations(self):
        cited_pats = self.citations.keys()
        total_cited_pats = len(cited_pats)
        print 'Total cited pats: %d' % total_cited_pats
        total_cites = 0

        q_pat = self.mainDB.session.query(PatentInfo).\
                        options(joinedload(PatentInfo.pCitingPats)).\
                        order_by(PatentInfo.pNum)

        cit_nums = [int(k) for k in cited_pats]
        cit_nums.sort()
        cit_cnt = tick_cnt = 0

        for n in cit_nums:
            total_cites = total_cites + len(self.citations[str(n)])
            p1 = q_pat.get(n)
            if not(p1):
                p1 = PatentInfo(n)
                self.session.add(p1)
            for cite in self.citations[str(n)]:
                p1.pCitingPats.append(cite)
            cit_cnt = cit_cnt + 1
            if not (cit_cnt % 1000):
                self.session.commit()

=============
The access to the item: p1 = q_pat.get(n) takes around 0.3-0.5 of a
second - and that is an absolute killer. Why is it so slow and how do
I accelerate it?

Thanks,

Rivka

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

[sqlalchemy] Extremely slow access to data base

Reply via email to