On Thu, May 27, 2010 at 3:37 AM, trias <t.gkikopou...@dundee.ac.uk> wrote:
>
> Hi,
>
>  I have wrote the following lines that work fine, but are quite slow, are
> there any obvious things I can consider to speed things up?
>
>  Thanks
>
> import MySQLdb
>
> import scipy
>
> import csv
>
> dbtest=MySQLdb.connect(host="***",user="***",passwd="***")
>
> cursor=dbtest.cursor()
>
> cursor.execute("""SELECT tfs_name FROM tfs_sites GROUP by tfs_name""")
>
> result=cursor.fetchall()
>
> dbtest.close()
>
> TFname=[]
>
> for row in result:
>
>    TFname.append(row[0])
>
> del result
>
> T={}
>
> i=0
>
> for TF in TFname:
>
>    while i<1:
>
>        dbtest=MySQLdb.connect(host="***",user="***",passwd="***",db="***")
>
>        cursor=dbtest.cursor()
>
>        cursor.execute("""SELECT tfs_chr,tfs_pos,tfs_val FROM tfs_sites
> WHERE tfs_name='%s'"""%(TF))
>
>        result=cursor.fetchall()
>
>        TFchr=[]
>
>        TFpos=[]
>
>        TFval=[]
>
>        i+=1
>
>        for row in result:
>
>            TFchr.append(row[0])
>
>            TFpos.append(row[1])
>
>            TFval.append(row[2])
>
>            TFc=[[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]]
>
>            counter=0
>
>            for TFsite in TFchr:
>
>
> TFc[(int(TFsite)-1)].append((int(TFpos[counter]),int(TFval[counter])))
>
>                T[TF]=TFc
>
>                counter+=1
> --
> View this message in context: 
> http://old.nabble.com/Speed-it-up...-tp28691677p28691677.html
> Sent from the Python - tutor mailing list archive at Nabble.com.
>
> _______________________________________________
> Tutor maillist  -  tu...@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>


Heh, I guess I'll try the "replay after message" plan everyone seems
to like here.


You don't need to make a new connection or cursor every time you run a query.
Just use the cursor you created once.


Secondly, if you can figure out a way to combine all your queries into
one larger query, your speed will increase.  In this case, you're
querying the same table multiple times.  It'd be easier on your
database if you just queried it once.

SELECT tfs_name, tfs_chr, tfs_pos, tfs_val FROM tfs_sites

You'll still get all the information you'd get the other way, and you
only have to hit the DB once.  (This somewhat invalidates the first
point I made, but ...)


If you can't reduce it to that single query (for whatever reason),
since your first query is only grabbing a single column of data, you
should use "distinct" instead of "group by".

SELECT DISTINCT tfs_name FROM tfs_sites




Also, and this will be a FAR SMALLER speedup:

you have several instances where you do this:

big_container = []
for row in some_query_result:
    big_container.append(row[0])

it's faster, and I think clearer/cleaner to do:

big_container = [row[0] for row in some_query_result]

or, if you want to make sure they're all ints:

big_container = [int(row[0]) for row in some_query_result]


TFname, TFchr, TFpos, TFval can all be constructed this way.



There's a few other things you could do that would speed up things,
but the biggest would be to reduce the connection time to your DB, and
reduce the number of queries.



--

I enjoy haiku
but sometimes they don't make sense;
refrigerator?
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to