On Thu, Jan 7, 2010 at 5:08 PM, kumar s <ps_pyt...@yahoo.com> wrote:
> I want to take coordinates x and y from each row in file a, and check if they
> are in range of zx and zy. If they are in range then I want to be able to
> write both matched rows in a tab delim single row.
>
> my code:
>
> f1 = open('fileA','r')
> f2 = open('fileB','r')
> da = f1.read().split('\n')
> dat = da[:-1]
> ba = f2.read().split('\n')
> bat = ba[:-1]
>
> for m in dat:
>     col = m.split('\t')
>     for j in bat:
>         cols = j.split('\t')
>         if col[1] == cols[1]:
>             xc = int(cols[2])
>             yc = int(cols[3])
>             if int(col[2]) in xrange(xc,yc):
>                 if int(col[3]) in xrange(xc,yc):
>                     print m+'\t'+j
>
> output:
> a 4 40811596 40811620 z1 4 + 40810323 40812000
>
> This code is too slow. Could you experts help me make the script a lot
> faster?
> In each file I have over 50K rows and the script runs very slowly.
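For comparison, here is a minimal Python 3 sketch of the quoted loop with the repeated per-row parsing hoisted out of the inner loop (the quoted code is Python 2; `load_rows`, `index_by_key` and `match_rows` are hypothetical helper names). It assumes the column layout the quoted code uses: a shared key in column 1 and integer start/end in columns 2 and 3. fileB is parsed once and grouped by key, and the `in xrange(xc, yc)` test, which in Python 2 walks the range element by element, becomes a direct comparison with the same half-open meaning.

```python
from collections import defaultdict


def load_rows(path):
    """Read a tab-delimited file into a list of field lists, skipping blank lines."""
    with open(path) as handle:
        return [line.rstrip('\n').split('\t') for line in handle if line.strip()]


def index_by_key(b_rows):
    """Parse fileB once: group (start, end, fields) tuples by the column-1 key."""
    by_key = defaultdict(list)
    for cols in b_rows:
        by_key[cols[1]].append((int(cols[2]), int(cols[3]), cols))
    return by_key


def match_rows(a_rows, by_key):
    """Yield A+B joined rows where both of A's coordinates fall inside B's range."""
    for col in a_rows:
        x, y = int(col[2]), int(col[3])
        # Only B rows with the same key are compared, and nothing is re-parsed here.
        for xc, yc, cols in by_key.get(col[1], ()):
            # Same half-open test as `in xrange(xc, yc)`: yc itself is excluded.
            if xc <= x < yc and xc <= y < yc:
                yield '\t'.join(col + cols)
```

To reproduce the quoted script's output: `for line in match_rows(load_rows('fileA'), index_by_key(load_rows('fileB'))): print(line)`. This keeps the nested-loop shape, so each row of A is still compared against every B row that shares its key; it trims the per-comparison work but not the quadratic growth.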
As others have pointed out, you are doing way too much work in your inner
loop. You should at least preprocess bat once, before the loop, so you aren't
repeating the split and int() conversion on every line each time through.

But the bigger problem is the nested loops themselves: with 50K rows in each
file, the inner loop body runs 50,000 * 50,000 = 2,500,000,000 times, which is
likely to take a while. To fix this you need a faster way to search bat than
scanning all of it for every row of dat. Interval trees are one way:
http://en.wikipedia.org/wiki/Interval_tree
http://hackmap.blogspot.com/2008/11/python-interval-tree.html

Kent

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
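The links above describe interval trees. As a smaller stand-in, here is a Python 3 sketch of a different and simpler index, sorted range starts searched with the standard `bisect` module; it is not an interval tree. It assumes the same column layout and half-open ranges as the quoted code, and `RangeIndex` and `match_rows_indexed` are hypothetical names. Each lookup bisects into the sorted starts and checks only ranges that begin within one maximum range length of the query point, so when the ranges in bat have similar lengths it inspects a handful of candidates instead of all of bat.

```python
import bisect
from collections import defaultdict


class RangeIndex:
    """Per-key index of half-open ranges [start, end), kept sorted by start.

    Not an interval tree: any range containing a point p must start in
    (p - max_len, p], where max_len is the longest range for that key, so two
    bisects on the sorted starts bound the candidates that need checking.
    """

    def __init__(self, ranges):
        # ranges: iterable of (key, start, end, payload) tuples.
        grouped = defaultdict(list)
        for key, start, end, payload in ranges:
            grouped[key].append((start, end, payload))
        self._items, self._starts, self._max_len = {}, {}, {}
        for key, items in grouped.items():
            items.sort(key=lambda item: item[0])
            self._items[key] = items
            self._starts[key] = [start for start, _, _ in items]
            self._max_len[key] = max(end - start for start, end, _ in items)

    def containing(self, key, lo, hi):
        """Yield payloads of ranges for `key` with start <= lo and hi < end."""
        items = self._items.get(key)
        if not items:
            return
        starts = self._starts[key]
        # Candidate starts lie in (lo - max_len, lo].
        first = bisect.bisect_right(starts, lo - self._max_len[key])
        last = bisect.bisect_right(starts, lo)
        for start, end, payload in items[first:last]:
            if hi < end:
                yield payload


def match_rows_indexed(a_rows, b_rows):
    """Join each A row to every B row whose range holds both A coordinates."""
    index = RangeIndex((cols[1], int(cols[2]), int(cols[3]), cols) for cols in b_rows)
    for col in a_rows:
        x, y = int(col[2]), int(col[3])
        # Both points lie in [start, end) exactly when min >= start and max < end.
        for cols in index.containing(col[1], min(x, y), max(x, y)):
            yield '\t'.join(col + cols)
```

The trade-off: if a few ranges are very long, the bisect window widens for that key and lookups degrade toward a scan. An interval tree, as in the links, keeps queries fast regardless of the length distribution, at the cost of more code.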