Dear group,

For mapping a protein or nucleotide string, BLAST is a highly sought-after tool. However, its performance is limited by several factors, one of which is the length of the query string: if the query is shorter than 30 characters, its output is questionable.
So, what if one has to map a 15-character nucleotide string onto a jumbo string that is millions of characters long? The simplest solution could be string matching. A researcher from NCI (National Cancer Institute) said that to map 500 thousand nucleotide strings (each of length 21), they used a hash-table method to map them onto a chromosome (say, a string 15 million characters long). I do not know exactly what a hash-table method would be, so I tried the simplest way of mapping I could think of. Could tutors comment on this method of mapping?

    # Target string
    a = 'GATGAAGACTTGCAGCGTGGACACTGGCCCAGCCCCGGGTCGCTAAGGAGCTCCGGCAGCTAGGCGCGGAGATGGGGGTGCCCGAACGTCCCACCCTGCTGCTTTTACTCTCCTTGCTACTGATTCCTCTGGGCCTCCCAGTCCTCTGTGCTCCCCCACGCCTCATCTGCGACAGTCGAGTTCTGGAGAGGTACATCTTAGAGGCCAAGGAGGCAGAAAATGTCACGATGGGTTGTGCAGAAGGTCCCAGACTGAGTGAAAATATTACAGTCCCAGATACCAAAGTCAACTTCTATGCTTGGAAAAGAATGGAGGTGGAAGAACAGGCCATAGAAGTTTGGCAAGGCCTGTCCCTGCTCTCAGAAGCCATCCTGCAGGCCCAGGCCCTGCTAGCCAATTCCTCCCAGCCACCAGAGACCCTTCAGCTTCATATAGACAAAGCCATCAGTGGTCTACGTAGCCTCACTTCACTGCTTCGGGTACTGGGAGCTCAGAAGGAATTGATGTCGCCTCCAGATACCACCCCACCTGCTCCACTCCGAACACTCACAGTGGATACTTTCTGCAAGCTCTTCCGGGTCTACGCCAACTTCCTCCGGGGGAAACTGAAGCTGTACACGGGAGAGGTCTGCAGGAGAGGGGACAGGTGACATGCTGCTGCCACCGTGGTGGACCGACGAACTTGCTCCCCGTCACTGTGTCATGCCAACCCTCC'

    # Small query strings
    q = ['GCAGGAGAGGGGACA', 'GAAGGTCCCAGACTG', 'CCCAGTCCTCTGTGC']

In the following routine, I slice the target string into overlapping substrings of length 15 and create a dictionary that maps each substring to its coordinates:

    dk = []   # 15-character slices of the target (dictionary keys)
    dv = []   # 'start:end' coordinates, 1-based (dictionary values)
    for m in range(len(a)):
        s = m
        e = m+15
        u = m+1
        nd = a[s:e]
        if len(nd)==15:          # skip the short slices at the end of the target
            x = ('%d:%d')%(u,e)
            dk.append(nd)
            dv.append(x)
    sdic = dict(zip(dk,dv))

    for r in q:
        if r in sdic:            # same test as sdic.has_key(r)
            print(r+'\t'+sdic[r])

Result:

    GCAGGAGAGGGGACA 631:645
    GAAGGTCCCAGACTG 240:254
    CCCAGTCCTCTGTGC 137:151

My questions are: Is this a flavor of the hash-table method? Do you think there is any flaw in it? Is there a better method?
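For reference, here is a minimal sketch of how a dictionary-based (hash-table) index might look when a substring can occur more than once in the target. A Python dict is itself a hash table, and one thing to watch for in dict(zip(dk,dv)) is that a repeated 15-mer keeps only its last coordinate. The names build_kmer_index and locate and the toy sequence are made up for illustration; this is not claimed to be the method the NCI group used.

```python
from collections import defaultdict


def build_kmer_index(target, k):
    """Map each length-k substring of target to a list of its 1-based start positions."""
    index = defaultdict(list)
    # The last full window starts at len(target) - k, so no short slices are produced.
    for start in range(len(target) - k + 1):
        index[target[start:start + k]].append(start + 1)
    return index


def locate(index, query, k):
    """Return (start, end) pairs, 1-based and inclusive, for every exact match of query."""
    # .get() avoids adding empty entries to the defaultdict for missing queries.
    return [(s, s + k - 1) for s in index.get(query, [])]


if __name__ == "__main__":
    target = "ACGTACGTGACG"  # toy sequence, not real data
    k = 4
    index = build_kmer_index(target, k)
    for query in ["ACGT", "GACG", "TTTT"]:
        print(query, locate(index, query, k))
```

On the toy sequence, ACGT is reported at both 1:4 and 5:8, whereas a plain key-to-single-value dictionary would report only one of them. For a target millions of characters long the index holds one position per window, so memory use is probably the main thing to measure before relying on this approach.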
Thanks,
Sri

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor