Dear group, 

for mapping a string of protein/nucleotides, BLAST is
one tool that is highly sought. However, its
performance is limited due to some factors and one
such factor is length of the query string. IF the
length of the query string is less than 30 characters,
its output is questionable. 

So, what if one has to map a string of 15 character
nucleotide to a jumbo string of characters of length
in millions. 

The simplest solution to this could be string
matching. A researcher from NCI (National Cancer
Institute)
said to map 500 thousand strings of nucleotides(length
of string - 21), they used a hash-table method to map
them on the a chromosome (say 15million length of
string). I do not know what exactly could be a
hash-table method. 

I tried a simplest way of mapping.  Could tutors
comment on this method of mapping. 


# Target string #
a =
'GATGAAGACTTGCAGCGTGGACACTGGCCCAGCCCCGGGTCGCTAAGGAGCTCCGGCAGCTAGGCGCGGAGATGGGGGTGCCCGAACGTCCCACCCTGCTGCTTTTACTCTCCTTGCTACTGATTCCTCTGGGCCTCCCAGTCCTCTGTGCTCCCCCACGCCTCATCTGCGACAGTCGAGTTCTGGAGAGGTACATCTTAGAGGCCAAGGAGGCAGAAAATGTCACGATGGGTTGTGCAGAAGGTCCCAGACTGAGTGAAAATATTACAGTCCCAGATACCAAAGTCAACTTCTATGCTTGGAAAAGAATGGAGGTGGAAGAACAGGCCATAGAAGTTTGGCAAGGCCTGTCCCTGCTCTCAGAAGCCATCCTGCAGGCCCAGGCCCTGCTAGCCAATTCCTCCCAGCCACCAGAGACCCTTCAGCTTCATATAGACAAAGCCATCAGTGGTCTACGTAGCCTCACTTCACTGCTTCGGGTACTGGGAGCTCAGAAGGAATTGATGTCGCCTCCAGATACCACCCCACCTGCTCCACTCCGAACACTCACAGTGGATACTTTCTGCAAGCTCTTCCGGGTCTACGCCAACTTCCTCCGGGGGAAACTGAAGCTGTACACGGGAGAGGTCTGCAGGAGAGGGGACAGGTGACATGCTGCTGCCACCGTGGTGGACCGACGAACTTGCTCCCCGTCACTGTGTCATGCCAACCCTCC'

# small query strings#
q = ['GCAGGAGAGGGGACA', 'GAAGGTCCCAGACTG',
'CCCAGTCCTCTGTGC']

# In the following routine, I sliced the target string
into characters of length 15. I created a dictionary
of sliced target sequence and its coordinates#

dk = []
dv = []
for m in range(len(a)):
        s = m
        e = m+15
        u = m+1
        nd = a[s:e]
        if len(nd)==15:
                x = ('%d:%d')%(u,e)
                dk.append(nd)
                dv.append(x)

sdic = dict(zip(dk,dv))
for r in q:
        if sdic.has_key(r):
                print r+'\t'+sdic[r]

# result Answer:#

GCAGGAGAGGGGACA 631:645
GAAGGTCCCAGACTG 240:254
CCCAGTCCTCTGTGC 137:151


my question is :
Is this a flavor of hash-table method. 
Do you think is there any flaw in this. 
Is there any better method that is possible. 

Thanks
Sri

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Reply via email to