Jumana yousef wrote: [Please don't reply to the digest. At the very least change the subject to its original text. Thank you.]
> just a reminder of my data: > it cossets of multiple sequences of DNA that I need to count the bases(characters) and calculate the percentage of C+G and calculate the entropy. > before each sequence there is a header or identifier (lets say ID) > so it is like > >ID 1…etc > AAGGTAACCATATATACCGGG….etc (up to or even more than 3000 characters) > >ID 2…etc > AAATTTTTAAATTTTTTAAAATATATATACGCGCGCATGCCCCGGGGG….. etc > … etc > I need the out pu to be like this: > > ID…1.. etc > sequence length = a value > G & G content: a value > Entropy = a value > > ID…2.. etc > sequence length = a value > G & G content: a value > Entropy = a value > ….etc > > > I wrote a program close to what Denis suggested , however it works only if I have one sequence (one header and one sequence), I can not modify it to work if I have several sequences (like above). I also get an incorrect value for entropy (H) > > #!/usr/bin/python If you put the following into a function, say show_stats(seq) > print ' Sequence length : ', len(seq) > counters = {} > for char in seq: > char = char.strip() > if counters.has_key(char): > counters[char] += 1 > else: > counters[char] = 1 > c_g = 100*(counters['C']+counters['G'])/len(seq) > print ' The C & G content: ' '%.1f'% c_g, '%' > import math > all = len(seq) > Pa = (counters['A'])/all > Pc = counters['C']/all > Pg = counters['G']/all > Pt = counters['T']/all > > H =-1*(Pa*math.log(Pa,2) + Pc*math.log(Pc,2) + Pg*math.log(Pg,2) + Pt*math.log(Pt,2)) > > print ' H = ' , H you can invoke that function in and after the while loop like so: > seq = '' > while True: > try: > line = raw_input() > index = line.find('>') > if index > -1: if seq: show_stats(seq) seq = "" > print line > else: > line = line.rstrip() > line = line.upper() > seq = seq + line > except: > break if seq: show_stats() > I do not know why Pa, Pc, Pg, Pt give me a value of 0, although when I type counters['A'] or counters['C']. counters[T'] , counters['G'] or all I get values > 0. When you divide an integer by an integer Python 2 gives you an integer by default: >>> 1/3 0 You can avoid that by converting at least one operand to float >>> float(1)/3 0.3333333333333333 >>> 1/float(3) 0.3333333333333333 or by putting the following magic import at the beginning of every module where you want float or "true" division rather than integer division: >>> from __future__ import division >>> 1/3 0.3333333333333333 > So please how I can fix this calculations and how I modify this program to read each sequence, print the results then read the second one and print the results and so on.. > > Many thanks for your help and support. _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor