On Fri, Mar 21, 2014 at 08:31:07PM +1100, Mustafa Musameh wrote:
> Please help. I have been search the internet to understand how to
> write a simple program/script with python, and I did not do anything.
> I have a file that look like this
> >ID 1
> agtcgtacgt…
> >ID 2
> attttaaaaggggcccttcc
> .
> .
> .
> in other words, it contains several IDs each one has a sequence of 'acgt'
> letters
> I need to write a script in python where the output will be, for example,
> like this
> > ID 1
> a = 10%, c = 40%, g=40%, t = 10%
> >ID 2
> a = 15%, c = 35%, g=35%, t = 15%
> .
> .
> .
> (i mean the first line is the ID and the second line is the frequency of each
> letter )
> How I can tell python to print the first line as it is and count
> characters starting from the second line till the beginning of the
> next '>' and so on

This sounds like a homework exercise, and I have a policy of trying not
to do homework for people. But I will show you the features you need.

Firstly, explain what you would do if you were solving this problem in
your head. Write out the steps in English (or the language of your
choice). Don't worry about writing code yet; you're writing instructions
for a human being at this stage.

Those instructions might look like this:

    Open a file (which file?).
    Read two lines at a time.
    The first line will look like ">ID 42". Print that line unchanged.
    The second line will look like "gatacacagtatta...". Count how many
    "g", "a", "t", "c" letters there are, then print the results as
    percentages.
    Stop when there are no more lines to be read.

Now that you know what needs to be done, you can start using Python for
it. Start off by opening a file for reading:

    f = open("some file")

There are lots of ways to read the file one line at a time. Here's one
way:

    for line in f:
        print(line)

But you want to read it *two* lines at a time. Here is one way:

    for first_line in f:
        second_line = next(f, '')
        print(first_line)
        print(second_line)

Here's another way:

    first_line = None
    while first_line != '':
        first_line = f.readline()
        second_line = f.readline()

Now that you have the lines, what do you do with them? Printing the
first line is easy. How about the second?

    second_line = "gatacattgacaaccggaataccgagta"

Now you need to do four things:

- count the total number of characters, ignoring the newline at the end
- count the number of g, a, t, c characters individually
- work out the percentages of the total
- print each character and its percentage

Here is one way to count the total number of characters:

    count = 0
    for c in second_line:
        count += 1

Can you think of a better way? Do you think that maybe Python has a
built-in command to calculate the length of a string?

Here is one way to count the number of 'g' characters:

    count_of_g = 0
    for c in second_line:
        if c == 'g':
            count_of_g += 1

(Does this look familiar?)
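
Going back to reading the file two lines at a time: the lines you get from a file keep their trailing newline, which matters once you start counting characters. Here is a minimal sketch of the for/next pairing with the newline stripped. It assumes Python 3, and it uses io.StringIO as a stand-in for a real file; the sample data and the name read_pairs are made up for illustration, not taken from your file.

```python
import io

# Hypothetical stand-in for the real file: two records, two lines each.
fake_file = io.StringIO(">ID 1\nagtcgtacgt\n>ID 2\nattttaaaaggggcccttcc\n")

def read_pairs(f):
    """Return a list of (header, sequence) pairs without trailing newlines."""
    pairs = []
    for first_line in f:
        # next() pulls the following line from the same file object;
        # the '' default avoids an error if the last record has no sequence.
        second_line = next(f, '')
        pairs.append((first_line.rstrip('\n'), second_line.rstrip('\n')))
    return pairs

pairs = read_pairs(fake_file)
for header, sequence in pairs:
    # repr() makes it visible that no '\n' is left on either string.
    print(repr(header), repr(sequence))
```

This only covers the reading step. It also assumes each sequence sits on a single line, as in the example you posted; a sequence wrapped over several lines would need a different loop.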

Can you think of another way to count characters? Hint: strings have a
count method:

    py> s = "fjejevffveejf"
    py> s.count("j")
    3

Now you need to calculate the percentages. Do you know how to calculate
the percentage of a total? Hint: you'll need to divide two numbers and
multiply by 100.

Finally, you need to print the results.

Putting all these parts together should give you a solution. Good luck!
Write as much code as you can, and come back with any specific questions
you may have.

-- 
Steven
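
A note on the percentage hint above: on Python 2, dividing one integer by another throws away the fractional part, so 3 / 13 gives 0 rather than 0.23. A small sketch of the arithmetic that behaves the same on Python 2 and 3, reusing the string from the count example (the function name percentage is just an illustrative choice):

```python
def percentage(count, total):
    """Return count as a percentage of total, as a float."""
    # Multiplying by 100.0 first forces float arithmetic, so the
    # division is not truncated even where int / int would be.
    return 100.0 * count / total

s = "fjejevffveejf"
pct = percentage(s.count("j"), len(s))
print("j = {:.1f}%".format(pct))  # prints: j = 23.1%
```

If total is 0 (an empty line, say), this raises ZeroDivisionError, so that case needs deciding separately.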