On Mon, 14 Jun 2010 07:45:45 am Hs Hs wrote:
> hi:
>
> I have a very large file 15Gb. Starting from 15th line this file
> shows the following lines:
>
> HWUSI-EAS1211_0001:1:1:977:20764#0
> HWUSI-EAS1211_0001:1:1:977:20764#0
> HWUSI-EAS1521_0001:1:1:978:13435#0
> HWUSI-EAS1521_0001:1:1:978:13435#0

It looks to me that your file has a *lot* of redundant information.
Does the first part of the line, "HWUSI-EAS1211_0001:1:1:", ever change?
If it does not, then you can save approximately 65% of the file size by
recording it once instead of 400 million times:

[first 14 lines]
prefix = HWUSI-EAS1211_0001:1:1:
977:20764#0
977:20764#0
978:13435#0
978:13435#0
[...]

That will bring the file down from 15GB to less than 6GB, which will
speed up processing and significantly reduce storage requirements.

> Every two lines are part of one readgroup. I want to add two
> variables to every line. First variable goes to all lines with odd
> numbers. Second variable should be appended to all even number lines.

How are these variables calculated? You need some way of calculating
them automatically, perhaps by looking them up in a database. I don't
know how you calculate them, so I will invent two simple stubs:

def suffix1(line):
    # Calculate the first suffix variable.
    return " RG:Z:2301"

def suffix2(line):
    # Calculate the second suffix variable.
    return " RG:Z:2302"

> Since I cannot read the entire file, I wanted to cat the file

Of course you can read the file, you just can't read it ALL AT ONCE.
There's no need to use cat; you just have to read the file line by line
and do something with each line.

infile = open("myfile", "r")
outfile = open("output.sam", "w")
# Skip over the first 14 lines; the data starts on line 15.
for i in range(14):
    next(infile)  # Read one line and discard it.
suffixes = [suffix1, suffix2]  # Store the function objects.
n = 0
# Process the lines.
for line in infile:
    line = line.strip()
    # Which suffix function do we want to call?
    suffix = suffixes[n]
    outfile.write(line + suffix(line) + '\n')
    n = 1 - n  # n -> 1, 0, 1, 0, 1, 0, ...
outfile.close()
infile.close()

-- 
Steven D'Aprano
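
P.S. Here is a minimal sketch of the prefix idea from the top of this
message, in case it helps. It assumes the shared prefix is everything up
to and including the third colon of the first data line, and that a
"prefix = ..." line is an acceptable way to record it. The names
factor_out_prefix and header_lines are ones I made up for illustration.
Your sample shows two different machine names (EAS1211 and EAS1521), so
the function stops with an error if any line does not start with the
prefix, instead of silently damaging the data.

```python
import io


def factor_out_prefix(infile, outfile, header_lines=14):
    """Copy infile to outfile, writing the shared line prefix only once.

    Hypothetical helper: the "prefix = ..." record is an invented format.
    """
    # Copy the header lines through unchanged.
    for _ in range(header_lines):
        outfile.write(next(infile))
    first = next(infile)
    # Assumption: the prefix ends at the third colon of the first data line.
    prefix = ":".join(first.split(":")[:3]) + ":"
    outfile.write("prefix = " + prefix + "\n")
    outfile.write(first[len(prefix):])
    for line in infile:
        if not line.startswith(prefix):
            raise ValueError("line does not share the prefix: %r" % line)
        outfile.write(line[len(prefix):])
    return prefix


# Small in-memory demonstration with a one-line header.
demo_in = io.StringIO(
    "header line\n"
    "HWUSI-EAS1211_0001:1:1:977:20764#0\n"
    "HWUSI-EAS1211_0001:1:1:978:13435#0\n"
)
demo_out = io.StringIO()
demo_prefix = factor_out_prefix(demo_in, demo_out, header_lines=1)
```

With the real data you would pass two open file objects instead of the
in-memory ones. Anything that reads the smaller file afterwards has to
put the prefix back onto each line first.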