Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-20 Thread Steven D'Aprano
Ryan Waples wrote: I count only 19 lines. Yep, you are right. My bad, I think I missed copy/pasting line 20. The first group has only three lines. See below. Not so, the first group is actually the first four lines listed below. Lines 1-4 serve as one group. For what it is worth, line

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-20 Thread Steven D'Aprano
Alan Gauld wrote: On 19/07/12 07:00, Steven D'Aprano wrote: for reads, lines in four_lines( INFILE ): ID_Line_1, Seq_Line, ID_Line_2, Quality_Line = lines Shouldn't that be for reads, lines in enumerate( four_lines(INFILE) ): ID_Line_1,
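A minimal sketch of the point being raised here: a generator that yields 4-tuples cannot be unpacked into the two names `reads, lines` (Python raises ValueError: too many values to unpack), whereas wrapping it in enumerate() supplies the counter. The `groups` list and its contents are made-up stand-ins for whatever four_lines(INFILE) would yield, not data from the thread.

```python
# Hypothetical stand-in for the 4-tuples that four_lines(INFILE) would yield.
groups = [
    ("@id1", "ACGT", "+", "IIII"),
    ("@id2", "TTGA", "+", "HHHH"),
]

seen = []
# enumerate() pairs each 4-tuple with a running count, so the
# two-name unpacking "reads, lines" now matches what is yielded.
for reads, lines in enumerate(groups):
    ID_Line_1, Seq_Line, ID_Line_2, Quality_Line = lines
    seen.append((reads, Seq_Line))
```

Note that enumerate() counts from 0 unless a start value is passed as its second argument.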

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-19 Thread Steven D'Aprano
On Wed, Jul 18, 2012 at 04:33:20PM -0700, Ryan Waples wrote: I've included 20 consecutive lines of input and output. Each of these 5 'records' should have been selected and printed to the output file. I count only 19 lines. The first group has only three lines. See below. There is a blank

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-19 Thread Ryan Waples
If you copy those files to a different device (one that has just been scrubbed and reformatted), then copy them back and get different results with your application, you've found your problem. -Bill Thanks for the insistence, I'll check this out. If you have any guidance on how to do

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-19 Thread Ryan Waples
I count only 19 lines. Yep, you are right. My bad, I think I missed copy/pasting line 20. The first group has only three lines. See below. Not so, the first group is actually the first four lines listed below. Lines 1-4 serve as one group. For what it is worth, line four should have 1

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-19 Thread Alan Gauld
On 19/07/12 07:00, Steven D'Aprano wrote: def four_lines(file_object): snipping line1 = next(file_object).strip() # Get the next three lines, padding if needed. line2 = next(file_object, '').strip() line3 = next(file_object, '').strip()
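The quoted fragment is cut off, so the following is only a sketch of how a generator along those lines might look, not the code actually posted. The loop, the handling of end-of-file, the fourth line, and the sample data are all assumptions; only the first three assignments appear in the preview.

```python
import io


def four_lines(file_object):
    """Yield 4-tuples of stripped lines, one tuple per group of four."""
    while True:
        try:
            # Assumed: a missing first line means the input is exhausted.
            line1 = next(file_object).strip()
        except StopIteration:
            return
        # Get the next three lines, padding with '' if the file ends early.
        line2 = next(file_object, '').strip()
        line3 = next(file_object, '').strip()
        line4 = next(file_object, '').strip()  # assumed to mirror lines 2-3
        yield (line1, line2, line3, line4)


# Made-up sample: one complete record and one that ends a line short.
sample = io.StringIO("@id1\nACGT\n+\nIIII\n@id2\nTTTT\n+\n")
records = list(four_lines(sample))
```

Padding with an empty string means a truncated final record still comes out as a 4-tuple, which makes a short record easy to spot by testing for an empty last field.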

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-19 Thread Wayne Werner
Just a few notes... On Wed, 18 Jul 2012, Ryan Waples wrote: snip import glob my_in_files = glob.glob ('E:/PINK/Paired_End/raw/gzip/*.fastq') for each in my_in_files: #print(each) out = each.replace('/gzip', '/rem_clusters2' ) #print (out) INFILE = open (each,

[Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Ryan Waples
I'm seeing some unexpected output when I use a script (included at end) to iterate over large text files. I am unsure of the source of the unexpected output and any help would be much appreciated. Background Python v 2.7.1 Windows 7 32bit Reading and writing to an external USB hard drive Data

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Steven D'Aprano
On Wed, Jul 18, 2012 at 04:33:20PM -0700, Ryan Waples wrote: I'm seeing some unexpected output when I use a script (included at end) to iterate over large text files. I am unsure of the source of the unexpected output and any help would be much appreciated. It may help if you can simplify

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Abhishek Pratap
Hi Ryan, one quick comment: I didn't get through all your code to figure out the fine details, but my hunch is you might be having issues related to the Linux-to-DOS EOF char. Could you check that the total number of lines in your fastq# is the same as read by a simple Python file iterator? If not then it is
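One way the suggested check could be sketched: count lines with a plain file iterator and compare the result with the expected total (for four-line records, a multiple of four). The function name `count_lines` is hypothetical, and opening the file in binary mode is an assumption made here so that no newline or end-of-file translation is applied while counting.

```python
def count_lines(path):
    """Count lines by iterating over the file in binary mode."""
    count = 0
    # Binary mode (an assumption of this sketch) reads the raw bytes,
    # so the count reflects every b"\n" actually present in the file.
    with open(path, "rb") as handle:
        for _ in handle:
            count += 1
    return count
```

If the count from this function differs from the count seen by the script's own text-mode loop, that points at how the file is being opened rather than at the filtering logic.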

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread William R. Wing (Bill Wing)
On Jul 18, 2012, at 7:33 PM, Ryan Waples wrote: I'm seeing some unexpected output when I use a script (included at end) to iterate over large text files. I am unsure of the source of the unexpected output and any help would be much appreciated. Background Python v 2.7.1 Windows 7 32bit

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Ryan Waples
Thanks for the replies, I'll try to address the questions raised and spur further conversation. those numbers (4GB and 64M lines) look suspiciously close to the file and record pointer limits to a 32-bit file system. Are you sure you aren't bumping into wrap around issues of some sort? My

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread William R. Wing (Bill Wing)
On Jul 18, 2012, at 10:33 PM, Ryan Waples wrote: Thanks for the replies, I'll try to address the questions raised and spur further conversation. those numbers (4GB and 64M lines) look suspiciously close to the file and record pointer limits to a 32-bit file system. Are you sure you aren't

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Lee Harr
grep ^TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT$ with no results How about: grep TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT outfile Just in case there is some non-printing character in there... Beyond that ... my guess would be that you are either not reading the file you think you are, or not

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Ryan Waples
On Wed, Jul 18, 2012 at 8:04 PM, William R. Wing (Bill Wing) w...@mac.com wrote: On Jul 18, 2012, at 10:33 PM, Ryan Waples wrote: Thanks for the replies, I'll try to address the questions raised and spur further conversation. those numbers (4GB and 64M lines) look suspiciously close to the

Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Ryan Waples
On Wed, Jul 18, 2012 at 8:23 PM, Lee Harr miss...@hotmail.com wrote: grep ^TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT$ with no results How about: grep TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT outfile Just in case there is some non-printing character in there... There are many instances of that