On Wed, Jul 18, 2012 at 8:04 PM, William R. Wing (Bill Wing) <w...@mac.com> wrote:
> On Jul 18, 2012, at 10:33 PM, Ryan Waples wrote:
>
>> Thanks for the replies, I'll try to address the questions raised and
>> spur further conversation.
>>
>>> "those numbers (4GB and 64M lines) look suspiciously close to the file and
>>> record pointer limits to a 32-bit file system. Are you sure you aren't
>>> bumping into wrap around issues of some sort?"
>>
>> My understanding is that I am taking the files in a stream, one line
>> at a time and never loading them into memory all at once. I would
>> like (and expect) my script to be able to handle files up to at least
>> 50GB. If this would cause a problem, let me know.
>
> [Again, stripping out everything else…]
>
> I don't think you understood my concern. The issue isn't whether or not the
> files are being read as a stream, the issue is that at something like those
> numbers a 32-bit file system can silently fail. If the pointers that are
> chaining allocation blocks together (or whatever Windows calls them) aren't
> capable of indexing to sufficiently large numbers, then you WILL get garbage
> included in the file stream.
>
> If you copy those files to a different device (one that has just been
> scrubbed and reformatted), then copy them back and get different results with
> your application, you've found your problem.
>
> -Bill
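As an aside, the line-at-a-time reading Ryan describes above typically looks
like the sketch below; the file name and process() are placeholders, not his
actual code:

    def process(line):
        # Placeholder for whatever per-line work the real script does.
        pass

    # Iterating over the file object yields one line at a time, so even a
    # 50 GB file is never loaded into memory all at once.
    with open('big_input.txt') as infile:
        for line in infile:
            process(line)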
Thanks for the insistence; I'll check this out. If you have any guidance on
how to do so, let me know. I knew my system wasn't particularly well suited
to the task at hand, but I haven't seen how it would actually cause problems.

-Ryan
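One way to run the copy-and-compare test Bill suggests is to checksum each
file in chunks and compare the digests. This is only a sketch; the two file
names are hypothetical stand-ins for the original file and the copy that made
the round trip to the freshly formatted device:

    import hashlib

    def sha256_of(path, chunk_size=1024 * 1024):
        # Hash the file in 1 MB chunks so even a 50 GB file never has to
        # fit in memory at once.
        digest = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                digest.update(chunk)
        return digest.hexdigest()

    original = sha256_of('reads_original.txt')      # hypothetical path
    round_trip = sha256_of('reads_roundtrip.txt')   # hypothetical path
    print('match' if original == round_trip
          else 'MISMATCH - the copies differ, so something garbled the data')

If the digests differ after the round trip, that points at the storage layer
rather than the script.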